Types of Data
Obvious data. This is data stored in modern databases, reporting tools and similar systems. It is the kind of storage most of us think of when we talk about data and data analytics.
Dark data. This is data that is less accessible and is often distributed across many hardware nodes (such as servers). It is usually left out of analytics simply because it is very hard to access, collect and gather into one central storage area where analytical tools can run on it.
Lost data. This is the data you don’t keep, or keep overwriting with the latest revisions. It simply doesn’t exist unless you take action and start collecting it.
The value of data
So much has been written on the web about data and its value that it is hard to add anything new to the subject. Put simply, the value comes from answering questions. It is the questions we have as an organization that lead us to ask for the data. Collecting data just for the sake of having fancy reports is useless if it never answers the questions we had in the first place.
As a business, you may HAVE all the data but not be dealing with it in the right way. Changes are often needed to consolidate your valuable data in one place, making it easier to run reports and get real information from it. There’s no point hoarding data if you’re not going to use it.
The value of centralization. Putting all the answers on one platform may seem trivial, but different departments within the same organization often implement their own solutions. Ask whether every question can be answered in one place and the answer is often no: there are different answering platforms (reporting systems) for different questions.
The value of getting answers to questions you never dared to ask. There are questions you have never been able to ask because you simply didn’t have the data to answer them. These questions may come up often or only rarely, but either way you have never had accurate, data-driven answers to them before.
The value of discovering new questions. Often, being given a solution opens up new areas of thinking. This is simply the way the human brain works: a small trigger is enough to open up a world of opportunities. It is the same with data. Given the right platform for asking questions, many new questions arise, and your data and its value become far more interesting.
How to use it effectively?
Start with what you currently have. There is no point in rushing into decisions about a new solution. In many cases you already have the answers right in front of you and just need to know how to derive them from your existing platform. It is very important not to start collecting data for the sake of it. Instead, start by asking quality questions. Once you know what you need, it is easy to find the data to answer it. In most cases you already have it somewhere in your infrastructure; it’s just a matter of extracting it and putting it to use.
If you cannot get the data you are after, it is time to look for ways to collect it. Often the data is already there but in a format that is hard to use (such as files or system logs). In that case, look for a strategy that will make this data useful again. If the data simply isn’t available, you may want to look at ways to improve your internal systems and start collecting it. The focus then becomes collecting data that will be used to answer questions, not collecting it for its own sake.
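As a rough illustration of making log data usable, the sketch below turns raw log lines into structured records that can then be counted or loaded elsewhere. The log format, the sample lines and the function names are all assumptions made for this example; real logs will need their own pattern.

```python
import re
from datetime import datetime

# Hypothetical log format, assumed for illustration only:
#   2024-03-01 12:00:05 ERROR payment failed order=1042
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)

# Made-up sample lines, including one that does not follow the format.
SAMPLE_LINES = [
    "2024-03-01 12:00:05 ERROR payment failed order=1042",
    "2024-03-01 12:00:09 INFO user login user=77",
    "not a log line",
    "2024-03-02 08:15:30 ERROR payment failed order=1050",
]


def parse_log_lines(lines):
    """Turn raw log lines into dictionaries, skipping lines that do not match."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # unparseable line: ignore it rather than guess
        records.append({
            "timestamp": datetime.strptime(
                f"{match['date']} {match['time']}", "%Y-%m-%d %H:%M:%S"
            ),
            "level": match["level"],
            "message": match["message"],
        })
    return records


def count_by_level(records):
    """Answer a simple question: how many entries exist per log level?"""
    counts = {}
    for record in records:
        counts[record["level"]] = counts.get(record["level"], 0) + 1
    return counts


if __name__ == "__main__":
    print(count_by_level(parse_log_lines(SAMPLE_LINES)))
```

The point of the design is that the question ("how many errors did we have?") drives what is extracted, so only the fields needed to answer it are parsed.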
Purge old data that is no longer needed, meaning data that is past its relevant date (for example, more than 7 years old). Old data takes up space and slows down your systems. It can prevent you from expanding your data collections, because you have to support old structures or objects that are long gone and no longer used. It also appears in reports (or has to be explicitly filtered out), so it slows down your reporting tools and can confuse users by cluttering their reports with irrelevant information.
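A purge like this can be sketched in a few lines. The example below uses an in-memory SQLite database with a hypothetical "orders" table and a "created_on" column; the table, the column and the sample dates are assumptions for illustration, and the 7-year window is just the example figure used above. Actual retention periods depend on your own legal and business requirements.

```python
import sqlite3
from datetime import date

RETENTION_YEARS = 7  # example figure from the text, not a recommendation


def retention_cutoff(today, years=RETENTION_YEARS):
    """Return the date before which rows are considered past retention."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        return today.replace(year=today.year - years, day=28)


def build_demo_db():
    """Create a small in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_on TEXT)")
    conn.executemany(
        "INSERT INTO orders (created_on) VALUES (?)",
        [("2010-05-01",), ("2017-01-15",), ("2023-09-30",)],
    )
    conn.commit()
    return conn


def purge_old_rows(conn, today):
    """Delete rows older than the cutoff and return how many were removed."""
    cutoff = retention_cutoff(today).isoformat()
    # ISO dates (YYYY-MM-DD) stored as text compare correctly as strings.
    cursor = conn.execute("DELETE FROM orders WHERE created_on < ?", (cutoff,))
    conn.commit()
    return cursor.rowcount


if __name__ == "__main__":
    connection = build_demo_db()
    removed = purge_old_rows(connection, date(2024, 6, 1))
    print(f"removed {removed} rows")
```

In practice you would probably archive or export the rows before deleting them, and run the purge on a schedule rather than by hand.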
Many organizations have too many BI tools, each for a different purpose. Collecting data from different sources (systems) is highly inefficient and makes organizations resistant to adopting new tools (as they already have too many). In this case you need to become more effective and centralize all your data into one repository (one repository for all internal systems). Collecting data in this way is very efficient and allows you to ask questions that span many different data objects and types.
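To make the idea of a single repository concrete, here is a minimal sketch that loads data from two hypothetical departmental systems (sales and support) into one SQLite database and then asks a question that spans both. The table names, columns and sample rows are invented for the example; a real repository would more likely be a data warehouse fed by scheduled loads.

```python
import sqlite3

# Made-up extracts from two separate systems, for illustration only.
SALES = [("acme", 1200.0), ("globex", 300.0), ("acme", 800.0)]
TICKETS = [
    ("acme", "late delivery"),
    ("initech", "login issue"),
    ("acme", "wrong invoice"),
]


def build_repository(sales, tickets):
    """Load both sources into one in-memory repository."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    conn.execute("CREATE TABLE tickets (customer TEXT, subject TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
    conn.executemany("INSERT INTO tickets VALUES (?, ?)", tickets)
    conn.commit()
    return conn


def revenue_and_tickets(conn):
    """For each paying customer, return total revenue and support ticket count."""
    query = """
        SELECT s.customer, s.total, COALESCE(t.n, 0)
        FROM (SELECT customer, SUM(amount) AS total
              FROM sales GROUP BY customer) AS s
        LEFT JOIN (SELECT customer, COUNT(*) AS n
                   FROM tickets GROUP BY customer) AS t
          ON t.customer = s.customer
        ORDER BY s.customer
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    repository = build_repository(SALES, TICKETS)
    for customer, total, ticket_count in revenue_and_tickets(repository):
        print(customer, total, ticket_count)
```

A question such as "which of our largest customers raise the most support tickets?" cannot be answered by either system alone, which is the kind of cross-cutting question a shared repository makes possible.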