Database Selection & Design (Part IV)
— 5 V Properties of Big Data —
The internet revolution has caused an explosion of data from digital and social media, and data is now produced rapidly and in very large volumes. The amount of data that organizations gather on a daily basis has skyrocketed. Storing and processing it with conventional business intelligence and analytics methods has become a great challenge. Applications have to keep up with this growth: they must accept, process and analyze this data. Enterprises must adopt modern tools to capture, store and process such unprecedented amounts of data effectively and in real time.
Your design should treat volume as a key requirement: analyze the current volume of data and review the forecast for data growth over the next few years. This influences the type of database you will need to run your application. It also influences how you store, partition and replicate the data.
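As a rough illustration of reviewing a growth forecast, the sketch below projects storage needs under a constant annual growth rate. The function name `project_storage` and the figures (10 TB today, 50% growth per year) are assumptions made up for this example, and real growth is rarely this regular.

```python
from __future__ import annotations


def project_storage(current_tb: float, annual_growth: float, years: int) -> list[float]:
    """Return the projected storage (in TB) at the end of each year.

    Assumes growth compounds once per year at a constant rate.
    """
    if current_tb < 0 or annual_growth <= -1 or years < 0:
        raise ValueError("invalid input")
    return [
        round(current_tb * (1 + annual_growth) ** year, 2)
        for year in range(1, years + 1)
    ]


if __name__ == "__main__":
    # Hypothetical figures: 10 TB today, growing 50% per year.
    print(project_storage(10.0, 0.5, 3))  # [15.0, 22.5, 33.75]
```

Even a simple projection like this helps show whether a single-node database is likely to remain sufficient or whether partitioning and replication should be planned from the start.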
Velocity means how quickly data becomes available for processing. With IoT (Internet of Things), the validity of data changes quickly over time. Sometimes it’s better to have limited data in real time than lots of data at low speed. For example, the traffic backlog at a specific location changes quickly. If this information is not gathered and processed in a timely manner, you might be serving your customers stale information, which may not be acceptable. So you may need your information to flow quickly, as close to real time as possible.
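One way to picture the stale-data problem is a freshness filter that discards readings older than a chosen age before they are served. This is a minimal sketch: the function `fresh_readings`, the field names and the two-minute threshold are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def fresh_readings(readings, now, max_age):
    """Keep only the readings observed within max_age of now."""
    return [r for r in readings if now - r["observed_at"] <= max_age]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    # Hypothetical traffic readings for two roads.
    readings = [
        {"road": "A1", "delay_min": 12, "observed_at": now - timedelta(seconds=30)},
        {"road": "B7", "delay_min": 3, "observed_at": now - timedelta(minutes=10)},
    ]
    for reading in fresh_readings(readings, now, timedelta(minutes=2)):
        print(reading["road"], reading["delay_min"])  # only A1 is fresh enough
```

The right threshold depends on how quickly the underlying data loses its validity, which is a business decision rather than a technical one.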
Variety refers to the fact that data is generated from many different sources in many different formats. Some of it is structured or semi-structured, and most of it is unstructured.
- Structured data has a clearly defined format, length and volume.
- Semi-structured data may partially conform to a specific data format.
- Unstructured data is unorganized and doesn’t conform to traditional data formats.
A company can obtain data from many different sources: from in-house devices to smartphone GPS technology or what people are saying on social networks. The data formats range from plain text to videos, images, PDFs, reports etc. The structure of your data dictates how you need to store and retrieve it. Understanding the structure of your data is key to selecting the database. Not all databases in the industry support all types of data structures.
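To make the semi-structured case concrete, the sketch below maps JSON events with optional fields onto a fixed set of fields, which is roughly what is needed before such data fits a structured store. The function `normalize_event` and the field names are hypothetical and only serve the illustration.

```python
import json


def normalize_event(raw: str) -> dict:
    """Parse one JSON event and map it onto a fixed set of fields."""
    event = json.loads(raw)
    return {
        "user_id": event["user_id"],               # required: KeyError if missing
        "device": event.get("device", "unknown"),  # optional, with a default
        "location": event.get("location"),         # optional, None when absent
    }


if __name__ == "__main__":
    # Two events from the same hypothetical source with different fields.
    print(normalize_event('{"user_id": 7, "device": "phone"}'))
    print(normalize_event('{"user_id": 8, "location": "Berlin"}'))
```

A database that stores documents natively can keep such events as they arrive, while a relational schema needs a mapping step like this one.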
The veracity of big data, also referred to as validity, is the assurance of the quality or credibility of the collected data.
- Can you trust the data that you have collected?
- Is this data credible enough to glean insights from?
- Should we be basing our business decisions on the insights garnered from this data?
All these questions and more are answered when the veracity of the data is known. Since big data is vast and involves so many data sources, there is the possibility that not all collected data will be accurate or of good quality. Hence, when working with big data sets, it is important to check the validity of the data before it is processed.
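A validity check before processing can be as simple as splitting records into accepted and rejected sets, keeping the reasons for each rejection. The sketch below assumes hypothetical sensor records; the field names and the plausible temperature range are illustrative assumptions, not rules from any particular system.

```python
def split_by_validity(records):
    """Split records into (valid, rejected); rejected items carry their problems."""
    valid, rejected = [], []
    for rec in records:
        problems = []
        if not rec.get("sensor_id"):
            problems.append("missing sensor_id")
        temp = rec.get("temperature_c")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(temp, (int, float)) or isinstance(temp, bool):
            problems.append("temperature_c is not a number")
        elif not -90 <= temp <= 60:  # assumed plausible range in Celsius
            problems.append("temperature_c out of plausible range")
        if problems:
            rejected.append((rec, problems))
        else:
            valid.append(rec)
    return valid, rejected


if __name__ == "__main__":
    valid, rejected = split_by_validity([
        {"sensor_id": "s1", "temperature_c": 21.5},
        {"sensor_id": "", "temperature_c": 500},
    ])
    print(len(valid), "valid,", len(rejected), "rejected")
```

Keeping the rejected records together with their reasons, rather than silently dropping them, makes it possible to judge later how trustworthy each data source is.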
Value refers to your ability to transform your data into actionable business value. Gathering large amounts of data is useless if you don’t use it for business processing and to increase the value of the line of business. This is where big data analytics comes into the picture. While many companies have invested in data aggregation and storage infrastructure, they fail to understand that aggregating data doesn’t equal adding value. What you do with the collected data is what matters. With the help of advanced data analytics, useful insights can be derived from the collected data. These insights, in turn, are what add value to the decision-making process.
Information derived from high-volume, high-velocity and validated data collected from varied sources can add value to the overall decision-making of the company. While most organizations today do intend to use data, many are struggling to effectively capture, store, process or harness it. Your design should make this seamless for the business, even as user behavior keeps changing.