Data analytics

A business manager asked: “What is the difference between Big Data analytics and traditional analytics? We have been doing analytics for years, and I do not see why Big Data is that important.”

Answer: There is a real difference between Big Data analytics and traditional analytics. Traditional analytics starts from business requirements: the data is defined, structured, collected, and analyzed to produce information about business performance that can be compared with past results. For example, sales this month are much lower than last month, or the company used more electricity this year than last year. Using this information, management can make decisions about the company’s operations, such as profit, quality, productivity, competitiveness, or waste. Traditional data analysts identify what data they want to collect and store it in Excel spreadsheets or a database, where it can be analyzed with statistical modeling tools to create business intelligence reports.
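
To make the month-to-month comparison concrete, here is a minimal sketch in Python. The sales figures and the names `monthly_sales` and `percent_change` are invented for illustration; a real report would read the numbers from a spreadsheet or database.

```python
# Hypothetical monthly sales totals; the figures are made up for illustration.
monthly_sales = {"2024-01": 120000.0, "2024-02": 95000.0}


def percent_change(previous, current):
    """Return the percentage change from the previous value to the current one."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100.0


# Compare this month with last month, as a traditional report would.
change = percent_change(monthly_sales["2024-01"], monthly_sales["2024-02"])
print(f"Sales changed by {change:.1f}% compared with last month")
```

The point is that the data here is already defined and structured: one number per month, ready to be compared.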

Big Data analytics is about predicting trends and patterns, that is, what may happen in the future, based on data from many sources (Variety). The data is not defined in advance: some of it may be text, pictures, video, or symbols such as bar codes, so it requires a completely different mindset and approach. Because there is so much data from so many sources, it is huge (Volume) and changes often (Velocity), which makes it very difficult to collect and analyze using traditional methods. Because this data is not defined in advance and only some of it is structured, it does not fit into spreadsheets or conventional database tables; it has to be “re-modelled” and organized differently to see which information or patterns can be identified and used for predictions. For example, an online business may collect unstructured data from social media to determine which products users mention most often, which trends are emerging, or which advertisements are effective based on the number of user clicks.
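
The social media example can be sketched in a few lines of Python. The posts, the product names, and the function `count_mentions` are all hypothetical, and a real system would process far more text with far more robust matching; this only shows the idea of pulling a pattern out of free text.

```python
import re
from collections import Counter

# Hypothetical posts and product names, invented for illustration only.
posts = [
    "Just bought the new AlphaPhone, love it!",
    "alphaphone battery is terrible :(",
    "Thinking about a BetaWatch for running",
    "My AlphaPhone and BetaWatch sync nicely",
]
products = ["alphaphone", "betawatch", "gammapad"]


def count_mentions(posts, products):
    """Count how many posts mention each product (case-insensitive, whole word)."""
    counts = Counter({product: 0 for product in products})
    for post in posts:
        # Break the free text into lowercase words before matching.
        words = set(re.findall(r"[a-z0-9]+", post.lower()))
        for product in products:
            if product in words:
                counts[product] += 1
    return counts


mentions = count_mentions(posts, products)
for product, n in mentions.most_common():
    print(product, n)
```

Unlike the sales comparison, nothing in the input says where a product name will appear or how it will be written, so the structure has to be imposed during the analysis.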

Traditional business intelligence tools are designed for structured data such as text fields and numbers, NOT for unstructured data such as bar codes and pictures. Big Data analysis needs different tools, algorithms, and mathematical models because the data to be analyzed comes from many sources (Variety). For example, in traditional insurance analysis you can capture risk factors from a set of questions about age, health, accidents, insured values, and so on. Now, for each question, much more data is available from the user’s interactions on social media, mobile devices, and so on. This information will influence the final outcome (the calculated risk factor). Since this data is not in one place, is unstructured, and is generated continuously from multiple sources in huge volume, modelling tools have to change and incorporate machine learning technology in order to capture all of it.
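
As a rough illustration of what incorporating machine learning can mean here, the sketch below updates a small logistic model one record at a time, so the risk estimate keeps adjusting as new data arrives. Everything in it is an assumption made for the example: the three features, the toy records, the learning rate, and the choice of logistic regression itself. It is not how any particular insurer calculates risk.

```python
import math


def predict(weights, bias, features):
    """Logistic model: estimated probability of a claim given the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def update(weights, bias, features, label, lr=0.1):
    """One stochastic-gradient step on the log loss for a single record."""
    error = predict(weights, bias, features) - label
    new_weights = [w - lr * error * x for w, x in zip(weights, features)]
    return new_weights, bias - lr * error


# Hypothetical records, invented for illustration:
# ([age in decades, prior accidents, risky-activity posts], claim filed? 1 or 0)
records = [
    ([2.5, 0, 0], 0),
    ([3.0, 2, 5], 1),
    ([4.5, 0, 1], 0),
    ([2.0, 3, 4], 1),
]

weights, bias = [0.0, 0.0, 0.0], 0.0
for _ in range(200):  # repeated passes stand in for a continuous stream of data
    for features, label in records:
        weights, bias = update(weights, bias, features, label)

low = predict(weights, bias, [3.5, 0, 0])
high = predict(weights, bias, [3.5, 2, 6])
print(f"low-risk profile: {low:.2f}, high-risk profile: {high:.2f}")
```

The difference from the questionnaire approach is that the weights are not fixed by an analyst in advance; they are learned from the records and keep changing as more records come in.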

Sources

  • Blogs of Prof. John Vu, Carnegie Mellon University