Big Data: The new frontier

Big Data can be described as “The new technologies designed to extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and analysis.” The potential applications of big data analytics are still growing, with new ideas, new applications, and new sources of value. An industry analyst wrote: “Big data is very similar to the Internet twenty years ago. When the Internet was invented, only a few people knew what it could do or what might happen. The same thing is happening now, as few people know what big data can do and what may happen. If you look back to the beginning of the Internet, you see that a few companies such as Google and Amazon were able to seize the opportunities and capture the market, and how everything changed. The same thing is happening now with big data analytics. If you learn more about it and seize the opportunities, you will do very well. If you ignore it, you will miss a big opportunity and probably will not survive in the near future.”

Today big data is beginning to affect many things, as more companies realize this potential and quickly implement big data analytics in marketing, sales, and operations. For example, the large retailer Target used big data analytics to capture a lucrative market: new parents. Target knows that new parents spend more money shopping for their newborn baby, so it collects data on customers who buy vitamins, lotion, clothes, towels, and other things that pregnant women often buy, and uses special algorithms to determine the likelihood that certain customers are pregnant. When Target identifies such customers, it sends them special coupons and discounts before they even have the baby (for example, a 20% discount on baby clothes, baby furniture, toys, etc.). This helps develop good relationships with new parents so they will shop at Target and not at other stores. By using big data analytics, Target today captures an extremely profitable market, as many new parents prefer to shop at Target.

Best Buy, another big electronics retailer, also uses big data analytics to increase sales. Best Buy uses a mobile app called “ShopSavvy” to communicate with customers when they are interested in buying something. The app allows customers to compare Best Buy’s prices with those of competitors’ stores. When customers begin to compare prices, the software immediately tracks which store they are in, identifies what they want to buy, and checks their identity, credit scores, and other information to determine whether they can pay for the merchandise. If they have good credit, Best Buy immediately offers a price match or a special discount to make sure they do not buy from other stores. The big data system at Best Buy is built on a Hadoop cluster running special software that scans all competitors’ prices in real time to make sure that Best Buy always has the best price possible. Best Buy also collects information on how many stores are selling the same products and how they price them. Every time such a customer is identified, the store manager must take action to make sure that the customer does not leave to buy from another store, either by matching the competitor’s price or by offering other incentives such as free delivery and installation at the customer’s home.
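The price-match decision described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post, not Best Buy's actual system: the function name `price_match_offer`, its parameters, and the store names and prices are all made up for the example.

```python
def price_match_offer(our_price, competitor_prices, has_good_credit):
    """Return the matched price to offer, or None when no offer is made.

    competitor_prices maps a (hypothetical) store name to its price
    for the same product.
    """
    # Offers are only made to customers who pass the credit check.
    if not has_good_credit or not competitor_prices:
        return None
    lowest = min(competitor_prices.values())
    # Match the cheapest competitor only when it undercuts our price.
    return lowest if lowest < our_price else None


# Illustrative numbers only.
prices_seen = {"StoreA": 479.0, "StoreB": 510.0}
offer = price_match_offer(499.0, prices_seen, has_good_credit=True)
print(offer)
```

A real system would also need the live price feed and the credit check themselves; here both are simply passed in as arguments.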

Google uses big data analytics to identify flu outbreaks in the U.S. as they first happen, whereas the Centers for Disease Control and Prevention (CDC) needs about two weeks because it relies on reports from the regions before making a decision. Google can do it faster because it receives more than three billion search queries each day. By using big data analytics, Google can identify a particular region where more people are searching for terms such as “flu”, and it uses complex algorithms to show a strong correlation between the number of searches and actual flu outbreaks.
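To make the idea of a "strong correlation" concrete, here is a minimal sketch that computes the Pearson correlation coefficient between weekly search counts and reported flu cases. It is not Google's algorithm, which this post does not describe. The function `pearson` and the two lists of numbers are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Made-up weekly figures for one region (not real data).
weekly_flu_searches = [120, 150, 310, 480, 700, 650]
weekly_reported_cases = [10, 14, 30, 52, 75, 66]

r = pearson(weekly_flu_searches, weekly_reported_cases)
print(f"correlation: {r:.3f}")
```

A value of r close to 1 means the two series rise and fall together, which is the kind of relationship the paragraph above refers to. A high correlation alone does not prove that searches predict outbreaks.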

The online company eBay uses big data analytics to identify potential talent loss and prevent workers from leaving the company. Its analytics software scans the company’s employee records to look for workers who have been in a job for more than three years without a promotion, a change of role, or an increase in wages, and concludes that there is a higher probability that they will leave for another company. Since there is a shortage of skilled workers in the industry and finding and hiring workers is expensive, eBay must keep these workers happy. When a list of employees who may leave is identified, managers must act quickly to review it and do whatever they can to keep those employees.
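The screening rule described above (more than three years in a job with no promotion, role change, or raise) could be written roughly as follows. This is a sketch of the rule as stated here, not eBay's software. The `EmployeeRecord` fields, the `at_risk` function, and the sample records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EmployeeRecord:
    name: str
    years_in_role: float
    promoted: bool
    changed_role: bool
    got_raise: bool


def at_risk(rec, min_years=3.0):
    """True when the employee matches the 'may leave' rule from the text."""
    no_progress = not (rec.promoted or rec.changed_role or rec.got_raise)
    return rec.years_in_role > min_years and no_progress


# Invented records for illustration.
records = [
    EmployeeRecord("A", 4.5, False, False, False),
    EmployeeRecord("B", 5.0, False, False, True),
    EmployeeRecord("C", 2.0, False, False, False),
]
watch_list = [rec.name for rec in records if at_risk(rec)]
print(watch_list)
```

Only employee "A" is both past the three-year mark and without any progress, so only "A" lands on the list that managers would review.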

Since big data analytics is a new field, many students are asking how they can get into these high-demand jobs. To work in this area, you need at least a Master’s degree in Data Science or an equivalent, such as a Master’s in Software Engineering or a Master’s in Information Systems Management.

A big data scientist must be a domain expert who can explain how information analytics helps business leaders make appropriate decisions in real time. Therefore, a big data scientist must understand the business processes across the company: marketing, sales, distribution, operations, pricing, products, finance, risk, etc. The big data scientist must also be a database expert who has a good understanding of external and internal data sources and how they are collected and stored. (That is why students in Information Systems Management are a better fit for this field.)

The big data scientist must be able to extract, transform, and load data from internal sources as well as retrieve data from external sources such as the Internet and social media, then manipulate it using Hadoop, Hive, Pig, MapReduce, Mahout, etc. to analyze the data and generate reports in which special insights are identified. This is NOT similar to traditional database and business intelligence techniques, because it deals with massive amounts of data from several sources, both structured and unstructured. (Note: traditional business intelligence deals only with defined, structured data stored in a database and focuses on past data.)

Since big data deals with prediction, either in real time or about the future, the big data scientist must be able to determine the most appropriate statistical techniques for the question at hand. The big data scientist must be able to apply the relevant techniques, translate the results, and generate “insight reports” in such a way that company leaders can understand them and act quickly to capture the value. This requires a thorough understanding of statistical techniques (e.g., regression analysis, cluster analysis, and optimization) and of the tools and languages used to run the analysis, such as SAS or R. To do that, the big data scientist must be able to write software that implements computational techniques such as machine learning, natural language processing, graph/social network analysis, neural nets, and simulation modeling. Most of these applications are written in languages such as Java, Python, C++, MATLAB, and R. (That is why Software Engineering students are a better fit for this field.)
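As a toy version of the extract, transform, and analyze steps mentioned above, the sketch below reads a small CSV, drops incomplete rows, and fits a simple least-squares regression line. It runs on one machine with the Python standard library, so it stands in for the idea only, not for a Hadoop-scale pipeline. The column names, the data, and the helper functions `extract`, `transform`, and `fit_line` are all assumptions made for this example.

```python
import csv
import io

# Invented sample data; the third row has a missing value on purpose.
RAW_CSV = """week,ad_spend,units_sold
1,100,25
2,200,45
3,,50
4,300,65
5,400,85
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows and convert strings to floats."""
    return [
        (float(row["ad_spend"]), float(row["units_sold"]))
        for row in rows
        if row["ad_spend"] and row["units_sold"]
    ]


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


points = transform(extract(RAW_CSV))
slope, intercept = fit_line(points)
forecast = slope * 500 + intercept  # predicted units at ad_spend = 500
print(f"slope={slope:.2f} intercept={intercept:.2f} forecast={forecast:.1f}")
```

The forecast line is the "prediction" part: the fitted relationship from past data is used to estimate a value that has not been observed yet.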

Sources

  • Blogs of Prof. John Vu, Carnegie Mellon University