Data mining

A student asked: “What is data mining? How does it differ from database administration? I want to be a data mining analyst. Where could I learn about this field?”

Answer: Data mining can be defined as the exploration of very large databases through the use of specialized tools and processes. Its purpose is to extract useful information from the data and provide it to managers and other decision makers for use in business intelligence, forecasting, and similar work.

Data mining applies statistical analysis techniques to extract, retrieve, and explore raw data, then analyzes that data and turns it into useful information, using computer software for faster processing. Database administration, by contrast, is the maintenance of records: storing and updating many types of data using computer software known as a database management system (DBMS).

To learn data mining, you need knowledge of the business domain, an understanding of databases and how they work, data analysis skills, and techniques for filtering and cleansing data, measuring data quality, and dealing with missing data. Data mining is an advanced course taught mostly in master's degree programs in Information Systems Management.
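To make the data preparation skills mentioned above more concrete, here is a minimal Python sketch. The records, field names, and helper functions are all hypothetical, invented for illustration. The quality measure (completeness) and the missing-data strategy (filling with the median) are only one possible choice among many.

```python
from statistics import median

# Hypothetical customer records; None marks a missing value.
records = [
    {"customer": "A", "age": 34, "spend": 120.0},
    {"customer": "B", "age": None, "spend": 80.0},
    {"customer": "C", "age": 29, "spend": None},
    {"customer": "D", "age": 41, "spend": 200.0},
    {"customer": "D", "age": 41, "spend": 200.0},  # exact duplicate row
]


def completeness(rows, field):
    """Measure quality: share of rows in which the field is present."""
    return sum(r[field] is not None for r in rows) / len(rows)


def drop_duplicates(rows):
    """Filter: remove exact duplicate rows, keeping the first occurrence."""
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def impute_median(rows, field):
    """Handle missing data: fill gaps with the median of the known values."""
    fill = median(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


age_quality = completeness(records, "age")  # measured before any cleaning
cleaned = drop_duplicates(records)
cleaned = impute_median(cleaned, "age")
cleaned = impute_median(cleaned, "spend")

print(f"age completeness before cleaning: {age_quality:.0%}")
for row in cleaned:
    print(row)
```

Whether to fill a gap, drop the row, or flag it for review depends on the business domain, which is one reason domain knowledge matters as much as the technique itself.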

A number of data mining algorithms and tools already exist, and each has certain advantages, but learning to use a tool is the easy part. To be a good data analyst or data scientist, you need to understand the data mining process and its estimation models. You must know how to compare techniques and select the ones appropriate for what you are doing. My advice is to try as many different techniques as possible so that you are familiar with all of them, and to learn how to prepare data for analysis, because that is a time-consuming task.
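As a rough illustration of comparing techniques, the Python sketch below fits two very simple estimation models on the same training data and scores both on held-out data. The data values, the two models (a mean baseline and a least-squares line), and the error measure (mean absolute error) are assumptions chosen for brevity, not a recommended toolkit.

```python
from statistics import mean

# Hypothetical observations: (advertising budget, sales).
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]
test = [(5, 10.1), (6, 11.9)]


def fit_mean(data):
    """Baseline model: always predict the average of the training targets."""
    avg = mean(y for _, y in data)
    return lambda x: avg


def fit_line(data):
    """Simple linear regression fitted by ordinary least squares."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in data) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def mean_absolute_error(model, data):
    """Average size of the prediction error on the given data."""
    return mean(abs(model(x) - y) for x, y in data)


# Score every candidate on data it did not see during fitting.
scores = {
    "mean baseline": mean_absolute_error(fit_mean(train), test),
    "linear fit": mean_absolute_error(fit_line(train), test),
}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: MAE = {score:.2f}")
print("selected:", best)
```

Scoring on held-out data rather than on the training data is what keeps the comparison fair: a technique that merely memorizes its training data would otherwise look better than it is.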

Sources

  • Blogs of Prof. John Vu, Carnegie Mellon University