Summary: This article discusses the data explosion and knowledge discovery and, more importantly, answers the question of what data mining is by introducing several data mining definitions.
Nowadays, corporations and organizations are accumulating data at an enormous rate and from a very broad variety of sources, ranging from customer transactions, credit card transactions, and bank cash withdrawals to hourly weather data.
Many relational database servers have been built to store such massive quantities of data. To get the data into those servers, online transaction processing (OLTP) systems have been developed to help businesses run smoothly according to their own business processes. These OLTP systems record every transaction the business carries out, second by second, from sales orders and purchase orders in sales to headcount data in human capital management.
To enable top executives to make faster, fact-based decisions, online analytical processing (OLAP) systems such as data warehouses have been developed rapidly in recent years. A vast amount of data is recorded in the OLTP systems and pushed to the OLAP systems for reporting purposes.
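The difference between the two kinds of workload can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a single in-memory SQLite database stands in for both the OLTP system and the OLAP system (in practice they are usually separate systems), and the table name `sales_orders`, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Assumption: one in-memory SQLite database plays both roles for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style workload: one small write per business transaction.
# The rows below are made-up sample data.
orders = [("North", 120.0), ("South", 80.0), ("North", 200.0), ("South", 50.0)]
conn.executemany(
    "INSERT INTO sales_orders (region, amount) VALUES (?, ?)", orders
)
conn.commit()

# OLAP-style workload: one read that summarizes many transactions
# for reporting purposes.
report = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) "
    "FROM sales_orders GROUP BY region ORDER BY region"
).fetchall()

for region, order_count, total in report:
    print(f"{region}: {order_count} orders, total {total}")
```

The inserts touch one row at a time, while the report query scans and aggregates the whole table, which is the kind of question an executive report would ask.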
As a matter of fact, the data itself is critical to a company’s growth. It contains knowledge that could lead to important business decisions that take the business to the next level. Yet the data is rarely examined in more than a superficial manner. Companies are becoming data-rich but knowledge-poor.
We need information, but what we have is a huge amount of data flooding companies, organizations, and even individuals. Because the amount of data is so enormous that humans cannot process it fast enough to extract the information at the right time, machine learning technology has been developed as a potential solution to this problem.
Knowledge discovery is a process that extracts implicit, previously unknown, and potentially useful information from data. The knowledge discovery process is described as follows:
Let’s examine the knowledge discovery process in the diagram above in detail:
- Data from a variety of sources is integrated into a single data store called the target data.
- The data is then pre-processed and transformed into a standard format.
- The data mining algorithms process the data and output patterns or rules.
- Those patterns and rules are then interpreted to yield new or useful knowledge or information.
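The four steps above can be sketched as a tiny Python pipeline. This is a minimal illustration, not a definitive implementation: the shopping-basket data, the function names (`integrate`, `preprocess`, `mine`, `interpret`), and the 50% support threshold are all assumptions made for the example, and simple pair counting stands in for a real data mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data from two sources, with inconsistent formatting.
web_orders = [["Milk", "bread "], ["bread", "butter"]]
store_orders = [["milk", "Bread", "butter"], ["MILK", "bread"]]


def integrate(*sources):
    """Step 1: merge all sources into a single 'target data' list."""
    return [basket for source in sources for basket in source]


def preprocess(baskets):
    """Step 2: bring every item into a standard format (trimmed, lowercase)."""
    return [sorted({item.strip().lower() for item in basket}) for basket in baskets]


def mine(baskets, min_support=0.5):
    """Step 3: output item pairs found in at least min_support of all baskets."""
    counts = Counter(
        pair for basket in baskets for pair in combinations(basket, 2)
    )
    total = len(baskets)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }


def interpret(patterns):
    """Step 4: turn the raw patterns into readable statements."""
    return [
        f"{a} and {b} are bought together in {support:.0%} of baskets"
        for (a, b), support in sorted(patterns.items())
    ]


target_data = integrate(web_orders, store_orders)
patterns = mine(preprocess(target_data))
for statement in interpret(patterns):
    print(statement)
```

With the sample data above, the pairs bread/butter and bread/milk pass the threshold, while butter/milk appears in only one of the four baskets and is dropped.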
The ultimate goal of knowledge discovery and the data mining process is to find the patterns hidden in huge sets of data and interpret them as useful knowledge and information. As described in the process diagram above, data mining is a central part of the knowledge discovery process. Let’s answer the question “what is data mining?” by examining several data mining definitions.
What is Data Mining – Data Mining Definitions
The data mining definition that appeared in the first papers on commercial data mining is:
The process of extracting previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions – Simoudis 1996.
This definition has a business flavor and is aimed at business environments. However, data mining is a process that can be applied to any type of data, in applications ranging from weather forecasting and electric load prediction to product design.
Data mining can also be defined as the computer-aided process of digging through and analyzing enormous sets of data and then extracting knowledge or information from them. In its simplest definition, data mining automates the detection of relevant patterns in a database.