Summary: This article discusses the data explosion and knowledge discovery and, more importantly, answers the question “what is data mining?” by introducing several data mining definitions.
Nowadays corporations and organizations are accumulating data at an enormous rate and from a very broad variety of sources, ranging from customer transactions, credit card transactions and bank cash withdrawals to hourly weather data. Many relational database servers have been built to store such massive quantities of data. To put the data into those database servers, online transaction processing (OLTP) systems have been developed to help businesses run smoothly according to their own business processes. These OLTP systems store the data for every transaction that happens in the business, second by second, from sales orders and purchase orders in sales to head count data in human capital management. To enable top executives to make faster decisions based on facts, online analytical processing (OLAP) systems such as data warehouses have been developed rapidly in recent years. A vast amount of data is recorded in the OLTP systems and pushed to the OLAP systems for reporting purposes. As a matter of fact, the data itself is critical to a company’s growth. It contains knowledge that could lead to important business decisions that bring the business to the next level. Yet this data is rarely examined in more than a superficial manner. Companies are becoming data rich but knowledge poor.
We need information, but what we have is a huge amount of data flooding companies, organizations and even individuals. Because the amount of data is so enormous that humans cannot process it fast enough to get information out of it at the right time, machine learning techniques have been developed as a potential solution to this problem.
Knowledge discovery is a process that extracts implicit, previously unknown and potentially useful information from data. The knowledge discovery process is described as follows:
Let’s examine the knowledge discovery process in the diagram above in detail:
- Data that comes from a variety of sources is integrated into a single data store called the target data.
- The data is then pre-processed and transformed into a standard format.
- The data mining algorithms process the data and produce output in the form of patterns or rules.
- Those patterns and rules are then interpreted into new or useful knowledge or information.
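The four steps above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not a real data mining system: the two data sources, the function names (integrate, preprocess, mine_pairs, interpret) and the min_support threshold are all hypothetical, and counting item pairs stands in for the much richer algorithms used in practice.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data from two sources; a real system would read from databases.
store_sales = [
    {"order": 1, "items": ["Bread", "butter ", "milk"]},
    {"order": 2, "items": ["bread", "Butter"]},
]
online_sales = [
    {"order": 3, "items": ["milk", "Bread", "butter"]},
    {"order": 4, "items": ["milk", "eggs"]},
]


def integrate(*sources):
    """Step 1: combine records from several sources into one target data set."""
    return [record for source in sources for record in source]


def preprocess(records):
    """Step 2: clean each record into a standard format
    (a sorted tuple of unique, lowercase item names)."""
    return [
        tuple(sorted({item.strip().lower() for item in record["items"]}))
        for record in records
    ]


def mine_pairs(transactions, min_support):
    """Step 3: count item pairs and keep those that occur together
    in at least min_support transactions."""
    counts = Counter()
    for transaction in transactions:
        counts.update(combinations(transaction, 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def interpret(patterns, total):
    """Step 4: turn the raw patterns into statements a person can act on."""
    return [
        f"{a} and {b} are bought together in {n} of {total} orders"
        for (a, b), n in sorted(patterns.items())
    ]


target_data = integrate(store_sales, online_sales)
transactions = preprocess(target_data)
patterns = mine_pairs(transactions, min_support=3)
findings = interpret(patterns, total=len(transactions))

for line in findings:
    print(line)  # bread and butter are bought together in 3 of 4 orders
```

Each function maps to one bullet in the list, so a single step (for example, the mining algorithm) can be swapped out without touching the others.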
The ultimate goal of the knowledge discovery and data mining process is to find the patterns hidden in huge sets of data and interpret them into useful knowledge and information. As described in the process diagram above, data mining is a central part of the knowledge discovery process. Let’s answer the question “what is data mining?” by examining several data mining definitions.
What is Data Mining – Data Mining Definitions
One definition of data mining, which appears in the first papers on commercial data mining, is:
The process of extracting previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions - Simoudis 1996.
This definition has a business flavor and is aimed at business environments. However, data mining is a process that can be applied to any type of data, in areas ranging from weather forecasting and electric load prediction to product design.
Data mining can also be defined as the computer-aided process of digging through and analyzing enormous sets of data and then extracting knowledge or information from them. By its simplest definition, data mining automates the detection of relevant patterns in a database.