What is Data Warehouse

Summary: in this article, we will discuss what is the data warehouse, the history of the data warehouse, and its benefits.

What is a data warehouse? Some popular data warehouse definitions

What is data warehouse

“A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision-making process”

The terms of the data warehousing definition above are explained as below:

  • Subject-oriented: Data in an organization is organized in major objects or business processes. The common example of subject-oriented data is customer, product, vendor, and sale transaction.
  • Integrated: Data warehouse integrates data from various sources across departments within the organization. Different data sources can have different ways to define a specific object, for example, a product. However, in a data warehouse, there must be only one definition of products. This is achieved by using name conflict resolution in the data warehouse. And when we achieve this we say the data is integrated.
  • Non-volatile: Data in the data warehouse is not subject to change. Once placed in a data warehouse, data is not updated.
  • Time-variant: Data in data warehouse associates with time. The time can be a single moment or span of time. In many cases, the data is only relevant to the time.
  • Support management’s decision-making process the outcomes of a data warehouse are helping to make decisions based on historical data or facts. Then from business decisions, business processes can be optimized to increase efficiency and effectiveness.

Ralph Kimball defined data warehouse much simpler in his “The Data Warehouse Toolkit” book. According to Kimball, a data warehouse is “a copy of transaction data specifically structured for query and analysis“. This data warehouse definition provides less depth and insight than Inmon’s but no less accurate.

History of data warehouse

The data warehouse was developed in the late 1980s to meet growing demands for data analysis and information management that could not be achieved by operational systems. Because the operational systems were designed in such as way that optimizes for transactions only and the number of operational or transaction systems were growing quickly across departments inside an organization that makes the data integration more difficult. This created problems of data redundancy, data integration, analysis, and performance in reporting.

As a result, a separate system called a data warehouse is designed to solve those problems. Data warehouse systems can bring data from various source systems such as relational data management systems, flat files, spreadsheets, even remote data sources outside the organization. This data then is organized in such a way that optimized for reporting purposes. User-friendly reporting tools provided by the data warehouse system enable business users and decision-makers to access data in the form of useful information with ease of use.

Data warehouse benefits

There are many benefits that data warehouse brings to organizations:

  • Keep history data for analysis even if the source systems do not maintain historical data.
  • Allow center point of accessing data across enterprises.
  • Improve data quality by cleansing and transforming data when loading it into the data warehouse.
  • Give business users or decision-makers “a single version of the truth”  i.e. information is presented consistently.
  • Provide information instead of data to business users and decision-makers.
  • Provide optimized query performance without impacting the operational systems.
  • Provide information to improve the business processes.

In this article, we’ve examined Kimball and Inmon’s data warehouse definitions so you have an overview of what is the data warehouse. We also discussed the history of data warehouse and benefits it brings to organizations. If you want to know what is data warehouse from a system architecture point of view, check out the data warehouse architectures section.