Data warehouse design

Mathematics: Special Issue 'Big Data: Multidimensional Design and Modeling', deadline 30 September 2021
Mathematics: Special Issue "Big Data: Multidimensional Design and Modeling"

What is a data warehouse?

According to the Wikipedia, a data warehouse:

In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data and are used for creating analytical reports for knowledge workers throughout the enterprise. Examples of reports could range from annual and quarterly comparisons and trends to detailed daily sales analyses.

The data stored in the warehouse is uploaded from the operational systems (such as marketing, sales, etc., shown in the figure to the right). The data may pass through an operational data store for additional operations before it is used in the DW for reporting.

Diagram of a data warehouse
Source: Data Warehouse Overview (Wikipedia)

In the early nineties, Bill Inmon coined the term data warehouse:

A subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decisions.

On the other hand, Ralph Kimball concisely defines a data warehouse as:

A copy of transaction data specifically structured for query and analysis.

Ralph Kimball provides a more precise definition by means of requirements:

  1. The data warehouse provides access to corporate or organizational data.
  2. The data in a data warehouse is consistent.
  3. The data in a data warehouse can be separated and combined by means of every possible measure in a business (the classic slice and dice requirement).
  4. The data warehouse is not just data, but also a set of tools to query, analyze, and present information.
  5. The data warehouse is the place where we publish used data.
  6. The quality of the data in the data warehouse is a driver of business reengineering.

According to the Wikipedia, a data mart:

A data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data mart is a subset of the data warehouse that is usually oriented to a specific business line or team. Data marts are small slices of the data warehouse. Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department. In some deployments, each department or business unit is considered the owner of its data mart including all the hardware, software and data. This enables each department to use, manipulate and develop their data any way they see fit; without altering information inside other data marts or the data warehouse. In other deployments where conformed dimensions are used, this business unit ownership will not hold true for shared dimensions like customer, product, etc.