The star schema is central to dimensional modeling. This is a non-standardized data structure suitable for the creation of complex business queries.

Star schema

Definition

The star schema is a logical, non-normalized schema which is used for consolidating, historizing and aggregating numeric data.

The role of dimensional modeling is to represent facts (which are measurable) and to allow them to be aggregated according to different dimensions. Often, the analogy to a data cube is evoked where each brick inside the cube represents a measurement that is observed along the axes or dimensions represented by the edges of the cube. The figure below shows, for example, that we can measure the sales revenue for a product sold at a given time in a given location.

However, the cube representation cannot show more than three dimensions; this is why a star schema consists of a central fact table and dimensions, which are linked by primary and reference keys (see figure below).

Dimension tables are used to describe business entities, for example products, stores, dates. They contain columns used to describe these entities. Fact tables contain measures used to describe business events, for example the number of products sold or the turnover generated.

To return to the definition given at the start of the paragraph:

  • Star schemas are “logical” schemas, as opposed to both physical schemas focusing on the representation of the implementation of data structures and conceptual schemas which are independent of the type of database used (relational, document-oriented, etc.).
  • These schemas are said to be “non-normalized” in the sense of the normal forms defined by the relational model. In a star schema, redundancies are deliberately added in order to optimize the speed of execution of complex requests. These redundancies are located at the dimension level.

When several star schemas share dimensions (see conformed dimensions), this is described as a fact constellation schema.

Reading Star schema: Star Schema. Christopher Adamson. pages 10-11

Reading Star schema: Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema Laurence Corr, and Jim Stagnitto. pages 7-8

Fact / Measure

A fact or measure corresponds to a value in a row of a fact table.

Reading Fact: Star Schema. Christopher Adamson. pages 6-9, 12

Reading Fact: Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema Laurence Corr, and Jim Stagnitto. pages 8-9

Dimension

A data dimension is a means of analysis that can be used to consider a measure. Dimensions enable navigation within the different levels of granularity.

Let’s return to the issue of redundancy by considering an example featuring a location dimension which characterizes the geographical environment of a store. We are interested in the city in which the store is located, its region, and its country. All stores located in the same city will therefore have the same region and country values. However, in this case, we decided to repeat all of the city, region and country values in each row of the dimension.

Reading Dimension: Star Schema. Christopher Adamson. pages 6-9, 10-11

Reading Dimension: Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema Laurence Corr, and Jim Stagnitto. page 9

Snowflake schema

A snowflake schema bears this name because it resembles a snowflake when each of the dimensions is fully normalized (see below).

A snowflake schema is designed to reduce redundancy by normalizing the data. Returning to the previous example concerning our store, this would consist in choosing to retain information about the city only, and to export the information about the region, etc. into another dimension.

Consequently, a snowflake schema is similar to the normalized logical schemas (generally in 3NF) used in the OLTP world.

Reading Snowflake schema: Star Schema. Christopher Adamson. pages 11, 157-160

Reading Snowflake schema: Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema Laurence Corr, and Jim Stagnitto. pages 5-6, 156

Star schema versus snowflake schema

These schemas reflect a clash of two different worlds. Snowflake schemas are designed to normalize data and thereby reduce redundancy, whereas business-process-oriented star schemas introduce a degree of redundancy that is useful in reducing the number of drills across to be performed.

In decision-support contexts, snowflake schemas should therefore be avoided because:

  • they are hard to understand,
  • they are not business-process oriented,
  • they are inefficient due to the numerous drills across to be performed,
  • the queries become increasingly complex to write.