Dimensions act as data dimensions for measures according to different properties. How should we proceed when elements in this catalog change slowly over time?
Prerequisites

- All the fundamental concepts

Context

Data dimensions enable us to analyze and explore our data. A dimension can be seen as a catalog in which “objects” are listed: lists of customers, products, and observation periods. These data catalogs are not definitive: the data may change. These changes may vary: from the correction of an error, or an occasional change concerning a small proportion of the data, to changes occurring rapidly or concerning larger volumes of data. In this section, we shall be focusing on infrequent changes that are managed by slowly changing dimensions.

Three types of slow changes are possible, corresponding to different historization needs:

  • Type-1 change - The value is overwritten (no historization)
  • Type-2 change - Each modification is backed up
  • Type-3 change - The latest modification is backed up

Let’s use the example of an Order_facts fact table designed to compile information about orders.


Source du schéma : Star Schema. Adamson, Christopher. Osborne/McGraw-Hill, 2010, page 30.

Reading Slowly changing dimension: Star Schema. Christopher Adamson. pages 44-46, 171

Reading Slowly changing dimension: Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema Laurence Corr, and Jim Stagnitto. pages 84-93

Type-1 change: the value is overwritten

Let’s take the customer Johnson Sue (1499). Her date of birth in the system is incorrect. Date of birth is generally of the FV (Fixed Value) type, i.e. it is not supposed to change. In this situation, we need to correct the value without saving the history (see figure below). The former value is therefore overwritten. This same approach can be applied to CV (Current Value)-type attributes whose history does not need to be saved.

Finally, it can be observed that this modification has no impact on the fact table. However, if aggregates have been pre-calculated (e.g. for all customers in a given age category), they will need to be recreated.

Source du schéma : Star Schema. Adamson, Christopher. Osborne/McGraw-Hill, 2010, page 47.

Reading Type-1 change (CV): Star Schema. Christopher Adamson. pages 46-48, 171

Reading Type-1 change (CV): Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema Laurence Corr, and Jim Stagnitto. pages 84-88

Type-2 change: addition of a new row for each modification

Let’s return to the same customer Johnson Sue (ID 1499), who is now moving from Arizona AZ to California CA. After discussion with business experts, it has been deemed important to save the complete history of changes in customers’ addresses when their state of residence changes. In fact, certain queries are specific to the performance of stores located in different geographical zones. This makes it important, for a given period, to identify all customers who used to live in a given US state.

The state attribute is therefore of the HV (Historic Value) type. To manage changes in this attribute, we shall create a new row in the dimension:

  • With the new surrogate key value: the customer Johnson Sue will therefore have 2 associated keys: 1499 and 2507, corresponding to each different address.
  • To establish a link between these 2 rows, we use the business key (BK) customer_id, which retains the value 9900011.

Source du schéma : Star Schema. Adamson, Christopher. Osborne/McGraw-Hill, 2010, page 49.

In this type of change, there is no need to recreate the aggregates in the fact table: nothing changes for the previous data. However, from now on, all orders placed by the customer Johnson Sue will be processed with her new identifier 2507.

However, attention must be paid to this type of change. Each modification leads to the addition of a new row. Caution should therefore be exercised with regard to the size of the dimension and its growth. If the dimension becomes too large, the solution applicable to large changing or rapidly changing dimensions should be adopted (see below).

Finally, this brings us back to the surrogate key concept. We have seen previously that these keys are useful in insulating us from the operational system, but here, we will discover that they are essential to the management of type-2 changes.

Reading Type-2 change (HV): Star Schema. Christopher Adamson. pages 48-51

Reading Type-2 change 2 (HV): Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema Laurence Corr, and Jim Stagnitto. pages 86-92

Type-3 change: addition of a column to back up the latest modification

Let’s consider a company that has divided its business territory into two regions: East and West. This company’s growth and reorganization have required it to subdivide these regions in a more detailed manner (Northeast, Southeast, etc.).

The figure below shows that the allocated zone of certain customers has changed. This type of change will occur rarely in the company’s history and the business experts only want to be able to observe the impact of this organizational breakdown. They therefore need to be able to submit queries according to both the old and new geographical breakdowns. The attribute concerned - region - is therefore of the PV (Previous Value) type, meaning that only the previous value can be retained.

The implementation of a type-3 change consists in creating two region attributes, one for the current value: region_current and the other for the previous value to be historized: region_previous. This type of change should be used sparingly, as it can very quickly increase the number of columns in the dimension.

Source du schéma : Star Schema. Adamson, Christopher. Osborne/McGraw-Hill, 2010, page 183.

At the fact-table level, it becomes necessary to recalculate all the aggregates (according to the current or previous perspective that you want to observe).

Reading Type-3 change (PV): Star Schema. Christopher Adamson. pages 180-186

Reading Type-3 change (PV): Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema Laurence Corr, and Jim Stagnitto. pages 88, 188-189

Hybrid changes

There are certain situations in which the previous solutions need to be combined. We shall not cover them here, but you can find further details in the references below.

Reading Hybrid changes (CV/HV): Star Schema. Christopher Adamson. pages 186-192

Reading Hybrid changes (CV/HV): Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema Laurence Corr, and Jim Stagnitto. pages 88, 186-188