This second activity on ETL processes shows how to implement more complex processes, including the creation of a bridge table.

Dimensional schema

In this second section on ETL processes, we will be focusing on the complete version of the dimensional schema studied in the Dimensional modeling - Part 2 activity, which therefore takes into account the fact that each pesticide is associated with several functions.

The dimensional schema to be produced is presented in the figure below. It describes the fact-table-based structure and dimensions that your ETL processes must create.

Dimensional schema
Dimensional schema of the dimensional database for pesticides in groundwater

Logical datamap

The logical datamap presented above builds on the logical datamap presented in the first part. Here are the main changes:

  • Addition of the Groupe Fonction (Function Group) bridge table.
  • Addition of a key to the Pesticides dimension.
  • Addition of the Fonction (Function) dimension.
Logical datamap
Logical datamap specifying the data transformation (click to expand)

Data

Download the data.

Creating the ETL process to improve pesticide management

We will now update our processes to take account of functions associated with pesticides.

Consult the Getting started section in the first part for information about installation and a presentation on the working environment.

Reminder of the context

Examining an extract from the current pesticide dimension (image below) reveals a problem with the description of a pesticide function. Take the example of Dinitrocrésol; this pesticide is used for its insecticide, acaricide, fungicide and herbicide functions, as shown by its acronym IAFH. This is unsatisfactory, because we need to be able to filter our data for insecticides only, for example.

Extract from the Pesticide dimension
Extract from the Pesticide dimension

Because a pesticide can have several functions, one solution would be to add as many Boolean attributes as the number of functions possible within the dimension. We will not be adopting this solution because it lacks genericity (if we want to add new functions) and complicates the use of our dashboards (with some fifteen Boolean variables rather than a single one indicating the function). The way to manage this “several to several” type of cardinality (a pesticide has several functions and one function concerns several pesticides) is to use a data template known as a “bridge table”. The Groupe Fonction dimensional table is therefore a bridge table in the dimensional schema for our data.

Dimensional schema
Dimensional schema and extracts of data from the dimensional database for pesticides in groundwater (click to expand)

The schema above now shows that:

  • a pesticide has a single group of functions (the pesticide Dinitrocrésol belongs to group 11)
  • a group of functions concerns several functions (group 11 has 4 different functions)

Therefore:

  • certain rows of pesticides are associated with several rows of the Groupe Fonction bridge table
  • each row of the Groupe Fonction bridge table is attached to a single function.

Process to be implemented

We will be implementing the bridge table mechanism on a step-by-step basis. Please note! You must create a new Easymorph project for each of these pages: