The aim of this activity is to familiarize you with EasyMorph, which is used to create ETL (Extraction, Transform, Load) processes. At the end of this activity, you should be able to specify and implement a data extraction, transformation and loading process.

Introduction

We will initially review the resources at your disposal. First, you will discover two formats:

  • The dimensional schema for specifying the target data schema.
  • The logical datamap for specifying the transformations to be made between source data and the target data schema.

In a second stage, you will discover the data and the environment to be used for this activity.

Dimensional schema

A dimensional schema is a logical representation of decision-making data. Regardless of the chosen data storage tool (OLAP data warehouse, in-memory solution, relational database, etc.), dimensional data modeling can therefore be used to structure this data (in terms of measurements and areas of analysis) while also incorporating historization mechanisms.

During this practical session, we will be exporting these tables in the form of csv files. Nevertheless, the ETL methodology presented remains indispensable. Indeed, although the current technologies are powerful enough to import all the source data into your dashboard tool, this means that you would be entrusting the tasks of cleaning, consolidating and structuring the data to the dashboard designer, and due to the development of “self-service” Business Intelligence tools, this designer is often no more than an advanced user who will not necessarily be aware of the issues associated with data consolidation.

The dimensional schema to be produced is presented in the figure below. It describes the fact-table-based structure and the dimensions that your ETL processes must create.

Dimensional schema
Dimensional schema of the dimensional database for pesticides in groundwater

Logical datamap

The logical datamap is a specification document which should present the transformations to be made to the data as precisely as possible, in order to obtain the target data shown in the dimensional schema.

During this practical session, you should refer to this logical datamap in order to monitor the transformations that you will be asked to make. Left-click the image to open the latest version of the document and keep it open in your tabs.

Logical datamap
Logical datamap specifying the data transformation (click to expand)

Data

Download the data.

Getting started

Installing the environment

If you are using the IMT Atlantic [VDI] environment (https://vdi.imt-atlantique.fr/ovirt-engine/):

  • Start the EasyMorph software

If you are using your personal workstation:

  • Download and install the [EasyMorph] software (https://easymorph.com/download.html) (Windows required).
  • Start the EasyMorph software

Presenting the environment

EasyMorph will enable you to specify the ETL processes used to extract data from different sources, transform it and export it into a target format. The image below shows how the interface is structured:

  1. A tool bar which can be used to import new event sources and run the ETL process
  2. A workspace in which each window represents a event source
  3. The currently selected event source
  4. An extract of data from the source according to the processing step
  5. The processing procedure with the sequence of operations to be applied. You can change the content of the display by selecting the appropriate icon (zone 4)
  6. The processing procedure in detail: selecting a step enables you to configure it, and the button at the bottom allows you to add a new step
Overview of the EasyMorph interface
Overview of the Easymorph interface (click to expand)

Creation of ETL processes: dimensions and fact tables

To simplify our introduction to the creation of ETL processes, we will initially limit ourselves to creating the Prélèvements (Samples) fact table and the Station (Plant) and Pesticide dimensions only. We will therefore be implementing the following simplified version of the dimensional schema:

Simplified dimensional schema
Simplified dimensional schema of the dimensional database for pesticides in groundwater

Process to be implemented

Please note! You must create a new Easymorph project for each of these pages: