OLTP and OLAP are two categories of data processing, the former dedicated to operational systems and the latter to decision-support systems. We present them below and situate the Big Data concept in relation to them.

Definitions

Online transaction processing (OLTP)

A category of data-processing systems designed to process, in real time, potentially concurrent queries for the purpose of managing an operational system.

Take the Quiventou supermarket chain as an example: the company’s operational information system is an OLTP system. It manages the company’s day-to-day business: stock status, orders, checkout and purchase management, human resources, and accounting. In OLTP, queries are concurrent because all these processes take place at the same time: new stock arrives, products leave the stock as customers make purchases, and so on. The database is used intensively by many people, especially in write mode. The queries themselves, however, are quite simple: a stock update, the validation of a purchase transaction, etc.
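To make this concrete, here is a minimal sketch in Python of an OLTP-style purchase transaction, using SQLite as a stand-in for the operational database. The schema, table names and function name are hypothetical, chosen purely for illustration; the point is that the stock update and the sale record succeed or fail together as one short, write-heavy transaction.

```python
import sqlite3

# Hypothetical single-store schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (product_id INTEGER PRIMARY KEY, quantity INTEGER)")
conn.execute("CREATE TABLE sales (product_id INTEGER, quantity INTEGER, sold_at TEXT)")
conn.execute("INSERT INTO stock VALUES (1, 10)")

def record_purchase(conn, product_id, quantity):
    """Validate a purchase: one short, simple OLTP transaction."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE stock SET quantity = quantity - ? "
            "WHERE product_id = ? AND quantity >= ?",
            (quantity, product_id, quantity),
        )
        if cur.rowcount == 0:
            raise ValueError("insufficient stock")
        conn.execute(
            "INSERT INTO sales VALUES (?, ?, datetime('now'))",
            (product_id, quantity),
        )

record_purchase(conn, product_id=1, quantity=2)
print(conn.execute("SELECT quantity FROM stock WHERE product_id = 1").fetchone())  # (8,)
```

Many such transactions run concurrently in a real OLTP system; each one touches only a handful of rows.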

Reading (OLTP): Lawrence Corr and Jim Stagnitto, Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema, pages 3-6 and 12-15.

Reading (OLTP): Christopher Adamson, Star Schema, page 4 and Chapter 2.

Online analytical processing (OLAP)

A category of data-processing systems designed to answer multi-dimensional queries over data for the purpose of analysis.

In the example of the Quiventou supermarket chain, the company also uses an OLAP decision-support information system, which produces process-performance indicators to assist with the company’s management: is there too much stock? Has a given product sold well? What is the average age of consumers who buy jars of baby food? In OLAP systems, queries are complex and multi-dimensional. Indeed, the question about stock could well become “Is the chocolate inventory level of supermarkets located in the Paris region too high in January?” or even “Was the inventory level of all product categories too high during 2019 in the city of Paris?”.
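As an illustration, here is a minimal sketch of such a multi-dimensional query, again using SQLite and a hypothetical fact table. In a real warehouse, the dimensions (region, category, month) would typically live in separate dimension tables.

```python
import sqlite3

# Toy fact table with three dimensions; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE inventory_fact (
    region TEXT, category TEXT, month TEXT, stock_level INTEGER)""")
conn.executemany(
    "INSERT INTO inventory_fact VALUES (?, ?, ?, ?)",
    [("Paris", "chocolate", "2019-01", 1200),
     ("Paris", "chocolate", "2019-02", 800),
     ("Lyon",  "chocolate", "2019-01", 300)],
)

# "Is the chocolate inventory of Paris-region supermarkets too high in January?"
# One query slicing the data along three dimensions: region, category, month.
row = conn.execute(
    """SELECT region, category, month, SUM(stock_level)
       FROM inventory_fact
       WHERE region = 'Paris' AND category = 'chocolate' AND month = '2019-01'
       GROUP BY region, category, month"""
).fetchone()
print(row)  # ('Paris', 'chocolate', '2019-01', 1200)
```

Unlike the OLTP example, this query aggregates over many rows at once and is read-only.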

Reading (OLAP): Christopher Adamson, Star Schema, pages 4 and 53-56.

Big data

All technologies capable of managing very large volumes of data (above one hundred TB).

Comparing approaches

Characteristic | OLTP | OLAP | Big Data
---|---|---|---
Objective | Monitoring and running fundamental business tasks | Providing information for strategic decision-making | Operational and/or decision-support
Data | Updated in real time, reflecting current processes | Historized, summarized, integrated and multi-dimensional | Structured or unstructured
Design | Normalized | Denormalized, with dimensional modeling | Non-normalized, with one model per family (key-value, document or graph)
Queries | Simple, returning a small number of rows | Complex, with many aggregates | Complex, with many aggregates, but the result may not be consistent
Updates | Transactions | Batch refreshes | Propagation across several nodes
Size | Small (from the GB to the TB) | Large (from one TB to around one hundred TB) | Massive (over one hundred TB)

The relative positioning of OLAP and Big Data has aroused considerable debate for many years. During the Big Data boom, many commentators predicted the demise of OLAP technologies. In fact, this has not been the case, or not really …

OLAP technologies are based on the principle of knowing users’ needs well in advance, and on structuring the data so that pre-calculated reports can be accessed almost immediately.

Big-data technologies are based on the principle of storing a massive body of data and being able to answer any question, even one that had not previously come to mind. These technologies have several disadvantages:

  • The underlying distributed system does not guarantee data consistency: recent updates may not yet have been taken into account (see the sketch after this list).
  • Not all big-data technologies are equally capable of efficiently computing numeric aggregates, an area where OLAP remains a sound choice.
  • The ambition to store everything for later querying is costly in storage resources, but also in human terms, because it merely postpones the moment when modeling choices have to be made … and with poorly structured data, that step can become complicated.
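The first disadvantage, eventual consistency, can be illustrated with a deliberately simplified sketch: two in-memory “replicas” where a write is acknowledged on one replica and only propagated to the other after a delay, so a read in between returns stale data. This is a toy model, not how any particular system is implemented.

```python
import threading, time

# Minimal sketch of eventual consistency: a write lands on one replica and
# only reaches the other after a propagation delay, so a read in between
# returns stale data. Purely illustrative; real systems are far subtler.
replica_a, replica_b = {"stock": 10}, {"stock": 10}

def write(key, value):
    replica_a[key] = value                 # acknowledged immediately
    def propagate():
        time.sleep(0.5)                    # simulated network delay
        replica_b[key] = value
    threading.Thread(target=propagate).start()

write("stock", 8)
print(replica_b["stock"])   # 10 -> stale read: the update has not arrived yet
time.sleep(1)
print(replica_b["stock"])   # 8  -> the replicas eventually converge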

Finally, this leads us to a hybrid model: dimensional modeling remains an effective way to structure data logically, while the physical layer is handled by big-data technologies that compute numeric aggregates very efficiently and are paired with efficient indexing systems. This model combines the best of both worlds, as the sketch below illustrates.
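Here is a minimal sketch of that hybrid idea, with hypothetical table names: the data is structured dimensionally (a fact table plus a dimension table), the expensive aggregation is pre-computed once, and the summary is indexed so that reports are served by a cheap lookup. SQLite stands in here for whatever big-data engine actually manages the physical layer.

```python
import sqlite3

# Hypothetical star-schema fragment: one fact table, one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim (store_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE sales_fact (store_id INTEGER, month TEXT, amount REAL);
INSERT INTO store_dim VALUES (1, 'Paris'), (2, 'Lyon');
INSERT INTO sales_fact VALUES (1, '2019-01', 100.0), (1, '2019-01', 50.0),
                              (2, '2019-01', 70.0);

-- Pre-compute the aggregate once (the batch side of the architecture) ...
CREATE TABLE sales_by_region_month AS
    SELECT d.region, f.month, SUM(f.amount) AS total
    FROM sales_fact f JOIN store_dim d USING (store_id)
    GROUP BY d.region, f.month;

-- ... and index it so that reports are answered by a cheap lookup.
CREATE INDEX idx_region_month ON sales_by_region_month (region, month);
""")

print(conn.execute(
    "SELECT total FROM sales_by_region_month "
    "WHERE region = 'Paris' AND month = '2019-01'").fetchone())  # (150.0,)
```

The logical design (fact and dimension tables) comes from dimensional modeling; the pre-aggregation and indexing give the near-immediate report access that OLAP users expect.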

In summary