PhD student, LISTIC, USMB



Telephone: +33(0) 765228805

Office: A221-A222

Address 1: LISTIC - Polytech Annecy-Chambery, BP 80439, 74944 Annecy le Vieux Cedex, France


Group: LISTIC - ReGaRD team

Theme: Modelling Data Warehousing in the context of Big Data



Data warehouses are indispensable to information systems because they play a key role in decision making. A typical data warehouse architecture has four main parts: data sources, data preparation, target data storage, and data access and analysis. At the heart of this architecture is the ETL process, which extracts, transforms and loads data into the target database for visualisation, reporting, analysis and decision making. In the era of Big Data, the major challenge for the community is to evolve traditional data warehouse architectures, and in particular the classical ETL process, to support the requirements of Big Data. The state of the art reveals two limitations. The first concerns Big Data approaches, which rely on various dedicated technologies such as the Hadoop ecosystem, Flink, Kafka and Kibana. These technologies evolve so rapidly that data warehouse architectures become obsolete relative to the latest technologies and no longer meet the needs of the market. The second is that there is no standard model for representing and designing ETL processes. Despite the contributions of the literature on ETL process modelling, designing a generic ETL model capable of homogenising the different contemporary approaches remains a challenge. For these reasons, using Model Driven Engineering (MDE) as a generic framework and Model Driven Architecture (MDA) as a specific framework, this thesis seeks to propose a new generic ETL model and a new generic architecture for massive data warehousing that supports this model. This architecture could be instantiated with specific technologies depending on the application domain. In addition, we propose a methodology to help experts derive, from the generic architecture, an architecture that meets the specific needs of their company.
Finally, we validate the research on a practical case, such as the medical field (the COVID-19 pandemic), or on other applications.

Keywords: Data Warehouse, ETL Process Modeling, Data Warehousing Architectures, Knowledge Discovery, Meta-Model, Generic Methodology
Publications:
A Multi-Layer Modeling for the Generation of New Architectures for Big Data Warehousing
A Two Level Architecture for Data Warehousing and OLAP Over Big Data
Data Warehousing Process Modeling from Classical Approaches to New Trends: Main Features and Comparisons


Supervisors: Sébastien Monnet & Mohamed Mohsen Gammoudi

Co-supervisor: Khadija Arfaoui

Start of the thesis: January 2021