Data Logistics
This chapter describes the Data Logistics of the TEDAMOH ExMeX Framework.
The Batch & LPS Orchestration section describes how batches can be created, configured, and deactivated. Batches are the basis for executing and orchestrating Load Process Steps (LPSs). The section also shows how to configure LPSs in batches.
After that, the Batch & LPS Execution section shows how the previously configured batches or LPSs can be executed.
The section Data Pipeline describes and highlights specific data pipelines, using examples for better understanding.
The section DAPI gives an overview of all DAPIs: the DAPI itself, the associated attributes, and the data they provide. The sub-section Common Attributes describes all columns that most DAPIs have in common.
The section Stored Procedures gives an overview of all stored procedures used for data logistics.
(Photo by CHUTTERSNAP on Unsplash)
Batch & LPS Orchestration
For the implementation of a data pipeline within the data warehouse, batches were introduced in the ExMeX Framework Core from the beginning.
Each data pipeline may start with its own Master-Batch or be part of a Child-Batch, and can consist of up to four levels of child batches, also known as the batch hierarchy.
There are various possibilities to influence the behavior of the batches via global variables or batch-specific settings.
By assigning Load Process Steps (LPSs) to each batch, a finer orchestration of the LPSs within the batches is possible. The behavior of the LPSs can be influenced via global variables, batch settings, or even LPS-specific settings, as the sketch below illustrates.
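As a minimal sketch of how such an assignment might be inspected, the following T-SQL query lists all LPSs assigned to one batch. It assumes that LoadProcessStep references LoadBatch via a LoadBatchId column; the actual assignment mechanism and column names in the framework may differ.

```sql
-- Minimal sketch: list all LPSs assigned to a given batch.
-- Assumption: LoadProcessStep references LoadBatch via a LoadBatchId
-- column; the real assignment mechanism in ExMeX may differ.
SELECT lb.LoadBatchId,
       lps.ProcessStepId,
       lps.ProcessStepLabel
FROM   LoadBatch AS lb
JOIN   LoadProcessStep AS lps
  ON   lps.LoadBatchId = lb.LoadBatchId
WHERE  lb.LoadBatchId = 4            -- the Master- or Child-Batch to inspect
ORDER BY lps.ProcessStepId;
```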
(Photo by Reproductive Health Supplies Coalition on Unsplash)
A pile of old, used pallets for transporting heavy goods in trucks, trains, etc. Just as goods travel on pallets in a truck or train, data is transported (LPSs as pallets) in one or more batches (truck or train) via hyperloop in a data pipeline.
Data Logistic Statistics
After the TEDAMOH ExMeX Framework Core has loaded data into the data warehouse for many days, weeks, months, and perhaps even years, questions arise for the data modeler or the operator: What is the state of individual streams or Load Process Steps (LPSs) over time? What does the system do on the first four days of each month? How can I set up continuous monitoring of my current loads?
With the help of the Data Logistic Data Mart of the ExMeX Logging Framework, a uniform and stable access layer exists in the form of a star schema, providing access to all this information and answers to the most diverse questions.
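As an illustration only, a question like "what does the system do on the first four days of each month?" could be answered with a star-schema query of the following shape. All table and column names here (FactLoadProcessStepExecution, DimLoadProcessStep, DimDate, DurationInSeconds, RowsLoaded) are hypothetical placeholders, not the Data Mart's actual object names, which the following sections describe.

```sql
-- Illustrative star-schema query; all object names are hypothetical
-- placeholders, not the actual Data Logistic Data Mart names.
SELECT d.CalendarDate,
       p.ProcessStepLabel,
       AVG(f.DurationInSeconds) AS AvgDurationSec,
       SUM(f.RowsLoaded)        AS TotalRowsLoaded
FROM   FactLoadProcessStepExecution AS f
JOIN   DimLoadProcessStep AS p ON p.LoadProcessStepKey = f.LoadProcessStepKey
JOIN   DimDate            AS d ON d.DateKey            = f.DateKey
WHERE  DAY(d.CalendarDate) <= 4      -- first four days of each month
GROUP BY d.CalendarDate, p.ProcessStepLabel
ORDER BY d.CalendarDate, p.ProcessStepLabel;
```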
In the following sections the dimensions and facts of the Data Logistic Data Mart of the ExMeX Logging Framework are described in detail.
(Photo by Luke Chesser on Unsplash)
DAPI
Metadata For Execute Data Logistic Load Process Step
Purpose
The DAPI Metadata For Execute Data Logistic Load Process Step provides data logistic information to orchestrate any single Load Process Step (LPS).
Motivation
This DAPI provides each LPS with the information necessary for its execution, in particular the time period for which the LPS extracts the data from the source.
Applicability
The DAPI plays a role when an LPS loads data from the source interfaces into the Stage Layer.
Content
All LPSs which have generated jobs.
| Attribute | Description |
| --- | --- |
| ProcessStepId | Unique Id to identify a defined LoadProcessStep in table LoadProcessStep. |
| ProcessStepLabel | Label to identify a defined LoadProcessStep. Defined during the data model design process. |
| ProcessStepType | Short name of the different load types (e.g. DEF in the sample below). |
| LoadBatchId | Unique Id to identify a defined Batch in table LoadBatch. |
| AuditTrailId | Unique Id for each single LoadProcessStep. This is the atomic building block. |
| AssertionTimeFrom | Start timestamp of the currently running LPS. |
| HighEndTimestamp | High End Timestamp as defined in table TEDAMOHConfigExMeXFramework. |
| DeltaExportTimeStampFrom | Begin timestamp (inclusive) of the extraction time period. |
| DeltaExportTimeStampBefore | End timestamp (exclusive) of the extraction time period. |
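A minimal sketch of how an orchestrator might read these attributes for a single LPS is shown below. The view name DapiMetadataForExecuteDataLogisticLoadProcessStep is a hypothetical placeholder for this DAPI; the column names are the attributes listed above.

```sql
-- Minimal sketch: fetch the execution metadata for one LPS from the DAPI.
-- The view name is a hypothetical placeholder; the columns are the
-- attributes documented above.
SELECT ProcessStepId,
       ProcessStepLabel,
       LoadBatchId,
       AuditTrailId,
       DeltaExportTimeStampFrom,
       DeltaExportTimeStampBefore
FROM   DapiMetadataForExecuteDataLogisticLoadProcessStep
WHERE  ProcessStepId = 9;            -- the LPS to be executed
```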
Sample
The following table shows two example LPSs that load data from a source interface into the Stage Layer target tables:
- ArtworkInMuseum
- Supplier
To extract data from the source interface, the two LPSs use the time period (closed-open notation) stated in the columns
- DeltaExportTimeStampFrom
- DeltaExportTimeStampBefore
LPS 9 will extract and load data with the time period [2019-09-24 13:54:29.6989653 - 2019-09-24 15:49:57.8288227[
LPS 10 will extract and load data with the time period [2019-09-24 13:54:29.6989653 - 2019-09-24 15:49:57.8288227[
| ProcessStepId | ProcessStepLabel | ProcessStepType | LoadBatchId | AuditTrailId | AssertionTimeFrom | [...] | DeltaExportTimeStampFrom | DeltaExportTimeStampBefore |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9 | Source2StageArtworkInMuseum | DEF | 4 | 16107 | 2019-09-24 15:49:57.8734702 | [...] | 2019-09-24 13:54:29.6989653 | 2019-09-24 15:49:57.8288227 |
| 10 | Source2StageSupplier | DEF | 4 | 16108 | 2019-09-24 15:49:58.1498235 | [...] | 2019-09-24 13:54:29.6989653 | 2019-09-24 15:49:57.8288227 |
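To make the closed-open semantics concrete, the following sketch shows how LPS 9 could apply this time period when extracting from its source interface. The source table and its change timestamp column (SourceArtworkInMuseum, ChangeTimestamp) are hypothetical; the two period values come from the sample above.

```sql
-- Sketch: applying the closed-open time period of LPS 9 during extraction.
-- SourceArtworkInMuseum and ChangeTimestamp are hypothetical names; in
-- practice the two parameters are supplied by the DAPI.
DECLARE @DeltaExportTimeStampFrom   datetime2(7) = '2019-09-24 13:54:29.6989653';
DECLARE @DeltaExportTimeStampBefore datetime2(7) = '2019-09-24 15:49:57.8288227';

SELECT *
FROM   SourceArtworkInMuseum
WHERE  ChangeTimestamp >= @DeltaExportTimeStampFrom     -- closed lower bound
  AND  ChangeTimestamp <  @DeltaExportTimeStampBefore;  -- open upper bound
```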