Data Logistic

This chapter describes the data logistics of the TEDAMOH ExMeX Framework.

The Batch & LPS Orchestration section describes how batches can be created, configured and deactivated. Batches are the basis for executing and orchestrating Load Process Steps (LPS). The section also shows how to configure LPS in batches.

After that, the Batch & LPS Execution section shows how the previously configured batches or LPSs can be executed.

The section Data Pipeline highlights specific data pipelines, using examples for better understanding.

The section DAPI gives an overview of all DAPIs: the DAPI itself, the associated attributes, and the data they provide. The sub-section Common Attributes describes all columns that most DAPIs have in common.

The section Stored Procedures gives an overview of all stored procedures used for data logistics.

(Photo by CHUTTERSNAP on Unsplash)

Batch & LPS Orchestration

For the implementation of a data pipeline within the data warehouse, batches were introduced in the ExMeX Framework Core from the beginning.

Each data pipeline may start with its own Master-Batch or be part of a Child-Batch, and can consist of up to four levels of child batches, aka the batch hierarchy.
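The batch hierarchy can be pictured as a small tree with a bounded depth. The following sketch is purely illustrative, assuming a Master-Batch at level 0 and at most four levels of child batches below it; the class and field names are not the framework's actual API.

```python
# Illustrative model of the ExMeX batch hierarchy (names are hypothetical):
# a Master-Batch at level 0, with up to four levels of child batches.

class Batch:
    MAX_CHILD_LEVELS = 4  # levels of child batches below the Master-Batch

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []
        if parent is not None:
            if parent.level() >= Batch.MAX_CHILD_LEVELS:
                raise ValueError("batch hierarchy exceeds four child levels")
            parent.children.append(self)

    def level(self):
        """0 for a Master-Batch, 1..4 for child batches."""
        return 0 if self.parent is None else self.parent.level() + 1

master = Batch("DailyLoad")             # Master-Batch (level 0)
child = Batch("Source2Stage", master)   # Child-Batch (level 1)
```

Attaching a batch below the fourth child level raises an error in this sketch, mirroring the four-level limit described above.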

There are various possibilities to influence the behavior of the batches via global variables or batch-specific settings.

By assigning Load Process Steps (LPSs) to each batch, a finer orchestration of the LPSs with the batches is already possible. There are various possibilities to influence the behavior of the LPSs via global variables, batch settings or even LPS-specific settings.
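The layered configuration described above (global variables, batch settings, LPS-specific settings) can be read as a precedence chain: the most specific scope wins. A minimal sketch, assuming dictionary-like scopes and illustrative setting names that are not taken from the framework:

```python
# Hypothetical sketch of setting resolution: LPS-specific settings override
# batch settings, which override global variables.

def resolve_setting(name, global_vars, batch_settings, lps_settings):
    """Return the most specific value defined for `name`."""
    for scope in (lps_settings, batch_settings, global_vars):
        if name in scope:
            return scope[name]
    raise KeyError(f"setting '{name}' is not defined in any scope")

global_vars = {"MaxRetries": 3, "Timeout": 600}
batch_settings = {"Timeout": 300}
lps_settings = {"MaxRetries": 1}

resolve_setting("MaxRetries", global_vars, batch_settings, lps_settings)  # 1
resolve_setting("Timeout", global_vars, batch_settings, lps_settings)     # 300
```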

(Photo by Reproductive Health Supplies Coalition on Unsplash)

A pile of old, used pallets for transporting heavy goods by truck, train, etc. In the same way, data (LPS as pallets) is transported in one or more batches (truck or train) via hyperloop through a data pipeline.

Data Logistic Statistics

After the TEDAMOH ExMeX Framework Core has loaded data into the data warehouse for days, weeks, months, or perhaps even years, questions arise for the data modeler or the operator: What is the state of individual streams or Load Process Steps (LPS) over time? What does the system do on the first four days of each month? How can I continuously monitor my current load?

With the help of the Data Logistic Data Mart of the ExMeX Logging Framework, a uniform and stable access layer exists in the form of a star schema to access all this information and provide answers to the most diverse questions.

In the following sections the dimensions and facts of the Data Logistic Data Mart of the ExMeX Logging Framework are described in detail.

(Photo by Luke Chesser on Unsplash)

DAPI

Metadata For Execute Data Logistic Load Process Step

Purpose

The DAPI Metadata For Execute Data Logistic Load Process Step provides data logistic information to orchestrate any single Load Process Step (LPS).

Motivation

This DAPI provides each LPS with the information necessary for execution, namely the time period for which the LPS extracts data from the source.

Applicability

The DAPI plays a role when LPSs load data from the source interfaces into the stage layer.

Content

All LPSs that have generated jobs.

Attribute Description
ProcessStepId Unique Id to identify a defined LoadProcessStep in table LoadProcessStep.
ProcessStepLabel Label to identify a defined LoadProcessStep. Defined during data model design process.
ProcessStepType Short name of different load types:
  • DEF: Default - no specific process type
  • ILO: Initial Load - used for special initial load processes
  • FUL: Full load processes - incremental interface
  • INL: Incremental load processes - full load interface
  • WIL: Window load processes - selection of a window of records to process
LoadBatchId Unique Id to identify a defined Batch in table LoadBatch.
AuditTrailId Unique Id for each single LoadProcessStep. This is the atomic building block.
AssertionTimeFrom Current running LPS start timestamp.
HighEndTimestamp High End Timestamp as defined in table TEDAMOHConfigExMeXFramework.
DeltaExportTimeStampFrom

Time period begin timestamp. It is determined as follows:

  1. If an override timestamp exists, use it; else
  2. If the LPS has never been executed, use the Low End Timestamp; else
  3. Use the most recent successful batch start timestamp of the associated batch; else
  4. Use the most recent successful LPS start timestamp; else
  5. Use the Low End Timestamp.
DeltaExportTimeStampBefore

Time period before timestamp. It is determined as follows:

  1. Use the currently running batch start timestamp of the associated batch; else
  2. Use the currently running LPS start timestamp; else
  3. Use the High End Timestamp.
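The two fallback chains above can be sketched as plain functions. All names and the Low/High End constants below are illustrative; in the framework these values come from metadata tables such as TEDAMOHConfigExMeXFramework.

```python
# Minimal sketch of the fallback rules for DeltaExportTimeStampFrom and
# DeltaExportTimeStampBefore. Names and boundary values are illustrative.

from datetime import datetime

LOW_END = datetime(1900, 1, 1)        # assumed Low End Timestamp
HIGH_END = datetime(9999, 12, 31)     # assumed High End Timestamp

def delta_export_from(override_ts, ever_executed,
                      last_ok_batch_start, last_ok_lps_start):
    if override_ts is not None:          # 1. override timestamp exists
        return override_ts
    if not ever_executed:                # 2. LPS never executed
        return LOW_END
    if last_ok_batch_start is not None:  # 3. recent successful batch start
        return last_ok_batch_start
    if last_ok_lps_start is not None:    # 4. recent successful LPS start
        return last_ok_lps_start
    return LOW_END                       # 5. Low End Timestamp

def delta_export_before(current_batch_start, current_lps_start):
    if current_batch_start is not None:  # 1. current running batch start
        return current_batch_start
    if current_lps_start is not None:    # 2. current running LPS start
        return current_lps_start
    return HIGH_END                      # 3. High End Timestamp
```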
Sample

The following table shows two example LPSs that load data from a source interface into the Stage Layer target tables:

  • ArtworkInMuseum
  • Supplier

To extract data from the source interface, the two LPSs use the time period (closed-open notation) stated in the columns

  • DeltaExportTimeStampFrom
  • DeltaExportTimeStampBefore

LPS 9 will extract and load data with the time period [2019-09-24 13:54:29.6989653  -  2019-09-24 15:49:57.8288227[

LPS 10 will extract and load data with the time period [2019-09-24 13:54:29.6989653  -  2019-09-24 15:49:57.8288227[

ProcessStepId  ProcessStepLabel ProcessStepType LoadBatchId AuditTrailId AssertionTimeFrom [...] DeltaExportTimeStampFrom DeltaExportTimeStampBefore
9 Source2StageArtworkInMuseum DEF 4 16107 2019-09-24 15:49:57.8734702 [...] 2019-09-24 13:54:29.6989653 2019-09-24 15:49:57.8288227
10 Source2StageSupplier DEF 4 16108 2019-09-24 15:49:58.1498235 [...] 2019-09-24 13:54:29.6989653 2019-09-24 15:49:57.8288227
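The closed-open window above means the lower bound is included and the upper bound is excluded, so consecutive windows never overlap and never miss a record. A small sketch with the sample values (truncated to microsecond precision; the record-selection predicate is illustrative, not the framework's implementation):

```python
# Closed-open extraction window [from, before): a record is extracted
# when from <= change timestamp < before. Values follow the sample rows.

from datetime import datetime

ts_from = datetime(2019, 9, 24, 13, 54, 29, 698965)
ts_before = datetime(2019, 9, 24, 15, 49, 57, 828822)

def in_window(change_ts, ts_from, ts_before):
    return ts_from <= change_ts < ts_before

in_window(ts_from, ts_from, ts_before)    # True: lower bound included
in_window(ts_before, ts_from, ts_before)  # False: upper bound excluded
```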