One To One Data Copy Pipeline

To implement a One To One Data Copy Pipeline within the data warehouse, the ExMeX Framework Core introduced new table stereotypes for data modeling with release 2.12.0.

To use this function within PowerDesigner, at least version 2.2.0 of the plugin PowerDesigner Extension - TEDAMOH ExMeX Data Vault is required!

To populate One To One Data Copy tables, the ExMeX Framework Core introduced a new Load Process Step (LPS) pattern (LoadPatternId = 19).

Please always remember this:
  • After data modeling in PowerDesigner, regenerate the metadata.
  • After generating the metadata and adapting the scripts, install the metadata.

One To One Data Copy Pipeline (General)

Which stereotypes and columns are available for One To One Data Copy?


Table type: One To One Data Copy - Stores all loaded data without any changes or filter.

  • Table stereotype: O - One To One Data Copy

    This table stores (inserts) all data that has been delivered over time. There is no versioning of the data.

  • Mandatory column: Audit Trail Id (Name and Code as configured)

    This column stores an Id which ties the loaded data to the ExMeX Logging Framework.

Optional Columns

  • Optional column: Entity Id (Name and Code as configured)

    This column stores an Id which ties the target table to the ExMeX Logging Framework.

  • Optional column: Load Timestamp (LDTS) (Name and Code as configured)

    This column stores the AssertionTimeFrom timestamp inherited from the executing batch (the timestamp at which the batch was started). See also the database object MetadataZone.MetadataForExecuteDataLogisticBatch.

  • Optional column: Load End Timestamp (LEDTS) (Name and Code as configured)

    This column stores the AssertionTimeBefore timestamp inherited from the executing batch; it is used to end-date previous rows, while newly inserted rows receive the High End Timestamp.
    Applicable only in Temporal- or CDC-Mode, and only if the column Load Timestamp (LDTS) is available.

  • Optional column: Hash-Diff (Code: HashDiff)

    This column stores a hash generated from all non-CDC-Compare-Key columns; it speeds up the identification of changes (see the sketch after this list).
    Applicable only in Temporal- or Delta-Mode.

  • Optional column: Status CDC (Code: StatusCDC)

    This column stores a flag generated by CDC: I, U, D.
    To ensure data quality it is recommended to use a domain for the Status CDC column.
    Applicable only in Delta-Mode (no column stereotype on column Audit Trail Id (Name and Code as configured)); only the flag values I and U are available.
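
As a rough illustration only - the generated Load Process Step builds the real Hash-Diff logic itself - a Hash-Diff over the non-key context columns could be computed in T-SQL along the following lines. The table and column names (Stage.CustomerVolatileZone, CustomerNo, Firstname, Lastname) are hypothetical, and CONCAT_WS assumes SQL Server 2017 or later:

-- Illustrative sketch only: the framework generates the actual Hash-Diff logic.
-- CustomerNo acts as the (hypothetical) CDC - Compare Key and is excluded from the hash.
SELECT  src.CustomerNo
       ,CONVERT(CHAR(64),
                HASHBYTES('SHA2_256',
                          CONCAT_WS('|', src.Firstname, src.Lastname)),
                2) AS HashDiff
FROM Stage.CustomerVolatileZone AS src;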

"Optional" column stereotypes drives the mode

The "optional" stereotypes control the mode of how the target table is loaded. Therefore one of the column stereotypes must be used.

  • Column stereotype: Stack (Optional) -> Stack-Mode

    This stereotype identifies whether an incoming full or delta load should be stacked into the target table.
    Use cases may be a Persistent Staging Area (PSA) or a History Zone (HiZ).
    Applicable only on column Audit Trail Id (Name and Code as configured).

  • Column stereotype: Truncate (Optional) -> Truncate-Mode

    This stereotype identifies whether an incoming full (or possibly delta) load should be loaded into a previously truncated target table.
    Use cases may be a Sandbox or any Full Load with previous truncate (Truncate/Insert data - equivalent to CDC I,U,D).
    Applicable only on column Audit Trail Id (Name and Code as configured).

  • Column stereotype: CDC - Compare Key (Optional) -> Delta-Mode

    This stereotype identifies one or more columns that drive the comparison used to identify and capture changes.
    Use cases may be a Sandbox or any Full/Delta Load (Update/Insert data - equivalent to CDC I,U).
    Applicable on any (target) column except Audit Trail Id (Name and Code as configured).

  • Column stereotype: Temporal (Optional) -> Temporal-Mode

    This stereotype identifies whether incoming data (full or delta) should be loaded historized into a target table.
    Use cases may be a Persistent Staging Area (PSA) or a History Zone (HiZ).
    Applicable only on column Load Timestamp (LDTS) (Name and Code as configured).

  • Column stereotype: CDC - Complete (Optional) -> CDC-Mode

    This stereotype identifies whether incoming data (full or delta) should be loaded historized into a target table by using CDC information. This CDC information must be provided by a mandatory upstream CDC pipeline or by external CDC data (e.g. from a source system).
    Use cases may be a Persistent Staging Area (PSA) or a History Zone (HiZ).
    Applicable only on column Status CDC (Code: StatusCDC).

Any background information?


DAPI

The DAPI MetadataZoneDapi.MetadataForGenerateOneToOneDataCopy is the basis for generating the LPS with the LoadPatternId 19.
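
To review the metadata the generator works from, the DAPI can be queried directly, for example:

-- Inspect the metadata used to generate the One To One Data Copy Load Process Steps
SELECT *
FROM MetadataZoneDapi.MetadataForGenerateOneToOneDataCopy;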

Generator

The script for generating the LPS with LoadPatternId 19 is One To One Data Copy.sql.


One To One Data Copy Pipeline (All Layers)

How to design a stage table for a One To One Data Copy pipeline?


Table of type SL - Volatile Zone

As always, a table of type Volatile Zone is required to load the data 1:1 from the source into the Stage Layer. For the One To One Data Copy Pipeline there are no special requirements to consider. The following picture shows an example of a Volatile Zone table.

[Image: One To One Data Copy Stage Source Table]


How to design a one to one data copy table (non-historized)?


Table of type O - One To One Data Copy

To populate (i.e. copy) all data stored in the previously shown Volatile Zone table into another, non-historized table, create a new table with the stereotype O - One To One Data Copy.

With this table stereotype it is irrelevant in which layer (stage, core, access or other) of the data warehouse the table is located. Possible use cases for this table type could be:

  • HiZ - Loading (Variation)
  • PSA - loading (base for virtual Data Vault)
  • ODS - Loading
  • 1:1 Copy for any target of choice, e.g. a Sandbox

This table can contain as many columns as the Volatile Zone table, or fewer, depending on the design decisions of the data modeler and the requirements for the One To One Data Copy table.

There are basically three different modes of loading data non-historized into a target table:

Stack-Mode

The data is "stacked" and can be distinguished by at least the Audit Trail Id column (Name and Code as configured), with additional optional columns if necessary.

Use cases may be a Persistent Staging Area (PSA) or a History Zone (HiZ).

One additional (metadata) column is mandatory:

  • Audit Trail Id (Name and Code as configured)

Some additional (metadata) columns are optional and can be used as needed:

  • Entity Id (Name and Code as configured)
  • Load Timestamp (LDTS - Name and Code as configured)

The following image shows an example of Stack-Mode and some of the possible variations with optional columns.

[Image: One To One Data Copy Table - Stack-Mode and variations]
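
Conceptually, the Load Process Step generated for Stack-Mode is a plain insert. The following sketch only illustrates the idea; the table and column names are hypothetical, @AuditTrailId stands in for the value supplied by the ExMeX Logging Framework, and the framework's logging and optional columns are omitted:

-- Conceptual Stack-Mode sketch (hypothetical tables/columns, logging omitted):
-- every load appends the incoming rows, distinguishable by the Audit Trail Id.
INSERT INTO Psa.Customer (AuditTrailId, CustomerNo, Firstname, Lastname)
SELECT  @AuditTrailId          -- provided by the ExMeX Logging Framework
       ,src.CustomerNo
       ,src.Firstname
       ,src.Lastname
FROM Stage.CustomerVolatileZone AS src;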

Truncate-Mode

The "old" data in the target table is deleted before new data is loaded.

Use cases may be a Sandbox or any Full Load with previous truncate (Truncate/Insert data - equivalent to CDC I,U,D).

With this mode the deleted rows are documented as physically deleted in the ExMeX Logging Framework. In the Data Logistic Data Mart the corresponding figures are recorded in the two columns RowsDeletedPhysical and RowsDeletedPhysicalDateTime of the database object MetadataZoneMart.FctProcessStepVapourTrail.

One additional (metadata) column is mandatory:

  • Audit Trail Id (Name and Code as configured)

Some additional (metadata) columns are optional and can be used as needed:

  • Entity Id (Name and Code as configured)
  • Load Timestamp (LDTS - Name and Code as configured)

The following image shows an example of Truncate-Mode and some of the possible variations with optional columns.

[Image: One To One Data Copy Table - Truncate-Mode and variations]
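
Conceptually, Truncate-Mode empties the target before inserting the incoming rows. The sketch below uses hypothetical names and omits the framework's logging of the physically deleted rows:

-- Conceptual Truncate-Mode sketch (hypothetical tables/columns, logging omitted):
-- the target is emptied first, then the incoming rows are inserted.
TRUNCATE TABLE Sandbox.Customer;

INSERT INTO Sandbox.Customer (AuditTrailId, CustomerNo, Firstname, Lastname)
SELECT  @AuditTrailId          -- provided by the ExMeX Logging Framework
       ,src.CustomerNo
       ,src.Firstname
       ,src.Lastname
FROM Stage.CustomerVolatileZone AS src;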

Delta-Mode

New data is inserted into the target table and existing data is updated.

For comparison, all columns with the column stereotype CDC - Compare Key are used as key.

All context columns are checked for changes. If the column Hash-Diff is present, it is used to check for changes instead of the context columns.

Metadata columns are not part of the comparison and are ignored.

Use cases may be a Sandbox or any Full/Delta Load (Update/Insert data - equivalent to CDC I,U).

One additional (metadata) column is mandatory:

  • Audit Trail Id (Name and Code as configured)

Some additional (metadata) columns are optional and can be used as needed:

  • Entity Id (Name and Code as configured)
  • Load Timestamp (LDTS - Name and Code as configured)
  • Hash-Diff (Code: HashDiff)
  • Status CDC (Code: StatusCDC) - For documentation only. Does not imply CDC functionality.

The following image shows an example of Delta-Mode and some of the possible variations with optional columns.

[Image: One To One Data Copy Table - Delta-Mode and variations]
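
Conceptually, Delta-Mode behaves like an upsert keyed on the CDC - Compare Key columns. The following MERGE sketch uses hypothetical names (Core.Customer, Stage.CustomerVolatileZone, CustomerNo as the compare key), computes the Hash-Diff inline, and omits the framework's logging:

-- Conceptual Delta-Mode sketch (hypothetical tables/columns, logging omitted):
-- CustomerNo plays the role of the CDC - Compare Key, HashDiff detects changes.
MERGE INTO Core.Customer AS tgt
    USING (SELECT  CustomerNo
                  ,Firstname
                  ,Lastname
                  ,CONVERT(CHAR(64), HASHBYTES('SHA2_256',
                       CONCAT_WS('|', Firstname, Lastname)), 2) AS HashDiff
           FROM Stage.CustomerVolatileZone) AS src
    ON (tgt.CustomerNo = src.CustomerNo)
WHEN MATCHED AND tgt.HashDiff <> src.HashDiff THEN
    UPDATE SET
         tgt.Firstname    = src.Firstname
        ,tgt.Lastname     = src.Lastname
        ,tgt.HashDiff     = src.HashDiff
        ,tgt.AuditTrailId = @AuditTrailId
WHEN NOT MATCHED THEN
    INSERT (AuditTrailId, CustomerNo, Firstname, Lastname, HashDiff)
    VALUES (@AuditTrailId, src.CustomerNo, src.Firstname, src.Lastname, src.HashDiff)
;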


How to design a one to one data copy table (historized)?


Valid as of release 2.13.0

Table of type O - One To One Data Copy

To populate (i.e. copy) all data stored in the previously shown Volatile Zone table into another, historized table, create a new table with the stereotype O - One To One Data Copy.

With this table stereotype it is irrelevant in which layer (stage, core, access or other) of the data warehouse the table is located. Possible use cases for this table type could be:

  • HiZ - Loading (Variation)
  • PSA - loading (base for virtual Data Vault)
  • ODS - Loading
  • 1:1 Copy for any target of choice, e.g. a Sandbox

This table can contain as many columns as the Volatile Zone table, or fewer, depending on the design decisions of the data modeler and the requirements for the One To One Data Copy table.

There are basically two different modes of loading data historized into a target table:

Temporal-Mode

New and changed data is inserted into the target table and existing data is versioned when changed.

All columns with the column stereotype CDC - Compare Key are used as key for comparison, like in Delta-Mode.

The column stereotype Temporal at the column Load Timestamp (LDTS - Name and Code as configured) activates the historization of the data and switches the load process into the Temporal-Mode. If this column stereotype is not set, the load process remains in Delta-Mode.

All context columns are checked for changes. If the column Hash-Diff is present, it is used to check for changes instead of the context columns.

Metadata columns are not part of the comparison and are ignored.

Use cases can be a sandbox or any historized full/delta load (update/insert data - equivalent to CDC I,U).

One additional (metadata) column is mandatory:

  • Load Timestamp (LDTS - Name and Code as configured)

Some additional (metadata) columns are optional and can be used as needed:

  • Audit Trail Id (Name and Code as configured)
  • Entity Id (Name and Code as configured)
  • Load End Timestamp (LEDTS - Name and Code as configured)
  • Hash-Diff (Code: HashDiff)

The following image shows an example of Temporal-Mode and some of the possible variations with optional columns.

[Image: One To One Data Copy Table - Temporal-Mode and variations]
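
Conceptually, Temporal-Mode end-dates the current version of a changed row and inserts a new version stamped with the batch's Load Timestamp. The sketch below uses hypothetical names and batch variables (@AssertionTimeFrom, @AssertionTimeBefore, @HighEndTimestamp, @AuditTrailId), assumes the Volatile Zone table already exposes a HashDiff column (see the Hash-Diff sketch above), and omits the framework's logging:

-- Conceptual Temporal-Mode sketch (hypothetical tables/columns, logging omitted).
-- End-date the current version of rows whose context (via HashDiff) has changed.
UPDATE tgt
SET    tgt.LoadEndTimestamp = @AssertionTimeBefore
FROM   Core.CustomerHistory AS tgt
INNER JOIN Stage.CustomerVolatileZone AS src
    ON src.CustomerNo = tgt.CustomerNo
WHERE  tgt.LoadEndTimestamp = @HighEndTimestamp   -- current version
  AND  tgt.HashDiff        <> src.HashDiff;

-- Insert a new version for new rows and for rows that were just end-dated.
INSERT INTO Core.CustomerHistory
       (AuditTrailId, LoadTimestamp, LoadEndTimestamp, CustomerNo, Firstname, Lastname, HashDiff)
SELECT  @AuditTrailId, @AssertionTimeFrom, @HighEndTimestamp
       ,src.CustomerNo, src.Firstname, src.Lastname, src.HashDiff
FROM   Stage.CustomerVolatileZone AS src
WHERE  NOT EXISTS (SELECT 1
                   FROM   Core.CustomerHistory AS cur
                   WHERE  cur.CustomerNo       = src.CustomerNo
                     AND  cur.LoadEndTimestamp = @HighEndTimestamp);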

CDC-Mode

In CDC-Mode, the identification of changes and the generation of the CDC flag (StatusCDC with 'I','U','D') must either take place in an upstream CDC data pipeline or be provided by the source system.

New and changed data is inserted into the target table based on the CDC flag (StatusCDC column) and existing data is versioned (virtually or with end-dating) when changed.
Deleted data is also versioned (virtually or with end-dating) and an additional deletion record is inserted for documentation and audit purposes.
The CDC-Mode is the only mode that can handle and display deleted records.

All columns of the Primary Key are used for versioning the data.

The column stereotype CDC - Complete at the column StatusCDC activates the historization of the data and switches the loading process into CDC-Mode.

The StatusCDC column must be connected in the mapping.

Use cases can be a sandbox or any historized full/delta load (update/insert data - equivalent to CDC I,U).

A primary key is mandatory.

Two additional (metadata) columns are mandatory:
  • Audit Trail Id (Name and Code as configured)
  • Status CDC (Code: StatusCDC)

Some additional (metadata) columns are optional and can be used as needed:

  • Entity Id (Name and Code as configured)
  • Load Timestamp (LDTS - Name and Code as configured)
  • Load End Timestamp (LEDTS - Name and Code as configured)

The following image shows an example of CDC-Mode with an upstream CDC data pipeline and some of the possible variants with optional columns.

[Image: One To One Data Copy Table - CDC-Mode and variations]
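
Conceptually, CDC-Mode relies on the delivered StatusCDC flag instead of its own comparison: updates and deletes end-date the current version by primary key, and every incoming 'I', 'U' or 'D' row produces a new record. The sketch below uses hypothetical names and batch variables and omits the framework's logging:

-- Conceptual CDC-Mode sketch (hypothetical tables/columns, logging omitted).
-- End-date the current version for updated and deleted rows (primary key: CustomerNo).
UPDATE tgt
SET    tgt.LoadEndTimestamp = @AssertionTimeBefore
FROM   Core.CustomerHistory AS tgt
INNER JOIN Stage.CustomerCdc AS src
    ON src.CustomerNo = tgt.CustomerNo
WHERE  tgt.LoadEndTimestamp = @HighEndTimestamp   -- current version
  AND  src.StatusCDC IN ('U', 'D');

-- Insert new versions for 'I' and 'U' and a documenting record for 'D'.
INSERT INTO Core.CustomerHistory
       (AuditTrailId, LoadTimestamp, LoadEndTimestamp, StatusCDC, CustomerNo, Firstname, Lastname)
SELECT  @AuditTrailId, @AssertionTimeFrom, @HighEndTimestamp
       ,src.StatusCDC, src.CustomerNo, src.Firstname, src.Lastname
FROM   Stage.CustomerCdc AS src
WHERE  src.StatusCDC IN ('I', 'U', 'D');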


How to design mappings for a One To One Data Copy table (non-historized)?


Mapping table of type SL - Volatile Zone to O - One To One Data Copy (non historized)

The mapping of the columns from the table of type SL - Volatile Zone to type O - One To One Data Copy is kept simple. Connect the columns to be loaded by drag & drop, give the mapping a suitable name and that's it. This applies to all modes: stack, truncate and delta.

The additional metadata columns must not be mapped; the Load Process Step takes care of loading them, where applicable:

  • Audit Trail Id (Name and Code as configured)
  • Entity Id (Name and Code as configured)
  • Load Timestamp (LDTS - Name and Code as configured)
  • Load End Timestamp (LEDTS - Name and Code as configured)
  • Hash-Diff (Code: HashDiff)
  • Status CDC (Code: StatusCDC)

The following picture shows an exemplary mapping.

[Image: One To One Data Copy Stage Mapping]


How to design mappings for a One To One Data Copy table (historized)?


Valid as of release 2.13.0

Mapping table of type SL - Volatile Zone to O - One To One Data Copy (historized)

The mapping of the columns from the table of type SL - Volatile Zone to type O - One To One Data Copy is kept simple. Connect the columns to be loaded by drag & drop, give the mapping a suitable name and that's it.

Temporal-Mode

The additional metadata columns must not be mapped; the Load Process Step takes care of loading them:

  • Audit Trail Id (Name and Code as configured)
  • Entity Id (Name and Code as configured)
  • Load End Timestamp (LEDTS - Name and Code as configured)
  • Hash-Diff (Code: HashDiff)

The following image shows an exemplary mapping for a table with stereotype O - One To One Data Copy in Temporal-Mode.

[Image: Mapping for a One To One Data Copy table in Temporal-Mode]

CDC-Mode

The additional metadata columns must not be mapped. In this context, the StatusCDC column is not treated as a metadata column, since the CDC flag is supplied from the source table. The Load Process Step takes care of loading the following columns:

  • Audit Trail Id (Name and Code as configured)
  • Entity Id (Name and Code as configured)
  • Load Timestamp (LDTS - Name and Code as configured)
  • Load End Timestamp (LEDTS - Name and Code as configured)

The following image shows an exemplary mapping for a table with stereotype O - One To One Data Copy in CDC-Mode.

[Image: Mapping for a One To One Data Copy table in CDC-Mode]


One To One Data Copy Pipeline (Batch)

How to set up a batch for One To One Data Copy Pipeline?


ExMeX Companion App

Starting with version 0.2.0 of the ExMeX Companion App, the data modeler can create a new batch for the One To One Data Copy Pipeline directly in the app.

ExMeX Framework Core Scripts

As always, the data modeler must add a new batch to the \etc\config\ConfigLoadBatchLogistic.sql file. The following SQL example shows a sample batch for the One To One Data Copy Pipeline.

-- Example Batch for One To One Data Copy Pipeline
MERGE INTO MetadataZone.LoadBatch AS target
	USING (	SELECT 'One To One Data Copy'
				  ,LoadBatchId
			FROM MetadataZone.LoadBatch
			WHERE BatchLabel = 'BigOne'
		   ) AS source (
				 BatchLabel
				,ParentLoadBatchId
				)
	ON (target.BatchLabel = source.BatchLabel)
WHEN MATCHED THEN
	UPDATE SET 
		target.ParentLoadBatchId = source.ParentLoadBatchId
WHEN NOT MATCHED THEN
	INSERT (
		 BatchLabel
		,ParentLoadBatchId
		)
	VALUES (
			 source.BatchLabel
			,source.ParentLoadBatchId
			)
;

Next, the data modeler must place the new batch in the dependencies to other batches (file \etc\config\ConfigLoadBatchLogistic.sql). In the following example SQL statements, the batch is arranged after the Stage Layer is loaded (Batch 'Load Stage with SSIS') and before the Batch 'Default - Hub'.

-- Stage to Stage
MERGE INTO MetadataZone.LoadBatchDependency AS target
	USING (	SELECT prelb.LoadBatchId
				  ,suclb.LoadBatchId
			FROM MetadataZone.LoadBatch prelb
			CROSS JOIN MetadataZone.LoadBatch suclb
			WHERE prelb.BatchLabel = 'Load Stage with SSIS'
			AND   suclb.BatchLabel = 'One To One Data Copy'
		   ) AS source (
				 PredecessorLoadBatchId
				,SuccessorLoadBatchId)
	ON (target.PredecessorLoadBatchId = source.PredecessorLoadBatchId
	AND target.SuccessorLoadBatchId = source.SuccessorLoadBatchId)
WHEN NOT MATCHED THEN
	INSERT 
	VALUES ( source.PredecessorLoadBatchId
			,source.SuccessorLoadBatchId)
;

-- Stage to Core
MERGE INTO MetadataZone.LoadBatchDependency AS target
	USING (	SELECT prelb.LoadBatchId
				  ,suclb.LoadBatchId
			FROM MetadataZone.LoadBatch prelb
			CROSS JOIN MetadataZone.LoadBatch suclb
			WHERE prelb.BatchLabel = 'One To One Data Copy'
			AND   suclb.BatchLabel = 'Default - Hub'
		   ) AS source (
				 PredecessorLoadBatchId
				,SuccessorLoadBatchId)
	ON (target.PredecessorLoadBatchId = source.PredecessorLoadBatchId
	AND target.SuccessorLoadBatchId = source.SuccessorLoadBatchId)
WHEN NOT MATCHED THEN
	INSERT 
	VALUES ( source.PredecessorLoadBatchId
			,source.SuccessorLoadBatchId)
;

In the next step, the data modeler must add the Load Process Steps, i.e. the tables, to the batch. The assignment is done in the file \etc\config\ConfigLoadBatchProcessStepLogistic.sql. The following two SQL statements show, as an example, the assignment of all tables of type O - One To One Data Copy from the data model STAGE_LAYER to the new batch 'One To One Data Copy'.

-- Delete Load Process Steps from Batch
DELETE FROM MetadataZone.LoadBatchProcessStep
	WHERE LoadBatchId = (SELECT LoadBatchId
						 FROM MetadataZone.LoadBatch
						 WHERE BatchLabel = 'One To One Data Copy'
						)
;

-- DML to assign all One To One Data Copy Load Process Steps to a batch
INSERT INTO MetadataZone.LoadBatchProcessStep (LoadBatchId, ProcessStepId)
		SELECT LoadBatchId, ProcessStepId
		FROM (SELECT LoadBatchId 
                FROM MetadataZone.LoadBatch 
                WHERE BatchLabel = 'One To One Data Copy') lb
	    CROSS JOIN (SELECT ProcessStepId
					FROM MetadataZone.EntityClass iec
					INNER JOIN MetadataZone.Mapping im
						ON  iec.EntityClassCode    = im.TargetEntityClassCode
						AND iec.ModelCode          = im.TargetModelCode
                    INNER JOIN MetadataZone.LoadProcessStep ilps
                        ON  ilps.MappingGUID       = im.MappingGUID
                    INNER JOIN (
                            SELECT ProcessStepTypeId
                            FROM MetadataZone.ProcessStepType
                            WHERE ProcessStepType = 'DEF'
                            ) pst
                        ON ilps.ProcessStepTypeId = pst.ProcessStepTypeId
                    -- Assign Data Models to batch
 					WHERE iec.ModelCode IN (
                                             'STAGE_LAYER'
                                            --,[...]
                                            )
                    -- Assign Stereotypes to batch
					  AND iec.Stereotype IN (
                                             'One To One Data Copy'
                                            --,[...]
					  					    )
                    -- Assign tables (according to naming conventions) to batch
                    -- e.g. %edSat% for EdDating-Satellites
                      AND iec.EntityClassCode     LIKE '%%'
                    -- Include or exclude mappings according to naming conventions
                    -- NoRun% -> e.g. not used integrity runs
                    -- TestRun% -> mappings not ready to run on quality or production
                    --             but available in test-batches
                      AND im.MappingName NOT LIKE 'NoRun%'
                      AND im.MappingName NOT LIKE 'TestRun%'
                      AND im.MappingName NOT LIKE 'BusinessRun%'
					) lps
;