One To One Data Copy Pipeline (All Layers)

How to design a stage table for One To One Data Copy pipeline?

Table of type SL - Volatile Zone As always, a table of type Volatile Zone is required to load the…

Table of type SL - Volatile Zone

As always, a table of type Volatile Zone is required to load the data 1:1 from the source into the Stage Layer. For the One To One Data Copy Pipeline there are no special requirements to consider. The following picture shows an example of a Volatile Zone table.

One To One Data Copy Stage Source TableOne To One Data Copy Stage Source Table

Posted 1 year agoby Dirk.Lerner

How to design a one to one data copy table (non-historized)?

Table of type O - One To One Data Copy To populate (aka copy) all data, which are stored in…

Table of type O - One To One Data Copy

To populate (aka copy) all data, which are stored in the previous shown Volatile Zone table to another non-historized table create a new table with stereotype O - One To One Data Copy.

With this table stereotype it is irrelevant in which layer (stage, core, access or other) of the data warehouse the table is located. Possible use cases for this table type could be:

  • HiZ - Loading (Variation)
  • PSA - loading (base for virtual Data Vault)
  • ODS - Loading
  • 1:1 Copy for any target of choice, e.g. a Sandbox

This table can contain as many or fewer columns compared to the Volatile Zone table. Depends on the design decision of the data modeler and the requirements for the One To One Data Copy table.

There are basically three different modes of loading data non-historized into a target table:

Stack-Mode

The data is "stacked" and can be distinguished by at least the Audit Trail Id column (Name and Code as configured), with additional optional columns if necessary.

Use cases may be a Persitent Staging Area (PSA) or a History Zone (HiZ).

One additional (metadata) column is mandatory:

  • Audit Trail Id (Name and Code as configured)

Some additional (metadata) column are optional and can be used as needed:

  • Entity Id (Name and Code as configured)
  • Load Timestamp (LDTS - Name and Code as configured)

The following image shows an example of Stack-Mode and some of the possible variations with optional columns.

One To One Data Copy Table - Stack-Mode and variations.One To One Data Copy Table - Stack-Mode and variations.

Truncate-Mode

The "old" data in the target table is deleted before new data is loaded.

Use cases may be a Sandbox or any Full Load with previous truncate (Truncate/Insert data - equates CDC I,U,D).

With this mode the deleted rows are documented as physically deleted in the ExMeX Logging Framework. In the Data Logistic Data Mart the corresponding data are entered in the two columns RowsDeletedPhysical and RowsDeletedPhysicalDateTime and are contained in the database object MetadataZoneMart.FctProcessStepVapourTrail.

One additional (metadata) column is mandatory:

  • Audit Trail Id (Name and Code as configured)

Some additional (metadata) column are optional and can be used as needed:

  • Entity Id (Name and Code as configured)
  • Load Timestamp (LDTS - Name and Code as configured)

The following image shows an example of Truncate-Mode and some of the possible variations with optional columns.

One To One Data Copy Table - Truncate-Mode and variations.One To One Data Copy Table - Truncate-Mode and variations.

Delta-Mode

New data is inserted into the target table and existing data is updated.

For comparison, all columns with the column stereotype CDC - Compare Key are used as key.

All context columns are checked for changes. If column Hash-Diff is used, the column Hash-Diff will be used to check for changes and not the context columns.

Metadata columns are not part of the comparison and are ignored.

Use cases may be a Sandbox or any Full/Delta Load (Update/Insert data - equates CDC I,U).

One additional (metadata) column is mandatory:

  • Audit Trail Id (Name and Code as configured)

Some additional (metadata) column are optional and can be used as needed:

  • Entity Id (Name and Code as configured)
  • Load Timestamp (LDTS - Name and Code as configured)
  • Hash-Diff (Code: HashDiff)
  • Status CDC (Code: StatusCDC) - For documentation only. Does not imply CDC functionality.

The following image shows an example of Delta-Mode and some of the possible variations with optional columns.

One To One Data Copy Table - Delta-Mode and variations.One To One Data Copy Table - Delta-Mode and variations.

Posted 1 year agoby Dirk.Lerner

How to design a one to one data copy table (historized)?

Valid as of release 2.13.0 Table of type O - One To One Data Copy To populate (aka copy) all…

Valid as of release 2.13.0

Table of type O - One To One Data Copy

To populate (aka copy) all data, which are stored in the previous shown Volatile Zone table to another historized table create a new table with stereotype O - One To One Data Copy.

With this table stereotype it is irrelevant in which layer (stage, core, access or other) of the data warehouse the table is located. Possible use cases for this table type could be:

  • HiZ - Loading (Variation)
  • PSA - loading (base for virtual Data Vault)
  • ODS - Loading
  • 1:1 Copy for any target of choice, e.g. a Sandbox

This table can contain as many or fewer columns compared to the Volatile Zone table. Depends on the design decision of the data modeler and the requirements for the One To One Data Copy table.

There are basically two different modes of loading data historized into a target table:

Temporal-Mode

New and changed data is inserted into the target table and existing data is versioned when changed.

All columns with the column stereotype CDC - Compare Key are used as key for comparison, like in Delta-Mode.

The column stereotype Temporal at the column Load Timestamp (LDTS - Name and Code as configured) activates the historization of the data and switches the load process into the Temporal-Mode. If this column stereotype is not set, the load process remains in Delta-Mode.

All context columns are checked for changes. If column Hash-Diff is used, the column Hash-Diff will be used to check for changes and not the context columns.

Metadata columns are not part of the comparison and are ignored.

Use cases can be a sandbox or any historized full/delta load (update/insert data - equivalent to CDC I,U).

One additional (metadata) column is mandatory:

  • Load Timestamp (LDTS - Name and Code as configured)

Some additional (metadata) column are optional and can be used as needed:

  • Audit Trail Id (Name and Code as configured)
  • Entity Id (Name and Code as configured)
  • Load End Timestamp (LEDTS - Name and Code as configured)
  • Hash-Diff (Code: HashDiff)

The following image shows an example of Temporal-Mode and some of the possible variations with optional columns.

One To One Data Copy Table - Temporal-Mode and variations.One To One Data Copy Table - Temporal-Mode and variations.

CDC-Mode

In CDC-Mode, the identification of changes to the data and the generation of the CDC flag (StatusCDC with 'I','U','D') for the data must either take place in an upstream CDC data pipeline or the source system must already provide this accordingly.

New and changed data are inserted into the target table based on the CDC flag (StatusCDC column) and existing data are versioned (virtual or with enddating) when changed.
Deleted data is also versioned (virtually or with enddating) and an additional deletion record is inserted for documentation and audit purposes.
The CDC-Mode is the only mode that can handle and display deleted records.

All columns of the Primary Key are used for versioning the data.

The column stereotype CDC - Complete at the column StatusCDC activates the historization of the data and switches the loading process into CDC-Mode.

The StatusCDC column must be connected in the mapping.

Use cases can be a sandbox or any historized full/delta load (update/insert data - equivalent to CDC I,U).

A primary key is mandatory.

Two additional (metadata) columns are mandatory:
  • Audit Trail Id (Name and Code as configured)
  • Status CDC (Code: StatusCDC)

Some additional (metadata) column are optional and can be used as needed:

  • Entity Id (Name and Code as configured)
  • Load Timestamp (LDTS - Name and Code as configured)
  • Load End Timestamp (LEDTS - Name and Code as configured)

The following image shows an example of CDC-Mode with an upstream CDC data pipeline and some of the possible variants with optional columns.

One To One Data Copy Table - CDC-Mode and variations.One To One Data Copy Table - CDC-Mode and variations.

Posted 1 year agoby Dirk.Lerner

How to design mappings for a One To One Data Copy table (non historized)?

Mapping table of type SL - Volatile Zone to O - One To One Data Copy (non historized) The mapping…

Mapping table of type SL - Volatile Zone to O - One To One Data Copy (non historized)

The mapping of the columns from the table of type SL - Volatile Zone to type O - One To One Data Copy is kept simple. Connect the columns to be loaded by drag & drop, give the mapping a suitable name and that's it. This applies to all modes: stack, truncate and delta.

The additional metadata columns must not be mapped. The Load Process Step takes care of the loading of the columns, if applicable:

  • Audit Trail Id (Name and Code as configured)
  • Entity Id (Name and Code as configured)
  • Load Timestamp (LDTS - Name and Code as configured)
  • Load End Timestamp (LEDTS - Name and Code as configured)
  • Hash-Diff (Code: HashDiff)
  • Status CDC (Code: StatusCDC)

The following picture shows an exemplary mapping.

One To One Data Copy Stage MappingOne To One Data Copy Stage Mapping

Posted 1 year agoby Dirk.Lerner

How to design mappings for a One To One Data Copy table (historized)?

Valid as of release 2.13.0 Mapping table of type SL - Volatile Zone to O - One To One Data…

Valid as of release 2.13.0

Mapping table of type SL - Volatile Zone to O - One To One Data Copy (historized)

The mapping of the columns from the table of type SL - Volatile Zone to type O - One To One Data Copy is kept simple. Connect the columns to be loaded by drag & drop, give the mapping a suitable name and that's it.

Temporal-Mode

The additional metadata columns must not be mapped. The Load Process Step takes care of the loading of the columns:

  • Audit Trail Id (Name and Code as configured)
  • Entity Id (Name and Code as configured)
  • Load End Timestamp (LEDTS - Name and Code as configured)
  • Hash-Diff (Code: HashDiff)

The following image shows an exemplary mapping for a table with stereotype O - One To One Data Copy in Temporal-Mode.

Mapping for a One To One Data Copy table in Temporal-ModeMapping for a One To One Data Copy table in Temporal-Mode.

CDC-Mode

The additional metadata columns must not be mapped. In this context, the StatusCDC column is not treated as a metadata column, since the CDC flag is supplied from the source table. The Load Process Step takes care of the loading of the column:

  • Audit Trail Id (Name and Code as configured)
  • Entity Id (Name and Code as configured)
  • Load Timestamp (LDTS - Name and Code as configured)
  • Load End Timestamp (LEDTS - Name and Code as configured)

The following image shows an exemplary mapping for a table with stereotype O - One To One Data Copy in CDC-Mode.

Mapping for a One To One Data Copy table in CDC-ModeMapping for a One To One Data Copy table in CDC-Mode.

Posted 1 year agoby Dirk.Lerner