Reference Data Stage Pipeline

For implementing a Reference Data Stage Pipeline within the data warehouse the ExMeX Framework Core introduced with release 2.11.0 a new table stereotype for data modeling.

To use this function within PowerDesigner at least version 2.0.0 of the plugin PowerDesigner Extension - TEDAMOH ExMeX Data Vault is required!

To populate non-historized Reference Data Stage tables the ExMeX Framework Core introduced a new Load Process Step (LPS) pattern (LoadPatternId = 18).

Please always remember this:
  • After data modelling in PowerDesigner, regenerate the metadata.
  • After generating the metadata and adapting the scripts, install the metadata.

Reference Data Stage Pipeline (General)

Which Stereotypes for Reference Data Stage are available?

Table type: Reference - Stores all loaded non-historized reference data. Table stereotype: SL - Reference Zone This table stores (insert,…

Table type: Reference - Stores all loaded non-historized reference data.

  • Table stereotype: SL - Reference Zone

    This table stores (insert, update and delete) all current reference data which were available within the full current load. There is no versioning of the data.

    Hint: For historized reference data it is recommended to use Data Vault patterns in the History Zone of the Stage Layer.

  • Mandatory column: Audit Trail Id (Name and Code as configured)

    This column stores an Id which ties the loaded data to the ExMeX Logging Framework.

  • Mandatory column: Status CDC (Code: StatusCDC)

    This column stores a flag, generated by Reference Data Stage LPS: I,U
    In the case of the Reference Data Stage LPS only I and U are used.

    To ensure data quality it is recommended to use a domain for the Status CDC column (see also the following topics).

Posted 1 year agoby Dirk.Lerner

How to design a domain Status CDC?

To ensure data quality it is recommended to use a domain for the Status CDC column. Set up a domain…

To ensure data quality it is recommended to use a domain for the Status CDC column. Set up a domain with the following characteristics.

See section How to design a domain Status CDC? in Chapter Change Data Capture Pipeline.

Posted 1 year agoby Dirk.Lerner

Any background information?

DAPI The DAPI MetadataZoneDapi.MetadataForGenerateStageReference is the basis for generating the LPS with the LoadPatternId 18. Generator The script for generating…

DAPI

The DAPI MetadataZoneDapi.MetadataForGenerateStageReference is the basis for generating the LPS with the LoadPatternId 18.

Generator

The script for generating the LPS with LoadPatternId 18 is Reference-Stage.sql

Logging

Because of how the MERGE SQL statement logs data changed in the target, the ExMeX Logging Framework can display only the sum of inserted, updated, and deleted rows in the RowsInserted column of the FctProcessStepVapourTrail in the MetadataZone-Mart.

Quote from the Microsoft SQL Docs:

When used after MERGE, @@ROWCOUNT (Transact-SQL) returns the total number of rows inserted, updated, and deleted to the client.

Resources

With Release 2.13.0 this issue is solved. Inserted, Updated (Changed) and physical deleted rows are logged correctly!

Posted 1 year agoby Dirk.Lerner

Reference Data Stage Pipeline (Non-Historized)

How to design a reference table?

Table of type SL - Volatile Zone As always, a table of type Volatile Zone is required to load the…

Table of type SL - Volatile Zone

As always, a table of type Volatile Zone is required to load the data 1:1 from the source into the Stage Layer. For the Reference Data Stage Pipeline there are no special requirements to consider. The following picture shows an example of a Volatile Zone table.

Reference Data Stage Source Table My Reference TableReference Data Stage Source Table My Reference Table

Table of type SL - Reference Zone

To populate all data, which are stored in the previous shown Volatile Zone table to a non-historized reference data table create a new table with stereotype SL - Reference Zone.

A table of type SL - Reference Zone stores (insert, update and delete) all current reference data which were available within the full load. There is no versioning of the data.

This table can contain as many or fewer columns compared to the Volatile Zone table. Depends on the design decision of the data modeler and the requirements for the reference table.

Two additional (metadata) columns are mandatory:

  • Audit Trail Id (Name and Code as configured)
  • Status CDC

The metadata column Status CDC was already introduced in section Change Data Capture Pipeline. It is reused here because it serves the exact same purpose. To document the change. This column shows whether the row in the reference table is the initial insert (CDCStatus = 'I') or has already been updated (CDCStatus = 'U').

The primary key correspond to Compare Key described in section Change Data Capture Pipeline.

Reasons to populate data into the reference table are and identified by

  • New data: Primary key does not exist in reference table. Data will be inserted into reference table.
  • Changed (context) data: Changed dates are identified within all other, non metadata- and non-pk-, columns. Data will be updated in reference table.
  • Deleted data: Primary key does not exist anymore in Volatile Zone table. Data will be deleted in reference table.

 

Hint: For historized reference data it is recommended to use Data Vault patterns in the Reference Zone of the Stage Layer.

The following picture shows two examples of a reference table. One with a composite primary key including the column Valid From to detect bitemporal reference data changes as well as reference table with a single column primary key.

Reference Data Stage Target TableReference Data Stage Target Table

Posted 1 year agoby Dirk.Lerner

How to design mappings for a reference table?

Mapping table of type SL - Volatile Zone to SL - Reference Zone The mapping of the columns from the…

Mapping table of type SL - Volatile Zone to SL - Reference Zone

The mapping of the columns from the table of type SL - Volatile Zone to type SL - Reference Zone is kept simple. Connect the columns to be loaded by drag & drop, give the mapping a suitable name and that's it.

The two additional metadata columns must not be mapped. The Load Process Step takes care of the loading of the two columns:

  • Audit Trail Id (Name and Code as configured)
  • Status CDC

The following two pictures show exemplary mappings for a primary key with one column as well as with several columns (composite primary key).

Reference Data Stage Mapping to Ref Single Column KeyReference Data Stage Mapping to Ref Single Column Key

Reference Data Stage Mapping to Ref Composite KeyReference Data Stage Mapping to Ref Composite Key

Posted 1 year agoby Dirk.Lerner

Reference Data Stage Pipeline (Batch)

How to set up a batch for Reference Data Pipeline in Stage?

ExMeX Companion App Starting with version 0.2.0 of the ExMeX Companion App, the data modeler can create a new batch…

ExMeX Companion App

Starting with version 0.2.0 of the ExMeX Companion App, the data modeler can create a new batch for the Reference Data Stage Pipeline in the ExMeX Companion App.

ExMeX Framework Core Scripts

As always, the data modeler must add a new batch to the \etc\config\ConfigLoadBatchLogistic.sql file. The following SQL example shows an sample batch for the Reference Data Stage Pipeline.

-- Example Batch for Reference Data in Stage
MERGE INTO MetadataZone.LoadBatch AS target
	USING (	SELECT 'Reference Data Stage'
				  ,LoadBatchId
			FROM MetadataZone.LoadBatch
			WHERE BatchLabel = 'BigOne'
		   ) AS source (
				 BatchLabel
				,ParentLoadBatchId
				)
	ON (target.BatchLabel = source.BatchLabel)
WHEN MATCHED THEN
	UPDATE SET 
		target.ParentLoadBatchId = source.ParentLoadBatchId
WHEN NOT MATCHED THEN
	INSERT (
		 BatchLabel
		,ParentLoadBatchId
		)
	VALUES (
			 source.BatchLabel
			,source.ParentLoadBatchId
			)
;

Next, the data modeler must place the new batch in the dependencies to other batches (file \etc\config\ConfigLoadBatchLogistic.sql). In the following example SQLs the batch is arranged after the Stage Layer is loaded (Batch 'Load Stage with SSIS') and before the Batch 'Default - Hub'.

-- Stage to Reference Stage
MERGE INTO MetadataZone.LoadBatchDependency AS target
	USING (	SELECT prelb.LoadBatchId
				  ,suclb.LoadBatchId
			FROM MetadataZone.LoadBatch prelb
			CROSS JOIN MetadataZone.LoadBatch suclb
			WHERE prelb.BatchLabel = 'Load Stage with SSIS'
			AND   suclb.BatchLabel = 'Reference Data Stage'
		   ) AS source (
				 PredecessorLoadBatchId
				,SuccessorLoadBatchId)
	ON (target.PredecessorLoadBatchId = source.PredecessorLoadBatchId
	AND target.SuccessorLoadBatchId = source.SuccessorLoadBatchId)
WHEN NOT MATCHED THEN
	INSERT 
	VALUES ( source.PredecessorLoadBatchId
			,source.SuccessorLoadBatchId)
;

-- Stage to Core
MERGE INTO MetadataZone.LoadBatchDependency AS target
	USING (	SELECT prelb.LoadBatchId
				  ,suclb.LoadBatchId
			FROM MetadataZone.LoadBatch prelb
			CROSS JOIN MetadataZone.LoadBatch suclb
			WHERE prelb.BatchLabel = 'Reference Data Stage'
			AND   suclb.BatchLabel = 'Default - Hub'
		   ) AS source (
				 PredecessorLoadBatchId
				,SuccessorLoadBatchId)
	ON (target.PredecessorLoadBatchId = source.PredecessorLoadBatchId
	AND target.SuccessorLoadBatchId = source.SuccessorLoadBatchId)
WHEN NOT MATCHED THEN
	INSERT 
	VALUES ( source.PredecessorLoadBatchId
			,source.SuccessorLoadBatchId)
;

In the next step, the data modeler must add the Load Process Steps respectively the tables to the batch. The assignment is done in the file \etc\config\ConfigLoadBatchProcessStepLogistic.sql. The following two SQL statements show as an example the assignment of all tables of type SL - Reference Zone of the data model STAGE_LAYER to the new batch 'Reference Data Stage'.

-- Delete Load Process Steps from Batch
DELETE FROM MetadataZone.LoadBatchProcessStep
	WHERE LoadBatchId = (SELECT LoadBatchId
						 FROM MetadataZone.LoadBatch
						 WHERE BatchLabel = 'Reference Data Stage'
						)
;

-- DML to assign all Reference Load Process Steps to a batch
INSERT INTO MetadataZone.LoadBatchProcessStep (LoadBatchId, ProcessStepId)
		SELECT LoadBatchId, ProcessStepId
		FROM (SELECT LoadBatchId 
                FROM MetadataZone.LoadBatch 
                WHERE BatchLabel = 'Reference Data Stage') lb
	    CROSS JOIN (SELECT ProcessStepId
					FROM MetadataZone.EntityClass iec
					INNER JOIN MetadataZone.Mapping im
						ON  iec.EntityClassCode    = im.TargetEntityClassCode
						AND iec.ModelCode          = im.TargetModelCode
                    INNER JOIN MetadataZone.LoadProcessStep ilps
                        ON  ilps.MappingGUID       = im.MappingGUID
                    INNER JOIN (
                            SELECT ProcessStepTypeId
                            FROM MetadataZone.ProcessStepType
                            WHERE ProcessStepType = 'DEF'
                            ) pst
                        ON ilps.ProcessStepTypeId = pst.ProcessStepTypeId
                    -- Assign Data Models to batch
 					WHERE iec.ModelCode IN (
                                             'STAGE_LAYER'
                                            --,'StageLayerHiZ'
                                            )
                    -- Assign Stereotypes to batch
					  AND iec.Stereotype IN (
										    -- 'Volatile Zone'
                                            --,'History Zone'
                                            'Reference Zone'
                                            --,'CDC Complete'
                                            --,'CDC One Data Load Before'
                                            --,'CDC Partially'
					  					    )
                    -- Assign tables (according to naming conventions) to batch
                    -- e.g. %edSat% for EdDating-Satellites
                      AND iec.EntityClassCode     LIKE '%%'
                    -- Include or exclude mappings according to naming conventions
                    -- NoRun% -> e.g. not used integrity runs
                    -- TestRun% -> mappings not ready to run on quality or production
                    --             but available in test-batches
                      AND im.MappingName NOT LIKE 'NoRun%'
                      AND im.MappingName NOT LIKE 'TestRun%'
                      AND im.MappingName NOT LIKE 'BusinessRun%'
					) lps
;
Posted 1 year agoby Dirk.Lerner