Change Data Capture Pipeline

For implementing a CDC Data Pipeline within the data warehouse the ExMeX Framework Core introduced with release 2.9.0 new table and column stereotypes for data modeling.

To use this function within PowerDesigner at least version 2.0.0 of the plugin PowerDesigner Extension - TEDAMOH ExMeX Data Vault is required!

To populate full and partially CDC tables the ExMeX Framework Core introduced two new Load Process Step (LPS) pattern (LoadPatternId with ODLB = 16 and LoadPatternId with HiZ = 17).

Please always remember this:
  • After data modelling in PowerDesigner, regenerate the metadata.
  • After generating the metadata and adapting the scripts, install the metadata.

CDC Data Pipeline (General)

Which Stereotypes for CDC are available?

Extended table Type: Volatile Zone - 1:1 copy of source Column stereotype: CDC - Compare Key (Mandatory) This stereotype identifies…

Extended table Type: Volatile Zone - 1:1 copy of source

  • Column stereotype: CDC - Compare Key (Mandatory)

    This stereotype identifies one or many columns which drive the compare to identify changes and capture them.

  • Column stereotype: CDC - Ignore (Optional) 

    This stereotype identifies one or many columns which will be ignored in diff-columns which are used to detect changes. This means that the Load Process Step ignores changes in the data in these columns when comparing for differences.
    Nevertheless, the Load Process Step copies these data into the comparison tables (ODLB or HiZ) and CDC tables.
    If the data modeller subsequently removes or adds this stereotype to columns, note that the Hash-Diff CDC column in the comparison tables (ODLB or HiZ) must be recalculated.

Extended view Type: Volatile Zone (View) - Partial data flow of Volatile Zone

  • Column stereotype: CDC - Compare Key (Mandatory)

    This stereotype identifies one or many columns which drive the compare to identify changes and capture them.

  • Column stereotype: CDC - Ignore (Optional) 

    This stereotype identifies one or many columns which will be ignored in diff-columns which are used to detect changes. This means that the Load Process Step ignores changes in the data in these columns when comparing for differences.
    Nevertheless, the Load Process Step copies these data into the comparison tables (ODLB or HiZ) and CDC tables.
    If the data modeller subsequently removes or adds this stereotype to columns, note that the Hash-Diff CDC column in the comparison tables (ODLB or HiZ) must be recalculated.

Table type: One Data Load Before (ODLB) - Contains the previous full load to which the current full load should be compared.

  • Table stereotype: SL - CDC One Data Load Before

    This table store all data which were available one data load before the current load. The ODLB table is used to execute the change data capture.
    Hint: Alternatively an existing history table can be used.

  • Mandatory column: Hash-Diff CDC (Code: HashDiffCDC)

    This column stores a hash, generated with all non CDC Compare Key columns. It increases speed to identify changes.
    To ensure data quality it is recommended to use a domain for the Hash-Diff CDC column (see also the following topics).

Table type: Change Data Capture (CDC) - Stores all detected changes

  • Table stereotype: SL - CDC Complete

    This table stores all data identified by the CDC Data Pipeline: I,U,D

  • Table stereotype: SL - CDC Partially

    This table stores exclusively deleted data identified by the CDC Data Pipeline: D

  • Mandatory column: Status CDC (Code: StatusCDC)

    This column stores a flag, generated by CDC: I,U,D,X
    To ensure data quality it is recommended to use a domain for the Status CDC column (see also the following topics).

  • Column stereotype: MD - Audit Trail Id (Optional)

    This stereotype identifies one column which will store the Audit Trail Id provided by ExMeX Logging Framework.

Extended table Type: All Satellites (with CDC enabled)

  • Mandatory column: Status CDC (Code: StatusCDC)

    This column stores a flag, generated by CDC: I,U,D,X
    To ensure data quality it is recommended to use a domain for the Status CDC column (see also the following topics).

  • Column stereotype: CDC - Complete (Mandatory) or CDC - Partially (Mandatory)

    This stereotype identifies a column which drives the Satellite either to load in Full or Partially CDC Mode.
Posted 2 years agoby Dirk.Lerner

How to design a domain Status CDC?

To ensure data quality it is recommended to use a domain for the Status CDC column. Set up a domain…

To ensure data quality it is recommended to use a domain for the Status CDC column. Set up a domain with the following characteristics.

  • Data type: CHAR(1)
  • Mandatory: Yes
  • List of values, set complete = yes:
    I = Insert (new rows)
    U = Update (existing rows)
    D = Delete (existing rows)
    X = Undefined (only within partially CDC, which only detects Delete)

The domain ensures that only the values listed above are allowed and possible to insert. NULL is not possible.

Example within PowerDesigner

CDC Data Pipeline General Domain Status CDC Data typeCDC Data Pipeline General Domain Status CDC Data type
CDC Data Pipeline General Domain Status CDC List of valuesCDC Data Pipeline General Domain Status CDC List of values

 

Posted 1 year agoby Dirk.Lerner

How to design a domain Hash-Diff CDC?

To ensure data quality it is recommended to use a domain for the Hash-Diff CDC column in stage ODLB tables.…

To ensure data quality it is recommended to use a domain for the Hash-Diff CDC column in stage ODLB tables. Set up a domain with the following characteristics.

  • Data type: VARBINARY(20)
  • Mandatory: Yes

The domain ensures that always a Hash-Diff CDC value is provided. NULL is not possible.

Example within PowerDesigner

CDC Data Pipeline General Domain Hash Diff CDC Data typeCDC Data Pipeline General Domain Hash Diff CDC Data type

Posted 1 year agoby Dirk.Lerner

CDC Data Pipeline (Stage Layer)

How to design a CDC table?

Table of type SL - Volatile Zone As always, a table of stereotype SL - Volatile Zone is required to…

Table of type SL - Volatile Zone

As always, a table of stereotype SL - Volatile Zone is required to load the data 1:1 from the source into the Stage Layer. For the CDC Data Pipeline there are some requirements to consider. The following picture shows an example of a Volatile Zone table.

CDC Data Pipeline Stage Volatile Zone Table Artwork In MuseumCDC Data Pipeline Stage Volatile Zone Table Artwork In Museum

As already written in Which Stereotypes for CDC are available? there are two new stereotypes for columns in the CDC Data Pipeline:

  • CDC - Compare Key (Mandatory)
  • CDC - Ignore (Optional)

In the Volatile Zone table shown above, the column Artwork is set to stereotype CDC - Compare Key. This means for the CDC Load Process Step (LPS) that this column (or columns) is used to identify the new, changed or deleted records.

All other columns are used in the Change Data Capture process. This means that the CDC LPS checks these columns for changes.

The columns Other Column 4 and Other Column 5 are set to stereotype CDC - Ignore. This means for the CDC LPS that this columns (or a single column) are ignored in the Change Data Capture process.

Changed data in columns with the column stereotype CDC - Ignore are only updated in the table of type SL - CDC One Data Load Before, if the content of a non-CDC - Ignore column has also changed.

As described above, changed data will be ignored by CDC - Ignore and therefore changed data will not be transported into the OLDB or CDC table. As long as no non-CDC - Ignore column has changed!

If the columns in the Core Layer (Data Vault) are not needed, then the columns should be removed completely! We do NOT recommend to use the column Stereotype CDC - Ignore to "remove" columns! This could lead to misunderstandings why or why data has not been historized in the satellite.

View of type SL - Volatile Zone View

To reduce the data flow to a partial data flow a view of stereotype SL - Volatile Zone View is required. For the CDC Data Pipeline there are some requirements to consider. The following picture shows an example of a Volatile Zone view to create a partial data flow.

CDC Data Pipeline Stage Volatile Zone View for partial data flow Artwork In MuseumCDC Data Pipeline Stage Volatile Zone View for partial data flow Artwork In Museum

As already written in Which Stereotypes for CDC are available? there are two new stereotypes for columns in the CDC Data Pipeline:

  • CDC - Compare Key (Mandatory)
  • CDC - Ignore (Optional)

In the Volatile Zone view shown above, the column Artwork is set to stereotype CDC - Compare Key. This means for the CDC Load Process Step (LPS) that this column (or columns) is used to identify the new, changed or deleted records.

All other columns are used in the Change Data Capture process. This means that the CDC LPS checks these columns for changes.

Changed data in columns with the column stereotype CDC - Ignore are only updated in the table of type SL - CDC One Data Load Before, if the content of a non-CDC - Ignore column has also changed.

As described above, changed data will be ignored by CDC - Ignore and therefore changed data will not be transported into the OLDB or CDC table. As long as no non-CDC - Ignore column has changed!

If the columns in the Core Layer (Data Vault) are not needed, then the columns should be removed completely! We do NOT recommend to use the column Stereotype CDC - Ignore to "remove" columns! This could lead to misunderstandings why or why data has not been historized in the satellite.

Table of type SL - CDC One Data Load Before

For the CDC LPS a new table type is necessary with the stereotype SL - CDC One Data Load Before (ODLB).

This table contains the version of the data that the LPS loaded into the table of type SL - Volatile Zone during the previous execution.

In the case of the Partial CDC:
ODLB and HiZ tables are not maintained (loaded) by the Load Process Step (LPS). The loading of the ODLB or HiZ table must be ensured by another, additional LPS.

This table can contain as many or fewer columns compared to the Volatile Zone table. Depends on the design decision of the data modeler and the requirements for the ODLB and CDC table.

One additional column is mandatory:

  • Hash-Diff CDC (Code: HashDiffCDC)

The Hash-Diff CDC column contains a hash value that the LPS has calculated from all columns that do not have a stereotype in the Volatile Zone Table. By comparing this hash value, the LPS detects whether there are any changes in content between the current and the previous run in any of the columns.

The column Hash-Diff CDC is only necessary in the CDC Data Pipeline where the CDC table has the stereotype SL - CDC Complete.

The following picture shows an example of an ODLB table in the Stage Layer.

CDC Data Pipeline Stage ODLB Table Artwork In Museum ODLBCDC Data Pipeline Stage ODLB Table Artwork In Museum ODLB

 

Variation

It is possible for the CDC LPS, instead of creating a new OLDB table, to use an existing table from the History Zone. However, we do NOT recommend this for a Complete/Partially CDC Data Pipeline because a) the use of the column Hash-Diff CDC would only be possible with considerable effort and b) the existing table will NOT be longer the table with originally intended characteristics, e.g. a history table (see warning below).

If a table from the history zone is used for a Partially CDC Data Pipeline, it should be noted that for the CDC LPS only exactly one previous data version is visible in the table of the history zone. This could be achieved e.g. via

  • an updatable view or
  • the housekeeping (e.g. set HousekeepingUnit = ‘ProcessStepRun’ and HousekeepingAmount = 1).

Keep in mind that the MERGE statement will not only write deleted rows into the CDC table but also delete them in the History Zone table!

Table of type SL - CDC Complete

For the Complete CDC data pipeline a last new table type is necessary with the stereotype SL - CDC Complete.

This table contains only changes (Insert, Update and Delete) but all of the data that the CDC LPS identified. Data which has not changed are finally ignored by the CDC LPS.

This table can contain as many or fewer columns compared to the Volatile Zone table. Depends on the design decision of the data modeler and the requirements for the ODLB and CDC table.

One additional column is mandatory:

  • Status CDC (Code: StatusCDC) - By the way (as of release 2.11.0), it does not matter where in the table this column is. It does not have to be in the last position.

The following picture shows an example of an Complete CDC table in the Stage Layer.

CDC Data Pipeline Stage Complete CDC Table Artwork In Museum CDCCDC Data Pipeline Stage Complete CDC Table Artwork In Museum CDC

As of Release 2.11.0, you can assign a stereotype to a column in tables with type SL - CDC Complete/Partially:

  • MD - Audit Trail Id (Optional)

This stereotype identifies one column which will store the Audit Trail Id provided by ExMeX Logging Framework.

The following picture shows an example of an Complete CDC table in the Stage Layer with a column of type MD - Audit Trail Id.

CDC Data Pipeline Stage Complete CDC Table Stg Clever CDC with Audit Trail IdCDC Data Pipeline Stage Complete CDC Table Stg Clever CDC with Audit Trail Id

The following pictures showing all parts for a Complete CDC Data Pipeline assembled.

CDC Data Pipeline Stage Complete and Partial Data FlowComplete CDC data pipeline and with Partial Data Flow via View

CDC Data Pipeline Stage Complete CDC Table Clever CDC with Audit Trail IdCDC Data Pipeline Stage Complete CDC Table Clever CDC with Audit Trail Id

Table of type SL - CDC Partially

For the Partially CDC data pipeline a new table type is necessary with the stereotype SL - CDC Partially and an additional view which unions identified deleted rows and the full load of the Volatile Zone table.

ODLB and HiZ tables are not maintained (loaded) by the Load Process Step (LPS). The loading of the ODLB or HiZ table must be ensured by another, additional LPS.

This table contains only deleted data that the CDC LPS identified. Data which has not changed, new or updated are finally ignored by the (partially) CDC LPS.

This table can contain as many or fewer columns compared to the Volatile Zone table. Depends on the design decision of the data modeler and the requirements for the ODLB and CDC table.

One additional column is mandatory:

  • Status CDC (Code: StatusCDC) - By the way (as of release 2.11.0), it does not matter where in the table this column is. It does not have to be in the last position.

The following picture shows an example of an PartiallyCDC table in the Stage Layer.

CDC Data Pipeline Stage Partially CDC Table Vehicle License Plate CDCCDC Data Pipeline Stage Partially CDC Table Vehicle License Plate CDC

Now the data modeler has to design the view for unifying the full load and the deleted data sets identified by the CDC LPS. The following picture shows a view that combines the Volatile Zone (set the StatusCDC = 'X' in SQL) table and the Partially CDC (set by the CDC LPS - StatusCDC = 'D') table with UNION ALL.

CDC Data Pipeline Stage Partially CDC View Vehicle License Plate CDCCDC Data Pipeline Stage Partially CDC View Vehicle License Plate CDC

The following picture shows all parts for a Partially CDC Data Pipeline assembled.

CDC Data Pipeline Stage PartiallyCDC Data Pipeline Stage Partially

Optimizing CDC Performance

To improve the performance of the CDC (MERGE statement), we recommend the following index guidelines:

  • Create an index on the CDC Compare Key columns (join columns) in the source table that is unique and covering.
  • Create a unique clustered index on the join columns (associated to the CDC Compare Key columns in the Stage Table) in the target table (ODLB or history).

These indexes ensure that the join keys are unique and the data in the tables is sorted. Query performance is improved because the query optimizer does not need to perform extra validation processing to locate and update duplicate rows and additional sort operations are not necessary.

Posted 1 year agoby Dirk.Lerner

How to design mappings for a CDC data Pipeline with ODLB?

Depending on how the data models are structured in the project, it is necessary that the data modeler creates the…

Depending on how the data models are structured in the project, it is necessary that the data modeler creates the Stage Layer Data Model (with the tables of type SL - Volatile Zone) for the Stage Layer Data Model (with the tables of type SL - CDC One Data Load Before) as data source.

For clarity, the Data Modeller can create the Stage Layer Data Model (with the tables of type SL - CDC One Data Load Before) for the Stage Layer Data Model (with the tables of type SL - CDC Complete/Partially) a second time as Data Source. However, this is not necessary, but optional!

CDC Data Pipeline Stage List of Data Sources

First: Mapping table of type SL - Volatile Zone (or SL - Volatile Zone View) to SL - One Data Load Before

The mapping of the columns from the table of type SL - Volatile Zone (or SL - Volatile Zone View) to type SL - One Data Load Before is kept simple. Connect the columns to be loaded by drag & drop, give the mapping a suitable name and that's it.

The additional Hash-Diff CDC column must not be mapped. The Load Process Step takes care of the loading of the column.

The following picture shows an exemplary mapping.

CDC Stage to ODLB Mapping EditorCDC Stage to ODLB Mapping Editor

Only columns that are present and mapped in the table of type SL - One Data Load Before will be loaded by the CDC LPS and written to the table of type SL - CDC Complete/Partially at the end. Not to be mistaken for the optional column stereotype CDC - Ignore.

Second: Mapping table of type SL - One Data Load Before to SL - CDC Complete/Partially

The mapping of the columns from the table of type SL - One Data Load Before to type SL - CDC Complete/Partially, again, is kept simple. Connect the columns to be loaded by drag & drop, give the mapping a suitable name and that's it.

The additional Status CDC column must not be mapped. The Load Process Step takes care of the loading of the column.

The following picture shows an exemplary mapping.

CDC ODLB to CDC Mapping EditorCDC ODLB to CDC Mapping Editor

Posted 1 year agoby Dirk.Lerner

How to design mappings for a CDC data Pipeline with History Zone?

First: Mapping table of type SL - Volatile Zone (or SL - Volatile Zone View) to SL - History Zone The…

First: Mapping table of type SL - Volatile Zone (or SL - Volatile Zone View) to SL - History Zone

The mapping of the columns from the table of type SL - Volatile Zone (or SL - Volatile Zone View) to type SL - History Zone is kept simple. Connect the columns to be loaded by drag & drop, give the mapping a suitable name and that's it.

If you have chosen this option, this should already have happened elsewhere.

Second: Mapping table of type SL - Volatile Zone to SL - CDC Complete/Partially

The mapping of the columns from the table of type SL - Volatile Zone to type SL - CDC Complete/Partially, again, is kept simple. Connect the columns to be loaded by drag & drop, give the mapping a suitable name and that's it.

The additional Status CDC column must not be mapped. The Load Process Step takes care of the loading of the column.

The following picture shows an exemplary mapping.

CDC Stage to CDC HiZ Mapping EditorCDC Stage to CDC HiZ Mapping Editor

Posted 1 year agoby Dirk.Lerner

Any background information?

DAPI The DAPI MetadataZone.MetadataForGenerateStageCDC is the basis for generating the LPS with the LoadPatternId 16 (ODLB table) and LoadPatternId 17…

DAPI

The DAPI MetadataZone.MetadataForGenerateStageCDC is the basis for generating the LPS with the LoadPatternId 16 (ODLB table) and LoadPatternId 17 (History Zone table).

Generator

The script for generating the LPS with LoadPatternId 16 is CDC-ODLB-Stage.sql

The script for generating the LPS with LoadPatternId 17 is CDC-HiZ-Stage.sql

Logging

Because of how the MERGE SQL statement logs data changed in the target, the ExMeX Logging Framework can display only the sum of inserted, updated, and deleted rows in the RowsInserted column of the FctProcessStepVapourTrail in the MetadataZone-Mart.

Quote from the Microsoft SQL Docs:

When used after MERGE, @@ROWCOUNT (Transact-SQL) returns the total number of rows inserted, updated, and deleted to the client.

Resources

Posted 1 year agoby Dirk.Lerner

CDC Data Pipeline (Core Layer)

How to design a Satellite to populate with CDC data?

To add CDC functionality to a satellite, simply add the column Status CDC and mark it with a stereotype. For…

To add CDC functionality to a satellite, simply add the column Status CDC and mark it with a stereotype.

For the Status CDC column the appropriate stereotype (CDC - Partially or CDC - Complete) must be selected, depending on the CDC data of the corresponding source table (of type SL - CDC Complete or SL - CDC Partially) in the Stage Layer.

Activating the CDC functionality is kept as simple as possible. The data modeler does not need to do more than add the column and set stereotypes for the column stereotype in the Data Vault Pattern.

The following pictures show an example of the configuration for a satellite with CDC functionality. One with Partially CDC and one with Complete CDC.

CDC Data Pipeline Core Raw Data Zone Table Sat Smart with CDC CompleteCDC Data Pipeline Core Raw Data Zone Table Sat Smart with CDC Complete

 

CDC Data Pipeline Core Raw Data Zone Table Sat Smart with CDC PartiallyCDC Data Pipeline Core Raw Data Zone Table Sat Smart with CDC Partially

The following table gives an overview of which Data Vault table stereotypes support CDC.

Data Vault Table  Table Stereotype CDC Complete CDC Partially
 Hub  CL - Hub  No  No
 Link  CL - Link  No  No
 Satellite (Hub)  CL -Satellite  Yes
 Yes
 CL - Satellite HashDiff  Yes (see hint)  Yes
 CL - Satellite Insert Only  No  No
 CL - Satellite bitemporal  No  No
 CL - Satellite bitemporal SI AlignsWith  No  No
 Satellite (Link)       CL - Link-Satellite  Yes  Yes
 CL - Link-Satellite EndDating  Yes  Yes
 CL - Link-Satellite HashDiff  Yes (see hint)  Yes
 CL - Link-Satellite Insert Only  No  No
 CL - Link-Satellite bitemporal  No  No
 CL - Link-Satellite bitemporal SI AlignsWith  No  No

To ensure data quality it is recommended to use a domain for the Status CDC column (see also the following topics).

Satellite with Hash-Diff - Only makes sense with Partially otherwise Hash-Diff is not needed.

Posted 1 year agoby Dirk.Lerner

How to design mappings for a CDC satellite (Hub)?

Mapping table of type SL - CDC Complete/Partially to CL - Satellite* The mapping of the columns from the table…

Mapping table of type SL - CDC Complete/Partially to CL - Satellite*

The mapping of the columns from the table of type SL - CDC Complete/Partially to type CL - Satellite* is kept simple. Connect the columns to be loaded from the table or view by drag & drop, give the mapping a suitable name and that's it.

The Business Key (Source, here column: Smart BK) has to be mapped to the Surrogate Key (Target, here column: Smart SK) in the Satellite.

The additional CDC column has to be mapped, too:

  • Status CDC

The exemplary additional Data Vault metadata columns must not be mapped. The Load Process Step takes care of the loading of the columns:

  • Load Timestamp
  • Load End Timestamp
  • Entity Id
  • Audit Trail Id (Name and Code as configured)

The following picture show exemplary mappings for a CDC Satellite. It doesn’t matter if the Satellite is of Partially or Complete CDC functionality.

CDC Stage to Core Mapping Editor Satellite HubCDC Stage to Core Mapping Editor Satellite Hub

Don’t forget the integrity load to the associated Hub!

Posted 1 year agoby Dirk.Lerner

How to design mappings for a CDC satellite (Link)?

Mapping table of type SL - CDC Complete/Partially to CL - Link Satellite* The mapping of the columns from the…

Mapping table of type SL - CDC Complete/Partially to CL - Link Satellite*

The mapping of the columns from the table of type SL - CDC Complete/Partially to type CL - Link Satellite* is kept simple. Connect the columns to be loaded from the table or view by drag & drop, give the mapping a suitable name and that's it.

The Business Key (Source, here columns: Registered Vehicle and License Plate) has to be mapped to the Surrogate Key (Target, here column: License Plate Belongs Vehicle SK) in the Link-Satellite (here and End-Dating Satellite).

The additional CDC column has to be mapped, too:

  • Status CDC

The exemplary additional Data Vault metadata columns must not be mapped. The Load Process Step takes care of the loading of the columns:

  • Load Timestamp
  • Load End Timestamp
  • Entity Id
  • Audit Trail Id (Name and Code as configured)

The following picture show exemplary mappings for a CDC Link-Satellite. It doesn’t matter if the Link-Satellite is of Partially or Complete CDC functionality.

CDC Stage to Core Mapping Editor edSatellite LinkCDC Stage to Core Mapping Editor edSatellite Link

Don’t forget the integrity load to the associated Link and Hubs!

Posted 1 year agoby Dirk.Lerner

CDC Data Pipeline (Batch)

How to set up a batch for a CDC Data Pipeline in Stage?

ExMeX Companion App Starting with version 0.2.0 of the ExMeX Companion App, the data modeler can create a new batch…

ExMeX Companion App

Starting with version 0.2.0 of the ExMeX Companion App, the data modeler can create a new batch for the Reference Data Stage Pipeline in the ExMeX Companion App.

ExMeX Framework Core Scripts

As always, the data modeler must add a new batch to the \etc\config\ConfigLoadBatchLogistic.sql file. The following SQL example shows an sample batch for the CDC Data Pipeline.

-- Example Batch for CDC Data Pipeline
MERGE INTO MetadataZone.LoadBatch AS target
	USING (	SELECT 'CDC Data Pipeline'
				  ,LoadBatchId
			FROM MetadataZone.LoadBatch
			WHERE BatchLabel = 'BigOne'
		   ) AS source (
				 BatchLabel
				,ParentLoadBatchId
				)
	ON (target.BatchLabel = source.BatchLabel)
WHEN MATCHED THEN
	UPDATE SET 
		target.ParentLoadBatchId = source.ParentLoadBatchId
WHEN NOT MATCHED THEN
	INSERT (
		 BatchLabel
		,ParentLoadBatchId
		)
	VALUES (
			 source.BatchLabel
			,source.ParentLoadBatchId
			)
;

Next, the data modeler must place the new batch in the dependencies to other batches (file \etc\config\ConfigLoadBatchLogistic.sql). In the following example SQLs the batch is arranged after the Stage Layer is loaded (Batch 'Load Stage with SSIS') and before the Batch 'Default - Hub'.

-- Stage to CDC Stage
MERGE INTO MetadataZone.LoadBatchDependency AS target
	USING (	SELECT prelb.LoadBatchId
				  ,suclb.LoadBatchId
			FROM MetadataZone.LoadBatch prelb
			CROSS JOIN MetadataZone.LoadBatch suclb
			WHERE prelb.BatchLabel = 'Load Stage with SSIS'
			AND   suclb.BatchLabel = 'CDC Data Pipeline'
		   ) AS source (
				 PredecessorLoadBatchId
				,SuccessorLoadBatchId)
	ON (target.PredecessorLoadBatchId = source.PredecessorLoadBatchId
	AND target.SuccessorLoadBatchId = source.SuccessorLoadBatchId)
WHEN NOT MATCHED THEN
	INSERT 
	VALUES ( source.PredecessorLoadBatchId
			,source.SuccessorLoadBatchId)
;

-- Stage to Core
MERGE INTO MetadataZone.LoadBatchDependency AS target
	USING (	SELECT prelb.LoadBatchId
				  ,suclb.LoadBatchId
			FROM MetadataZone.LoadBatch prelb
			CROSS JOIN MetadataZone.LoadBatch suclb
			WHERE prelb.BatchLabel = 'CDC Data Pipeline'
			AND   suclb.BatchLabel = 'Default - Hub'
		   ) AS source (
				 PredecessorLoadBatchId
				,SuccessorLoadBatchId)
	ON (target.PredecessorLoadBatchId = source.PredecessorLoadBatchId
	AND target.SuccessorLoadBatchId = source.SuccessorLoadBatchId)
WHEN NOT MATCHED THEN
	INSERT 
	VALUES ( source.PredecessorLoadBatchId
			,source.SuccessorLoadBatchId)
;

In the next step, the data modeler must add the Load Process Steps respectively the tables to the batch. The assignment is done in the file \etc\config\ConfigLoadBatchProcessStepLogistic.sql. The following two SQL statements show as an example the assignment of all tables of type SL - CDC One Data Load Before, SL - CDC Complete and SL - CDC Partially of the data model STAGE_LAYER to the new batch 'CDC Data Pipeline'.

-- Delete Load Process Steps from Batch
DELETE FROM MetadataZone.LoadBatchProcessStep
	WHERE LoadBatchId = (SELECT LoadBatchId
						 FROM MetadataZone.LoadBatch
						 WHERE BatchLabel = 'CDC Data Pipeline'
						)
;

-- DML to assign all CDC Load Process Steps to a batch
INSERT INTO MetadataZone.LoadBatchProcessStep (LoadBatchId, ProcessStepId)
		SELECT LoadBatchId, ProcessStepId
		FROM (SELECT LoadBatchId 
                FROM MetadataZone.LoadBatch 
                WHERE BatchLabel = 'CDC Data Pipeline') lb
	    CROSS JOIN (SELECT ProcessStepId
					FROM MetadataZone.EntityClass iec
					INNER JOIN MetadataZone.Mapping im
						ON  iec.EntityClassCode    = im.TargetEntityClassCode
						AND iec.ModelCode          = im.TargetModelCode
                    INNER JOIN MetadataZone.LoadProcessStep ilps
                        ON  ilps.MappingGUID       = im.MappingGUID
                    INNER JOIN (
                            SELECT ProcessStepTypeId
                            FROM MetadataZone.ProcessStepType
                            WHERE ProcessStepType = 'DEF'
                            ) pst
                        ON ilps.ProcessStepTypeId = pst.ProcessStepTypeId
                    -- Assign Data Models to batch
 					WHERE iec.ModelCode IN (
                                             'STAGE_LAYER'
                                            --,'StageLayerHiZ'
                                            )
                    -- Assign Stereotypes to batch
					  AND iec.Stereotype IN (
										    -- 'Volatile Zone'
                                            --,'History Zone'
                                            --'Reference Zone'
                                            ,'CDC Complete'
                                            ,'CDC One Data Load Before'
                                            ,'CDC Partially'
					  					    )
                    -- Assign tables (according to naming conventions) to batch
                    -- e.g. %edSat% for EdDating-Satellites
                      AND iec.EntityClassCode     LIKE '%%'
                    -- Include or exclude mappings according to naming conventions
                    -- NoRun% -> e.g. not used integrity runs
                    -- TestRun% -> mappings not ready to run on quality or production
                    --             but available in test-batches
                      AND im.MappingName NOT LIKE 'NoRun%'
                      AND im.MappingName NOT LIKE 'TestRun%'
                      AND im.MappingName NOT LIKE 'BusinessRun%'
					) lps
;
Posted 1 year agoby admin.tedamoh