Batch & LPS Orchestration

For the implementation of a data pipeline within the data warehouse, batches were introduced in the ExMeX Framework Core from the beginning.

Each data pipeline may start with it’s own Master-Batch or is part of a Child-Batch and can exists of four Levels of children-batches, aka batch hierarchy.

There are various possibilities to influence the behavior of the batches via global variables or batch-specific settings.

By assigning Load Process Steps (LPSs) to each batch, a finer orchestration of the LPSs with the batches is already possible. There are various possibilities to influence the behavior of the LPSs via global variables, batch settings or even LPS-specific settings.

reproductive health supplies coalition i2I0 u98Rh4 unsplash

(Photo by Reproductive Health Supplies Coalition on Unsplash)

A pile of old, used pallets for transporting heavy goods in trucks, trains etc. Just as data is transported (LPS as pallets) in one or more batches (truck or train) via hyperloop in a data pipeline.

Batch Orchestration (Core Scripts)

How to add a Master-Batch?

A master-batch is the natural entry point for a data pipeline and all the batches and load process steps (LPS)…

A master-batch is the natural entry point for a data pipeline and all the batches and load process steps (LPS) it contains. Usually, only this batch is called via the stored procedure uspBatchExecuteParallelLoadProcessStep.

ExMeX Companion App

Starting with version 0.2.0 of the ExMeX Companion App, the data modeler can add a master-batch in the ExMeX Companion App.

ExMeX Framework Core Scripts

As always, the data modeler or DevOp has to edit the file \etc\config\ConfigLoadBatchLogistic.sql.

The following SQL example shows how to add a sample master-batch 'ExMeX Master'. The same SQL example can also be used to convert an existing batch into a master-batch.

To define a master-batch, the parent batch must be NULL!

/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-- Insert Batch without Parent Batch                                          --

-- Master Level
MERGE INTO MetadataZone.LoadBatch AS target
	USING (	SELECT 'ExMeX Master'
		   ) AS source (BatchLabel)
	ON (target.BatchLabel = source.BatchLabel)
WHEN NOT MATCHED THEN
	INSERT (BatchLabel)
	VALUES (source.BatchLabel)
WHEN MATCHED THEN
    UPDATE SET 
    target.ParentLoadBatchId = NULL            
;
/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/

It does matter where in the file the SQL is inserted. It should be at the very beginning, of course. Before children-batches or dependencies are inserted.

After reinstalling the metadata, the master-batch is added.

You can double check your batch hierarchy with

SELECT *
FROM MetadataZone.BatchHierarchy
Posted 1 year agoby Dirk.Lerner

How to add a Children-Batch (Hierarchy)?

A child batch usually contains one or more Load Process Steps (LPS), grouped or organised according to the type of…

A child batch usually contains one or more Load Process Steps (LPS), grouped or organised according to the type of LPS or for logical and logistical reasons. All LPS in a batch are executed in parallel by the ExMeX Framework Core, if possible, depending on the maximum parallality setting.

ExMeX Companion App

Starting with version 0.2.0 of the ExMeX Companion App, the data modeler can add one or more Children-Batch(es) in the ExMeX Companion App.

ExMeX Framework Core Scripts

As always, the data modeler or DevOp has to edit the file \etc\config\ConfigLoadBatchLogistic.sql.

The following SQL example shows how to add a sample children batch ‘ExMeX MD‘ (Level 1 in the hierarchy) to the master-batch 'ExMeX Master'. The same SQL example can also be used to reorganize a children-batch to another master- or children-batch.

/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-- Level 1

MERGE INTO MetadataZone.LoadBatch AS target
	USING (	SELECT 'ExMeX MD'
				  ,LoadBatchId
			FROM MetadataZone.LoadBatch
			WHERE BatchLabel = 'ExMeX Master'
		   ) AS source (
				 BatchLabel
				,ParentLoadBatchId
				)
	ON (target.BatchLabel = source.BatchLabel)
WHEN MATCHED THEN
	UPDATE SET 
		target.ParentLoadBatchId = source.ParentLoadBatchId
WHEN NOT MATCHED THEN
	INSERT (
		 BatchLabel
		,ParentLoadBatchId
		)
	VALUES (
			 source.BatchLabel
			,source.ParentLoadBatchId
			)
;
/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/

The following SQL example shows how to add several sample children batches (‘ExMeX MD - Hub‘, ‘ExMeX MD - Link‘, ‘ExMeX MD - Satellite‘) (Level 2  in the hierarchy) to the children-batch 'ExMeX MD'. The same SQL example can also be used to reorganize a children-batch to another master- or children-batch.

/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-- Level 2

MERGE INTO MetadataZone.LoadBatch AS target
	USING (	SELECT 'ExMeX MD - Hub', LoadBatchId 
                  FROM MetadataZone.LoadBatch WHERE BatchLabel = 'ExMeX MD'
            UNION ALL
            SELECT 'ExMeX MD - Link', LoadBatchId 
               FROM MetadataZone.LoadBatch WHERE BatchLabel = 'ExMeX MD'
            UNION ALL
            SELECT 'ExMeX MD - Satellite', LoadBatchId 
               FROM MetadataZone.LoadBatch WHERE BatchLabel = 'ExMeX MD'
		   ) AS source (
				 BatchLabel
				,ParentLoadBatchId
				)
	ON (target.BatchLabel = source.BatchLabel)
WHEN MATCHED THEN
	UPDATE SET 
		target.ParentLoadBatchId = source.ParentLoadBatchId
WHEN NOT MATCHED THEN
	INSERT (
		 BatchLabel
		,ParentLoadBatchId
		)
	VALUES (
			 source.BatchLabel
			,source.ParentLoadBatchId
			)
;
/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/

It does matter where in the file the SQL is inserted. After adding the batches used in the hierarchy. For levels of batches in the hierarchy are possible.

After reinstalling the metadata, the children-batch(es) is/are added.

You can double check your batch hierarchy with

SELECT *
FROM MetadataZone.BatchHierarchy
Posted 1 year agoby Dirk.Lerner

How to add a Batch dependeny?

The ExMeX Framework Core is designed in such a way that batches can be executed in parallel on the same…

The ExMeX Framework Core is designed in such a way that batches can be executed in parallel on the same level. If this is not desired, a dependency can be created between batches. For example, all raw data must be processed before applying business rules to the raw data.

ExMeX Companion App

Starting with version 0.2.0 of the ExMeX Companion App, the data modeler can add a batch dependency in the ExMeX Companion App.

ExMeX Framework Core Scripts

As always, the data modeler or DevOp has to edit the file \etc\config\ConfigLoadBatchLogistic.sql.

The following SQL example shows how to add sample batch dependencies for the dependency between batches 'ExMeX MD - Hub' and 'ExMeX MD - Link' as well as the dependency between batches 'ExMeX MD - Link' and 'ExMeX MD - Satellite'.

The former are the predecessors and the latter are the successors.

This means that the batch 'ExMeX MD - Hub' must be finished before the batch 'ExMeX MD - Link' starts and the batch 'ExMeX MD - Link' must be finished before the batch 'ExMeX MD - Satellite' can start.

/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-- Insert Batch Dependency                                                    —

MERGE INTO MetadataZone.LoadBatchDependency AS target
	USING (	SELECT prelb.LoadBatchId
				  ,suclb.LoadBatchId
			FROM MetadataZone.LoadBatch prelb
			CROSS JOIN MetadataZone.LoadBatch suclb
			WHERE prelb.BatchLabel = 'ExMeX MD - Hub'
			AND   suclb.BatchLabel = 'ExMeX MD - Link'
            UNION ALL
            SELECT prelb.LoadBatchId
				  ,suclb.LoadBatchId
			FROM MetadataZone.LoadBatch prelb
			CROSS JOIN MetadataZone.LoadBatch suclb
			WHERE prelb.BatchLabel = 'ExMeX MD - Link'
			AND   suclb.BatchLabel = 'ExMeX MD - Satellite'
		   ) AS source (
				 PredecessorLoadBatchId
				,SuccessorLoadBatchId)
	ON (target.PredecessorLoadBatchId = source.PredecessorLoadBatchId
	AND target.SuccessorLoadBatchId = source.SuccessorLoadBatchId)
WHEN NOT MATCHED THEN
	INSERT 
	VALUES ( source.PredecessorLoadBatchId
			,source.SuccessorLoadBatchId)
;
/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/

It does matter where in the file the SQL is inserted. After adding the batches used in the dependency.

After reinstalling the metadata, the dependency is added.

You can double check your batch dependency with

SELECT *
FROM MetadataZoneCore.LoadBatchDependencyWithLabel
Posted 1 year agoby Dirk.Lerner

How to delete a Batch dependeny?

The ExMeX Framework Core is designed in such a way that batches can be executed in parallel on the same…

The ExMeX Framework Core is designed in such a way that batches can be executed in parallel on the same level. If this is not desired, a dependency can be created between batches. For example, all raw data must be processed before applying business rules to the raw data.

If this dependency is not to exist any longer, it can be removed.

ExMeX Companion App

Starting with version 0.2.0 of the ExMeX Companion App, the data modeler can delete an existing batch dependency in the ExMeX Companion App.

ExMeX Framework Core Scripts

As always, the data modeler or DevOp has to edit the file \etc\config\ConfigLoadBatchLogistic.sql.

The following SQL example shows a sample batch dependency deletion for the dependency between batches 'Source Interface A 2 Stage' and 'Load Stage with SSIS'. The former is the predecessor and the latter the successor.

/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-- Delete Batch Dependency                                                    --

INSERT INTO @DeleteBatchDependency (PredecessorBatchLabel,SuccessorBatchLabel)
    VALUES ('Source Interface A 2 Stage','Load Stage with SSIS')
;
/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/

In the table variable @DeleteBatchDependency, the data modeler or DevOp enters the dependency to be deleted between for each two batches via insert DML.

It does not matter where in the file the SQL is inserted.

After reinstalling the metadata, the dependency is deleted.

You can double check your batch dependency with

SELECT *
FROM MetadataZoneCore.LoadBatchDependencyWithLabel
Posted 1 year agoby Dirk.Lerner

How to activate/deactivate a Batch?

If there is a need to remove a batch from regular execution, a data modeller or DevOp can disable it.…

If there is a need to remove a batch from regular execution, a data modeller or DevOp can disable it. All child batches of the batch deactivated by the data modeller or DevOp are also automatically deactivated.

Later, of course, the data modeller or DevOp can activate the batch again, including all child batches.

ExMeX Companion App

Starting with version 0.2.0 of the ExMeX Companion App, the data modeler can activate/deactivate one or more batch(es) in the ExMeX Companion App.

ExMeX Framework Core Scripts

As of now this feature is not available as script.

Posted 1 year agoby Dirk.Lerner

Global & Batch Variables Configuration

 How do global or batch variables affect batch execution?

The global and batch configuration variables listed in the following sections directly or indirectly influence the execution of batches. In…

The global and batch configuration variables listed in the following sections directly or indirectly influence the execution of batches. In general, the global variables shown in this section affect all batches.

However, some of these global variables can be overwritten once for the current execution when calling the usp uspBatchExecuteLoadProcessStep (see Manual Batch.sql) or permanently per batch via the configuration of the batches with the ExMeX Companion App.

An overview of the global variables and their current configuration is provided by the view MetadataZone.ConfigExMeXFramework or in the Getting Started -> Configuration section.

Mode of action according to priority

1stVariable passed to the stored procedure uspBatchExecuteLoadProcessStep

2ndVariable configured in the batch(currently via the ExMeX Companion App)

3rdGlobally configured variables

Posted 1 year agoby Dirk.Lerner

How to use variable ExitOnError?

Valid as of release 2.12.0 How a batch behaves within a data pipeline in the case of an LPS error…

Valid as of release 2.12.0

How a batch behaves within a data pipeline in the case of an LPS error can be changed by the variable ExitOnError

ExitOnError: If an error occurs in the batch, the batch is terminated (Exit) or the error is ignored (Off) and the batch continues; Default is to NULL in usp uspBatchExecuteParallelLoadProcessStep (0 = Off, 1 = Exit').

Mode of action according to priority

1stExitOnError Variable passed to the stored procedure uspBatchExecuteLoadProcessStep - e.g. in script ManualBatch.sql

2ndVariable configured in the batch - via the ExMeX Companion App

3rdGlobally configured variables - during installation

If the setting of the batch or the global variable is to take effect, then the parameter @ExitOnError must be set to NULL (@ExitOnError = NULL) or optionally omitted when calling the stored procedure uspBatchExecuteLoadProcessStep! Otherwise, the value passed is always used by the stored procedure as described in the priority order.

ExMeX Companion App

With the ExMeX Companion App, the data modeler can configure individual batch settings for ExitOnError within a batch in the ExMeX Companion App.

From version 0.3.0 it is possible to configure the global variable with the ExMeX Companion App.

ExMeX Framework Core Scripts

Not available.

ExMeX Framework Core Global Variable configuration

To configure this variable, the data modeler or DevOp has to execute the stored procedure MetadataZoneCore.uspConfigExMeXFramework within the ExMeX Framework Core to update the global config variable.

EXEC MetadataZoneCore.uspConfigExMeXFramework
     @VariableName         = 'ExitOnError'
    ,@VariableValue        = '0' 
    ,@VariableDatatype     = 'BIT'   
    ,@VariableDescription  = 'If an error occurs in the batch, the batch is terminated or the error is ignored and the batch continues; 0 = Off, 1 = Exit' 
	

The current settings of all global variables can be checked with the following DML:

SELECT *
FROM MetadataZone.ConfigExMeXFramework

Posted 1 year agoby Dirk.Lerner

How to use variable MaxSqlCommandInChunk?

Valid as of release 2.12.0 How a batch behaves within a data pipeline in the case of executing parallel LPS…

Valid as of release 2.12.0

How a batch behaves within a data pipeline in the case of executing parallel LPS can be changed by the variable MaxSqlCommandInChunk.

MaxSqlCommandInChunk: Defines the max number of LPS which are executed in parallel.

Mode of action according to priority

1stVariable configured in the batch - via the ExMeX Companion App

2ndGlobally configured variables - during installation

ExMeX Companion App

With the ExMeX Companion App, the data modeler can configure individual batch settings for ExitOnError within a batch in the ExMeX Companion App.

From version 0.3.0 it is possible to configure the global variable with the ExMeX Companion App.

ExMeX Framework Core Scripts

Not available.

ExMeX Framework Core Global Variable (during installation)

To configure this variable, the data modeler or DevOp has to execute the stored procedure MetadataZoneCore.uspConfigExMeXFramework within the ExMeX Core Framework to update the global config variable.

EXEC MetadataZoneCore.uspConfigExMeXFramework
     @VariableName         = 'MaxSqlCommandInChunk' 
    ,@VariableValue        = '10' 
    ,@VariableDatatype     = 'INT'
    ,@VariableDescription  = 'Defines the max number of LPS which are executed in parallel.'

The current settings of all global variables can be checked with the following DML:

SELECT *
FROM MetadataZone.ConfigExMeXFramework
Posted 1 year agoby Dirk.Lerner

Global & LPS Variable Configuration

How do global or LPS variables affect LPS execution?

The global and LPS configuration variables listed in the following sections directly or indirectly influence the execution of any LPS.…

The global and LPS configuration variables listed in the following sections directly or indirectly influence the execution of any LPS. In general, the global variables shown in this section affect all LPSs.

An overview of the global variables and their current configuration is provided by the view MetadataZone.ConfigExMeXFramework or in the Getting Started -> Configuration section.

Mode of action according to priority

1stGlobally configured variables are used during LPS generation

2ndGlobally configured variables are applied during LPS execution

Posted 1 year agoby admin.tedamoh

How to use variable UseTargetEntityDB?

Valid as of release 2.12.0 The database on which an LPS is executed can be set via the variable UseTargetEntityDB.…

Valid as of release 2.12.0

The database on which an LPS is executed can be set via the variable UseTargetEntityDB.

If UseTargetEntityDB = 0 (OFF), the LPS is executed on the database in which the ExMeX Framework Core is installed.

If UseTargetEntityDB = 1 (ON), the LPS is executed on the database in which the table of the LPS to be loaded is located. The LPS is preceded by

USE [<TargetDatabaseName>].

Mode of action according to priority

1stGlobally configured variable UseTargetEntityDB applied during generation of LPS. No ad-hoc changes are yet possible!

Applicable

The variable affects all Data Vault load processes (Hub, Link and Satellites) as well as LPSs like CDC-, One-To-On- or References-LPS generated by the ExMeX Framework Core.

The variable does not affect ETL/ELT packages that are outside the responsibility of the ExMeX Framework Core and are "only" called by the ExMeX Framework Core. For example, SSIS or SSAS packages/scripts.

ExMeX Companion App

From version 0.3.0 it is possible to configure the global variable with the ExMeX Companion App.

ExMeX Companion App Config configuration Window UseTargetEntityDB

ExMeX Framework Core Scripts

Not available.

ExMeX Framework Core Global Variable configuration

To configure this variable, the data modeler or DevOp has to execute the stored procedure MetadataZoneCore.uspConfigExMeXFramework within the ExMeX Framework Core to update the global config variable.

EXEC MetadataZoneCore.uspConfigExMeXFramework
     @VariableName         = 'UseTargetEntityDB'
    ,@VariableValue        = '0' 
    ,@VariableDatatype     = 'BIT'   
    ,@VariableDescription  = 'Switch functionality to enable or disable usage of SQL-Statement USE [<TargetDatabaseName>] within all LPS. 0 = Off, 1 = On' 	

The current settings of all global variables can be checked with the following DML:

SELECT *
FROM MetadataZone.ConfigExMeXFramework

Posted 1 year agoby admin.tedamoh

How to use variable FKConstraintOnSatelliteLPS and FKConstraintOnLinkLPS?

Valid as of release 2.12.0 Whether an LPS deactivates existing RI constraints in the database for the target database object…

Valid as of release 2.12.0

Whether an LPS deactivates existing RI constraints in the database for the target database object (Link or Satellite) before loading and reactivates them after execution can be set via the variable FKConstraintOnSatelliteLPS or FKConstraintOnLinkLPS.

If FKConstraintOnSatelliteLPS = 0 (OFF), the LPS "ignores" all RI constraints on Satellites. This means that if RI constraints are installed, they remain active during loading.

If FKConstraintOnSatelliteLPS = 1 (ON), the LPS "deactivates" all RI constraints on the Satellite before loading if the other conditions of the global variables DayUsedForAvgFkCheck and RowsMaxInsertedFkCheck are fulfilled. This means that if RI constraints are installed, they are set to inactive during loading and only reactivated after loading.

If FKConstraintOnLinkLPS = 0 (OFF), the LPS "ignores" all RI constraints on Links. This means that if RI constraints are installed, they remain active during loading.

If FKConstraintOnLinkLPS = 1 (ON), the LPS "deactivates" all RI constraints on the Link before loading if the other conditions of the global variables DayUsedForAvgFkCheck and RowsMaxInsertedFkCheck are fulfilled. This means that if RI constraints are installed, they are set to inactive during loading and only reactivated after loading.

DayUsedForAvgFkCheck defines the number x (1-255) to receive average rows written for a given LPS within the last x days, using statistic data of the ExMeX Logging Framework.

RowsMaxInsertedFkCheck is the threshold value on average rows written within the last x days (defined by global variable DayUsedForAvgFkCheck) to decide whether a RI constraint should be deactivated or not. This threshold value is determined each time the LPS is executed!

Example: On average over the last 10 days (DayUsedForAvgFkCheck) more than 200,000 (RowsMaxInsertedFkCheck) rows were loaded for the currently executed link LPS (FKConstraintOnLinkLPS = 1 (ON)), then all RI constraint on the link are deactivated.
If, on the other hand, less than 200,000 (RowsMaxInsertedFkCheck) rows have been loaded on average over the last 10 days (DayUsedForAvgFkCheck) for the currently executed link LPS (FKConstraintOnLinkLPS = 1 (ON)), then all RI constraint remain activated on the link.

Mode of action according to priority

1stGlobally configured variables FKConstraintOnLinkLPS and FKConstraintOnSatelliteLPS are applied during generation of LPS. No ad-hoc changes are yet possible!

Applicable

The variables affects all Data Vault load processes (Link and Satellites, which may be affected by RI-Constraints) generated by the ExMeX Framework Core.

The variable does not affect ETL/ELT packages that are outside the responsibility of the ExMeX Framework Core and are "only" called by the ExMeX Framework Core. For example, SSIS or SSAS packages/scripts.

ExMeX Companion App

From version 0.3.0 it is possible to configure the global variable with the ExMeX Companion App.

ExMeX Companion App Config configuration Window FKConstraintOnSatelliteLPS

ExMeX Framework Core Scripts

Not available.

ExMeX Framework Core Global Variable configuration

To configure this variable, the data modeler or DevOp has to execute the stored procedure MetadataZoneCore.uspConfigExMeXFramework within the ExMeX Framework Core to update the global config variable.

EXEC MetadataZoneCore.uspConfigExMeXFramework
     @VariableName         = 'FKConstraintOnSatelliteLPS' 
    ,@VariableValue        = '0' 
    ,@VariableDatatype     = 'BIT'
    ,@VariableDescription  = 'Switch functionality to enable or  FK-Constraint checks on Satellite (Child) to Hub/Link (Parent). 0 = Off, 1 = On'

EXEC MetadataZoneCore.uspConfigExMeXFramework
     @VariableName         = 'FKConstraintOnLinkLPS'
    ,@VariableValue        = '0'    
    ,@VariableDatatype     = 'BIT'
    ,@VariableDescription  = 'Switch functionality to enable or  FK-Constraint checks on Link (Child) to Hub (Parent). 0 = Off, 1 = On'

EXEC MetadataZoneCore.uspConfigExMeXFramework
     @VariableName         = 'DayUsedForAvgFkCheck'   
    ,@VariableValue        = '10'
    ,@VariableDatatype     = 'TINYINT' 
    ,@VariableDescription  = 'Defines the number x (1-255) to receive average rows written for a given LPS within the last x days, using statistic data of the ExMeX Logging Framework.'

EXEC MetadataZoneCore.uspConfigExMeXFramework
     @VariableName         = 'RowsMaxInsertedFkCheck'     
    ,@VariableValue        = '200000'       
    ,@VariableDatatype     = 'INT' 
    ,@VariableDescription  = 'Threshold value on average rows written within the last x days  (defined by global variable DayUsedForAvgFkCheck) to decide whether a FK should be deactivated or not.'

The current settings of all global variables can be checked with the following DML:

SELECT *
FROM MetadataZone.ConfigExMeXFramework
Posted 1 year agoby admin.tedamoh

Batch & LPS Execution

No questions yet.