How to do?

How to override the 'DeltaExportTimestamp' for any Load Process Step?

Purpose

The purpose of overriding the value in column DeltaExportAssertionTimeFrom (in table MetadataForExecuteDataLogisticLoadProcessStep) for a Load Process Step (LPS), at execution time in data logistics, is to load a different time period than the one provided by the ExMeX Framework Core.

Motivation

The motivation for overriding the extract time period is driven by different use cases but not limited to these:

  • Set a timestamp to perform a LPS initial load from a source to one or many target object(s)
  • Set a timestamp to perform a LPS reload of a defined time period from a source to one or many target object(s) in the case of errors or other issues
  • Set a different timestamp for a LPS to ensure an extract of a slightly different time period from a source which is not reliable
Applicability

The override functionality is mainly applicable within a (delta) data load from any source into the staging layer. Prerequisite is timestamped data in the source to identify the data in the defined extract time period.

Content

The stored procedure used to override the value in column DeltaExportAssertionTimeFrom (in table MetadataForExecuteDataLogisticLoadProcessStep) is provided by the ExMeX Framework Core and is named uspSetDeltaExportTimePeriod. For more details about how to use the stored procedure go to section Framework in Depth -> Stored Procedure -> usp - Set Delta Export Time Period.

A batch that contains all LPS for which the DeltaExportAssertionTimeFrom timestamp should be overridden can be submitted to the procedure as a value (see the sketch below).
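As a hedged sketch only (it assumes the procedure treats @LoadProcessStepId as optional and applies the override to every LPS of the batch when it is omitted; see usp - Set Delta Export Time Period for the authoritative parameter list), a batch-wide call could look like this:

-- Override timestamp for all LPS of a batch (sketch)
EXEC MetadataZone.uspSetDeltaExportTimePeriod 
  @BatchLabel                   = 'Example Hub (Core)'
 ,@DeltaExportAssertionTimeFrom = '2020-01-01 10:04:46.5187419'
 ;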

A single LPS within a batch, for which the DeltaExportAssertionTimeFrom timestamp should be overridden, can be submitted to the procedure as values.

The LPS that received a new timestamp from this stored procedure use the newly set value in column DeltaExportAssertionTimeFrom during execution. This information is provided by the DAPI MetadataForExecuteDataLogisticLoadProcessStep.
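To double check which value an LPS will actually use, the DAPI can be queried directly. A minimal sketch, assuming the DAPI exposes the LPS id in a column named LoadProcessStepId:

-- Check the (possibly overridden) extract timestamp for LPS 1256
SELECT LoadProcessStepId
      ,DeltaExportAssertionTimeFrom
FROM MetadataZone.MetadataForExecuteDataLogisticLoadProcessStep
WHERE LoadProcessStepId = 1256
;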

Considerations and consequences

The newly set override timestamp is only used within the ExMeX Framework Core if the stored procedure uspProcessStepVapourTrail is executed in the correct way. For more details about how to use the stored procedure go to section Framework in Depth -> Stored Procedure -> usp - Process Step Vapour Trail.

Overriding the timestamp of a batch, i.e. column DeltaExportAssertionTimeFrom in table MetadataForExecuteDataLogisticBatch, is not possible.

Example

The following code snippet shows how a timestamp for LPS 1256 can be changed:

Override timestamp for a single LPS

-- Execute stored procedure
EXEC MetadataZone.uspSetDeltaExportTimePeriod 
  @BatchLabel                   = 'Example Hub (Core)'
 ,@LoadProcessStepId            = 1256
 ,@DeltaExportAssertionTimeFrom = '2020-01-01 10:04:46.5187419'
 ;

Double check whether any override timestamps were applied for a single LPS

This SQL can be used to add a check of override timestamps to your operational reports.

WITH 
cteRecentOverride AS (
    SELECT MAX(AuditTrailId) AS MaxAuditTrailId
          ,LoadBatchId
          ,ProcessStepId
    FROM MetadataZone.SatProcessStepDeltaExportOverride
    GROUP BY LoadBatchId
            ,ProcessStepId
)
SELECT spsdeo.AuditTrailId
      ,spsdeo.LoadBatchId
      ,spsdeo.DeltaExportAssertionTimeFrom
      ,spsdeo.DeltaExportAssertionTimeBefore
      ,spsdeo.AssertionTimeFrom
FROM MetadataZone.SatProcessStepDeltaExportOverride spsdeo
INNER JOIN cteRecentOverride ctero
    ON spsdeo.AuditTrailId = ctero.MaxAuditTrailId
WHERE ctero.ProcessStepId = 1256
;
Posted 2 years ago by admin.tedamoh

How to restart a batch in the case of a failure?

Purpose

In case a Load Process Step (LPS), and therefore a batch, in the ExMeX framework fails during loading, it is necessary to know which steps to take.

Motivation

If, for example, more than 1,000 LPS are orchestrated in 45 batches, and a few of them have errors (e.g. duplicates, domain violation) then it is not necessary to run all LPS again. With a Restart it is possible to restart only a) failed or b) not yet executed LPS.

Applicability

A restart is typically required when an LPS or batch has failed. Usually this happens shortly after a rollout of a new artifact or when data in the sources changes and therefore does not fit into the target domain anymore. Another reason can be a technical breakdown of the underlying technology (e.g. restarting a server without notice). Thus, a restart is generally necessary during operation of the data warehouse solution.

Content

For more details about the usp uspBatchExecuteLoadProcessStep read the topic Stored Procedure.

By executing the usp in restart mode, passing the batch to be executed a second time (BatchLabel) and the corresponding BatchRunVapourTrailId, it determines all LPS in this batch (and its sub-batches) that failed or have not yet been executed. Before executing these LPS, the usp reopens the batches containing the faulty or not yet executed LPS. In case a batch has not been executed at all (e.g. due to a batch abort caused by a server restart) the usp opens a new batch. This approach ensures that all LPS that have failed or have not yet been executed are assigned to their original batch and that a complete picture of the process flow and runtimes is obtained during evaluations.

Considerations and consequences

This section is meant to offer some alternative views and experiences on what it means to take a certain decision, and what the known impacts are.

Sample
Detect failed batches

The easiest way to detect failed batches is to execute the following SQL statement. If there are rows in the result set then there are a) still running or b) failed batches.

SELECT mfedlb.*
FROM MetadataZone.MetadataForExecuteDataLogisticBatch mfedlb
INNER JOIN MetadataZone.BatchHierarchy bh
    ON mfedlb.BatchId = bh.LoadBatchId
INNER JOIN MetadataZone.LoadBatch lb
    ON bh.LoadBatchIdMaster = lb.LoadBatchId
WHERE lb.BatchLabel = '<YourMasterBatch>'
AND mfedlb.BatchVapourTrailId IS NOT NULL
 
Detect failed LPS

To identify failed LPS, use the provided script ./DataLogisticStatistic.orig.sql in your root directory. It is recommended to copy the file and remove "orig." before editing it. All LPS with a ProcessStatusNumber not equal to 0 (zero) have failed or are already checked (999999).

Or use the following simple SQL statement:

SELECT psvt.*
FROM MetadataZone.ProcessStepVapourTrail psvt
INNER JOIN MetadataZone.MetadataForExecuteDataLogisticBatch mfedlb
    ON psvt.BatchRunVapourTrailId = mfedlb.BatchVapourTrailId
INNER JOIN MetadataZone.BatchHierarchy bh
    ON mfedlb.BatchId = bh.LoadBatchId
INNER JOIN MetadataZone.LoadBatch lb
    ON bh.LoadBatchIdMaster = lb.LoadBatchId
WHERE lb.BatchLabel = '<YourMasterBatch>'
AND mfedlb.BatchVapourTrailId IS NOT NULL
AND psvt.ProcessStatusNumber <> 0
 
Check and fix it

Only check LPS associated with your master batch and its sub-batches. Keep in mind that there can be other master batches active!
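A minimal sketch to spot batch runs that are neither successful nor already fixed, based on the status convention used below (0 = success, 999999 = manually fixed); adjust the column list to your needs:

SELECT BatchRunVapourTrailId
      ,BatchStatusNumber
      ,BatchStatusMessage
FROM MetadataZone.BatchRunVapourTrail
WHERE BatchStatusNumber NOT IN (0,999999)
;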

Before restarting a failed batch, all failed LPS have to be fixed, either DDL or data, and the batches in the data logistics metadata have to be updated accordingly:

UPDATE MetadataZone.BatchRunVapourTrail 
SET
     BatchStatusNumber  = 999999
    ,BatchEndDateTime   = COALESCE(BatchEndDateTime,SYSUTCDATETIME())
    ,BatchStatusMessage = 'Fixed by ' + CURRENT_USER + ': ' + BatchStatusMessage
WHERE BatchRunVapourTrailId >= 1244
AND BatchStatusNumber NOT IN (0,999999)
;

 

You can also update the LPS in the same way, but you do not need to:

UPDATE MetadataZone.ProcessStepVapourTrail 
SET 
     ProcessStatusNumber  = 999999 
    ,ProcessEndDateTime   = COALESCE(ProcessEndDateTime,SYSUTCDATETIME())
    ,ProcessStatusMessage = 'Fixed by ' + CURRENT_USER + ': ' + ProcessStatusMessage 
WHERE BatchRunVapourTrailId >= 1244
AND ProcessStatusNumber NOT IN (0,999999) 
;

 

To fix rows left open in the delta export metadata (see the check in the section Housekeeping before restart below), execute the following SQL statement:

UPDATE spsde
SET
    AssertionTimeBefore = SYSUTCDATETIME()
FROM MetadataZone.SatProcessStepDeltaExport spsde
INNER JOIN MetadataZone.ProcessStepVapourTrail psvt
    ON  spsde.ProcessStepId       = psvt.ProcessStepId
    AND spsde.AssertionTimeBefore = '2999-12-31 00:00:00.0000000'
INNER JOIN MetadataZone.MetadataForExecuteDataLogisticBatch mfedlb
    ON psvt.BatchRunVapourTrailId = mfedlb.BatchVapourTrailId
INNER JOIN MetadataZone.BatchHierarchy bh
    ON mfedlb.BatchId = bh.LoadBatchId
INNER JOIN MetadataZone.LoadBatch lb
    ON bh.LoadBatchIdMaster = lb.LoadBatchId
WHERE lb.BatchLabel = '<YourMasterBatch>'
AND mfedlb.BatchVapourTrailId IS NOT NULL
AND psvt.ProcessStatusNumber <> 0
 
Housekeeping before restart

In the case of an uncontrolled failure of an LPS it can happen that information needed for the delta data export is not properly closed. To check whether there are any rows in a failure state, execute the following SQL statement:

SELECT spsde.*
FROM MetadataZone.SatProcessStepDeltaExport spsde
INNER JOIN MetadataZone.ProcessStepVapourTrail psvt
    ON  spsde.ProcessStepId       = psvt.ProcessStepId
    AND spsde.AssertionTimeBefore = '2999-12-31 00:00:00.0000000'
INNER JOIN MetadataZone.MetadataForExecuteDataLogisticBatch mfedlb
    ON psvt.BatchRunVapourTrailId = mfedlb.BatchVapourTrailId
INNER JOIN MetadataZone.BatchHierarchy bh
    ON mfedlb.BatchId = bh.LoadBatchId
INNER JOIN MetadataZone.LoadBatch lb
    ON bh.LoadBatchIdMaster = lb.LoadBatchId
WHERE lb.BatchLabel = '<YourMasterBatch>'
AND mfedlb.BatchVapourTrailId IS NOT NULL
AND psvt.ProcessStatusNumber <> 0
 
Restart <YourMasterBatch>

In practice, any batch can be restarted. It is generally recommended to restart the master batch or the batch that was originally executed. The easiest way is to use the provided script ./Manual Batch.orig.sql in your root directory. It is recommended to copy the file and remove "orig." before editing it. Or just use the following SQL statement:

EXEC MetadataZone.uspBatchExecuteLoadProcessStep 
  @BatchLabel                   = '<YourMasterBatch>'
 ,@BatchExec                    = 'E'
 ,@BatchRestartRunVapourTrailId = 10456;
 

 

Done. Wait until everything is finished.

Posted 2 years ago by Dirk.Lerner

How to execute SSIS packages with the ExMeX Framework?


TEDAMOH ExMeX Framework version: 2.9.0 or greater.

Purpose

With this generator template it is possible to execute SSIS packages with the stored procedure uspBatchExecuteLoadProcessStep, and thus orchestrate them in batches.

Motivation

With this generator template, it is possible to load data from various sources that cannot be connected via T-SQL into the stage layer of the data warehouse using the ETL tool Microsoft SSIS.

Alias

-

Applicability

In the solution design this generator template is used to load external data into the data warehouse stage layer.

Content

To orchestrate the scripts generated by the SSIS generator template within a batch, associate all data models/LoadProcessSteps (aka Mappings) with ProcessStepType = 'SSIS' in the config file ConfigLoadBatchProcessStepLogistic.[orig.]sql; see the example below. Environment specific settings for calling an SSIS package are done in the environment config file SetEnvSourceInterface.[orig.]sql, located in /etc/env; see the example below.

Considerations and consequences

The provided generator template is a basic pattern to call SSIS packages. It may be necessary to add additional parameters to call an SSIS package. The orig generator template (SSIS-Stage.orig.sql) is stored in the directory \library\generate\Miscellaneous\mssqlserver. To modify the generator template, edit the file SSIS-Stage.sql.
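The generated script is not reproduced here. As a rough, hedged illustration of what a T-SQL call of an SSIS package via the SSISDB catalog typically looks like (folder, project and package names are placeholders; the actual template may differ, e.g. it may route the call through the configured linked server):

DECLARE @execution_id BIGINT;

-- Create an execution for the package in the SSISDB catalog
EXEC [SSISDB].[catalog].[create_execution]
     @folder_name      = N'<YourFolder>'
    ,@project_name     = N'<YourProjectName>'
    ,@package_name     = N'<YourPackage>.dtsx'
    ,@use32bitruntime  = 0
    ,@execution_id     = @execution_id OUTPUT;

-- Run synchronously so the calling LPS waits for the package to finish
EXEC [SSISDB].[catalog].[set_execution_parameter_value]
     @execution_id     = @execution_id
    ,@object_type      = 50
    ,@parameter_name   = N'SYNCHRONIZED'
    ,@parameter_value  = 1;

EXEC [SSISDB].[catalog].[start_execution] @execution_id;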

Sample

The following steps describe how to enable SSIS generator template usage in a batch.

1) Change script /etc/env/SetEnvSourceInterface.sql as described below. See also /etc/env/SetEnvSourceInterface.orig.sql.

/*+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
/* Define SSIS environment                                                   */
/*+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
SET @SSISProjectName   = N'<YourProjectName>'
SET @SSISFolderName    = N'<YourFolder>'
SET @SSISLinkedServer  = N'<YourLinkedSSISServer>'

2) Change script /etc/config/ConfigLoadBatchProcessStepLogistic.sql as described below. See also /etc/config/ConfigLoadBatchProcessStepLogistic.orig.sql.

REPLACE

        INNER JOIN (
                SELECT ProcessStepTypeId
                FROM MetadataZone.ProcessStepType
                WHERE ProcessStepType = 'DEF'

WITH

        INNER JOIN (
                SELECT ProcessStepTypeId
                FROM MetadataZone.ProcessStepType
                WHERE ProcessStepType = 'SSIS'

3) Reinstall Metadata!

Posted 2 years ago by Dirk.Lerner

How to update HashDiff column after satellite split?


Within the TEDAMOH ExMeX Framework a tool is provided to Update HashDiff of Satellite.

All tools are located in the folder \library\tool.

Purpose

After splitting a satellite into at least two satellites a recalculation of the HashDiff column is needed.

Motivation

If the HashDiff column is not recalculated with the reduced number of context columns, the LoadProcessStep for HashDiff satellites will detect all rows as changed, even if there were no changes. The reason is that, until the split of the satellite, the HashDiff column was calculated with more context columns.


Alias

-

Applicability

After a satellite has been split and the data has been refilled.

Content

The tool detects all HashDiff satellites which are stored in the metadata. Then it generates the update scripts for the satellites and updates all HashDiff columns in the detected satellites. At the end the tool does housekeeping and cleans up all generated scripts.
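To illustrate the kind of statement the tool generates, here is a hedged sketch of a HashDiff recalculation for a single satellite. Table name, context columns, delimiter and hash algorithm are assumptions (and CONCAT_WS requires SQL Server 2017 or later), not the framework's actual template:

-- Recalculate HashDiff from the satellite's remaining context columns (sketch)
UPDATE sat
SET HashDiff = CONVERT(CHAR(64),
                   HASHBYTES('SHA2_256',
                       CONCAT_WS(N'||'
                           ,COALESCE(CAST(sat.ContextColumnA AS NVARCHAR(4000)), N'')
                           ,COALESCE(CAST(sat.ContextColumnB AS NVARCHAR(4000)), N'')
                       )
                   ), 2)
FROM DataZone.SatExampleSplit sat;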

Considerations and consequences

Since the tool is in an early state it works synchronously and may therefore take some time, depending on the number of satellites and contained rows.

How to execute the tool?
  1. Go to the directory \library\tool\Update HashDiff of Satellite\.
  2. Open the file "Manual Batch Update Hashdiff.sql" in the SSMS.
  3. Adjust path :setvar InitPath
  4. Execute and wait
  5. Query data logistics for batch 'Default - No specific batch load' with file DataLogisticStatistic.orig.sql to monitor ongoing execution.
  6. Done
Posted 2 years ago by Dirk.Lerner

How to add a Load Process Step for external logging purposes?

Purpose

The ExMeX Logging Framework should also be able to log Load Process Steps (LPS) from external sources (non ExMeX Framework Core).

Motivation

The motivation behind this is to enable up- or downstream processes, e.g. populating other applications like SSAS, TM1 or Exasol, to log their steps within the ExMeX Logging Framework. It also enables logging of steps within SQL stored procedures which are not generated by the ExMeX Framework Core but are part of the data logistics. These can be stored procedures to maintain PIT or Bridge tables.

Without such logging it is not easily possible to obtain an overall, end-to-end view of all loading processes of a data warehouse solution.

With the LPS Side Injection Tool the ExMeX Logging Framework can now be used from outside the ExMeX Framework Core - all processes within a data warehouse solution are now able to log their steps.

Applicability

The LPS Side Injection Tool provides the capability to enable external logging of LPS within a data warehouse solution. It is not intended as a shortcut for modelling tasks.
LPS created with the LPS Side Injection Tool do not generate loading scripts in the ExMeX Framework Core. They are used for the purpose of logging only.

Therefore the most common use case for the LPS Side Injection Tool is to log external Load Process Steps.

Prerequisites

The following prerequisites are necessary:

  • At least Version 2.10.3 of ExMeX Framework Core has to be installed.
  • Download the LPS Side Injection Tool in the Download Section and follow the instructions
  • A batch in MetadataZone.LoadBatch is available which contains LPS Side Injections
  • The external load process steps have to use the stored procedures uspBatchVapourTrail (for logging batches) and uspProcessStepVapourTrail (for logging LPS)

Example

  • Open script ./library/tool/LPS Side Injection/LPSSideInjection.sql in SSMS
  • Change the following lines (73 and following) to add a LPS Side Injection
    --#-----------------------------------------------------------------------------
    --# Edit Insert below
    --# Recommendation: DO NOT INSERT more than 1 (one) row
    --#-----------------------------------------------------------------------------
    
    INSERT INTO @LPSSideInjection
        -- Source
        VALUES (
                -- Model, add a fictitious model
                'LPSSideInjection','LPS Side Injection'
                -- Add a fictitious Source Entity
                ,'<YourFictitiousSourceEntityCode>','<YourFictitiousSourceEntityName>','<YourFictitiousSourceDatabase>','<YourFictitiousSourceSchema>'
                -- Add a fictitious Target Entity
                ,'<YourFictitiousTargetEntityCode>','<YourFictitiousTargetEntityName>','<YourFictitiousTargetDatabase>','<YourFictitiousTargetSchema>'
                -- Batch to which LPS is associated, must exist in table MetadataZone.LoadBatch
                ,'<YourBatchLabel>'
                )
    ;
    
    --##############################################################################
    --#-----------------------------------------------------------------------------
    --# DO NOT EDIT BELOW
    --#-----------------------------------------------------------------------------
  • Execute LPS Side Injection Tool in SSMS in SQLCMD Mode.
  • The following SQL statement shows the injected LPS in the ExMeX Logging Framework:
    SELECT *
    FROM MetadataZone.MetadataForExecuteDataLogisticLoadProcessStep
    WHERE ProcessStepType = 'LPSSI'
    
Considerations and consequences

Always stay with a fictitious model as shown in the above code snippet ('LPSSideInjection','LPS Side Injection'). Otherwise it could happen that you lose all LPS Side Injections after an upcoming Metadata Installation into the ExMeX Framework Core!

 

Posted 1 year ago by Dirk.Lerner

Housekeeping

In the topic Housekeeping all questions about housekeeping are covered. Housekeeping is introduced with version 2.10.0 of the TEDAMOH ExMeX Framework.

How to set the amount of data kept in history tables, aka housekeeping?

Purpose

If the data pipeline into the data warehouse stores data in the History Zone as simple historicized tables, this data should only be stored for a limited period of time or number of loading runs.

Motivation

The motivation behind the so-called housekeeping is to keep the amount of data in the History Zone to a minimum. If it is necessary to store data for a longer period of time, then an Active Archive Zone is a good choice.

Applicability

The housekeeping is designed to keep only a certain amount of data (days, months or loading slices) in a table. In this way, housekeeping enables the amount of data to be kept as small as possible but as large as necessary, e.g. in the Hist-tables or a PSA.

Content

Housekeeping is a two-stage procedure.

First Stage - Default in Framework

A generally valid default for housekeeping is to be made in the global variable configuration settings of the ExMeX Framework Core. For more information on how to change global variables go to section Getting Started -> Configuration.

The possible values and global variables are listed in the following table. You can get the current settings of these variables with the SQL shown below.

SELECT *
FROM MetadataZone.ConfigExMeXFramework
WHERE VariableName IN (
     'HousekeepingHiZUnit'
    ,'HousekeepingHiZAmount'
    ,'HousekeepingHiZAttributeCode'
)
Variable Name: HousekeepingHiZUnit

Possible valid units are:

'DAY'
'WEEK'
'MONTH'
'QUARTER'
'YEAR'
'ProcessStepRun'

Variable Name: HousekeepingHiZAmount

Possible values are positive integer numbers:

= 0: keep history to infinity.
> 0: keep this amount of history. Remove older data.

Variable Name: HousekeepingHiZAttributeCode

The globally available column on which the housekeeping should be done.
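As a hedged sketch only (the supported way to change global variables is described in Getting Started -> Configuration, and the name of the value column is an assumption), the defaults could be set like this:

-- Sketch: set the global housekeeping defaults (value column name assumed)
UPDATE MetadataZone.ConfigExMeXFramework SET VariableValue = 'ProcessStepRun' WHERE VariableName = 'HousekeepingHiZUnit';
UPDATE MetadataZone.ConfigExMeXFramework SET VariableValue = '3'              WHERE VariableName = 'HousekeepingHiZAmount';
UPDATE MetadataZone.ConfigExMeXFramework SET VariableValue = 'AuditTrailId'   WHERE VariableName = 'HousekeepingHiZAttributeCode';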

Example

Configuration

HousekeepingHiZAmount: 3
HousekeepingHiZUnit: ProcessStepRun
HousekeepingHiZAttributeCode: AuditTrailId

This means that

Keep last 3 load slices (aka Load Process Steps) and use column AuditTrailId. Delete all older load slices:

AuditTrailId
17799  (keep)
17720  (keep)
17641  (keep)
17562  (delete)
17483  (delete)
17404  (delete)
17327  (delete)

Second Stage - Override defaults

It is possible to overwrite the default housekeeping settings for a table individually. The easiest way to do this is to use the TEDAMOH ExMeX Companion App in the filter settings.

ExMeX Companion App Filter configuration Window Housekeeping

Or with SQL by using the usp uspEntityClassAttributeFilter. For more details about the Stored Procedure uspEntityClassAttributeFilter go to section Framework Core In Depth -> Stored Procedures.

Example

Configuration

Amount: 5 (Override existing default value)
Unit: ProcessStepRun
Table: StgCleverHiZ
Column: AuditTrailId

SQL to call usp

-- Default (old) setting
SELECT TargetModelCode
      ,TargetDatabaseName
      ,TargetSchemaName
      ,TargetEntityClassCode
      ,TargetAttributeCode
      ,HousekeepingAmount
      ,HousekeepingUnit
FROM MetadataZone.MetadataForGenerateSource2StageLayer
WHERE TargetEntityClassCode = 'StgCleverHiZ'
;

-- Override settings
DECLARE @EntityClassFilterCondition VARCHAR(2048);

EXEC MetadataZone.uspEntityClassAttributeFilter
  @ModelCode                           = 'StageLayerHiZ'
 ,@EntityClassCode                     = 'StgCleverHiZ'
 ,@AttributeCode                       = 'AuditTrailId'
 ,@EntityClassAttributeFilterCondition = 'LAST'
 ,@EntityClassAttributeFilterUnit      = 'ProcessStepRun'
 ,@EntityClassAttributeFilterValue     = '5'
 ,@IsHousekeeping                      = 1
 ,@EntityClassFilterCondition          = @EntityClassFilterCondition OUTPUT
;

SELECT @EntityClassFilterCondition;

-- New setting
SELECT TargetModelCode
      ,TargetDatabaseName
      ,TargetSchemaName
      ,TargetEntityClassCode
      ,TargetAttributeCode
      ,HousekeepingAmount
      ,HousekeepingUnit
FROM MetadataZone.MetadataForGenerateSource2StageLayer
WHERE TargetEntityClassCode = 'StgCleverHiZ'
;

This means that

Keep last 5 load slices (aka Load Process Steps) in table StgCleverHiZ and use column AuditTrailId. Delete all older load slices:

AuditTrailId
17799  (keep)
17720  (keep)
17641  (keep)
17562  (keep)
17483  (keep)
17404  (delete)
17327  (delete)

Considerations and consequences

It is possible to set housekeeping definitions on more than one column. This is NOT recommended. But if used, then make sure the housekeeping process can handle more than one column for housekeeping.

For how to use and implement the housekeeping definitions, see the next How-To.

Posted 2 years ago by Dirk.Lerner

How to use DAPI information for a housekeeping process?


See also previous How-To: How to set the amount of data kept in history tables, aka housekeeping?

Purpose

If the data pipeline into the data warehouse stores data in the History Zone as simple historicized tables, this data should only be stored for a limited period of time or number of loading runs.

Motivation

The motivation behind the so-called housekeeping is to keep the amount of data in the History Zone to a minimum. If it is necessary to store data for a longer period of time, then an Active Archive Zone is a good choice.

Applicability

The housekeeping is designed to keep only a certain amount of data (days, months or loading slices) in a table. In this way, housekeeping enables the amount of data to be kept as small as possible but as large as necessary, e.g. in the Hist-tables or a PSA.


This question focuses on how to use the housekeeping information within a housekeeping process.

Content

To use the housekeeping information, the DAPI MetadataForGenerateSource2StageLayer in schema MetadataZone has to be used. All information about which and how much data to keep is stored in two additional columns of the DAPI:

  • HousekeepingAmount - How much data to keep
  • HousekeepingUnit - Unit of the data to keep
HousekeepingUnit

Possible valid units are:

'DAY'
'WEEK'
'MONTH'
'QUARTER'
'YEAR'
'ProcessStepRun'

HousekeepingAmount

Possible values are positive integer numbers:

= 0: keep history to infinity.
> 0: keep amount of history. Remove older data.

The values are associated with the table column which drives the housekeeping.

Example (ProcessStepRun)

Doing housekeeping for

Table: StgCleverHiZ
Column: AuditTrailId
Unit: ProcessStepRun

SQL to lookup on DAPI

-- Default (old) setting
SELECT TargetModelCode
      ,TargetDatabaseName
      ,TargetSchemaName
      ,TargetEntityClassCode
      ,TargetAttributeCode
      ,HousekeepingAmount
      ,HousekeepingUnit
FROM MetadataZone.MetadataForGenerateSource2StageLayer
WHERE TargetEntityClassCode = 'StgCleverHiZ'
;

Housekeeping information in the DAPI - Here: Keep the last 3 ProcessStepRuns, aka load slices.

This means that

Keep last 3 load slices (aka Load Process Steps) in table StgCleverHiZ and use column AuditTrailId. Delete all older load slices:

AuditTrailId
17799  (keep)
17720  (keep)
17641  (keep)
17562  (delete)
17483  (delete)
17404  (delete)
17327  (delete)
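A housekeeping process then only has to translate this into a DELETE. A minimal sketch for Unit = 'ProcessStepRun' (the schema name StageZone is an assumption; the DELETE itself is not generated by the framework):

-- Keep the newest 3 distinct AuditTrailId values, delete all older load slices
DELETE FROM StageZone.StgCleverHiZ
WHERE AuditTrailId NOT IN (
    SELECT TOP (3) AuditTrailId
    FROM StageZone.StgCleverHiZ
    GROUP BY AuditTrailId
    ORDER BY AuditTrailId DESC
);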

Example (DAY)

Doing housekeeping for

Table: StgCleverHiZ
Column: AuditTrailId (contains now a timestamp)
Unit: DAY

SQL to lookup on DAPI

-- Default (old) setting
SELECT TargetModelCode
      ,TargetDatabaseName
      ,TargetSchemaName
      ,TargetEntityClassCode
      ,TargetAttributeCode
      ,HousekeepingAmount
      ,HousekeepingUnit
FROM MetadataZone.MetadataForGenerateSource2StageLayer
WHERE TargetEntityClassCode = 'StgCleverHiZ'
;

Housekeeping information in the DAPI - Here: Keep the last 10 days.

This means that

Keep last 10 days in table StgCleverHiZ and use column AuditTrailId, which contains a timestamp, to identify data for housekeeping. Delete all older dates:
The following applies NOW() = 2020-07-30

AuditTrailId
2020-07-30     (keep)
2020-07-29     (keep)
2020-07-[...]  (keep)
2020-07-22     (keep)
2020-07-21     (keep)
2020-07-20     (delete)
2020-07-19     (delete)
2020-07-18     (delete)
2020-07-[...]  (delete)
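Again as a minimal sketch (schema name assumed), the corresponding DELETE for Unit = 'DAY', where the column holds a timestamp as described above:

-- Keep the last 10 days, delete everything older
DELETE FROM StageZone.StgCleverHiZ
WHERE AuditTrailId < DATEADD(DAY, -10, SYSUTCDATETIME());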

Considerations and consequences

It is possible to set housekeeping definitions on more than one column. This is NOT recommended. But if used, then make sure the housekeeping process can handle more than one column for housekeeping.

 

Posted 2 years ago by Dirk.Lerner

Filtering data items

In the topic Filtering data items all questions about filtering source tables for Load Process Steps are covered. Filtering is introduced with version 2.10.0 of the ExMeX Framework Core. We highly recommend using the ExMeX Companion App to apply filters.

How to use DAPI information to filter data within Load Process Step (LPS)?


See also the related How-To: How to enable filter on a (source) table to reduce data loaded into a (target) table?

Purpose

If the data pipeline into the data warehouse loads a huge amount of data, e.g. full loads, but only a small part of data is needed, filtering the data before export from any source table is the way to reduce the amount of data which has to be populated.

Motivation

The motivation behind filtering is to keep the amount of data as small as possible within the scope. This will increase loading performance and reduce space for storing (not needed) data. The influence on the performance of the data warehouse through less, unused data should not be underestimated. Reducing data to the bare minimum has a positive effect on backup/restore, query speed, the IO subsystem and much more.

Applicability

The filter feature serves first of all to reduce the amount of data that the data pipeline loads into the data warehouse to the data really needed in the scope. This can mean that an unfiltered data stream (100%) can be reduced to a fraction (<5%).

Therefore the most common use case is to apply the filter to columns of the incoming interface.

Content

To use the filter conditions the DAPI MetadataForGenerateSource2StageLayer in Schema MetadataZone has to be used. All information about which columns and how to filter are stored in two additional columns of the DAPI:

  • SourceAttributeFilterCondition - The filter condition itself
  • SourceAttributeFilterOuterClusterType - How to logically "connect" the condition to other column filter conditions
SourceAttributeFilterCondition

The filter condition on the associated column

SourceAttributeFilterOuterClusterType

Possible values are:

AND
OR

Currently only AND is possible. To use OR some more information is needed. This will be available in an upcoming release.

Example

Some filters were set on table Lieferant in the related How-To: How to enable filter on a (source) table to reduce data loaded into a (target) table?

Let's check the filter set on Lieferant in the DAPI MetadataForGenerateSource2StageLayer:

-- Table with new filter
SELECT SourceModelCode
      ,SourceDatabaseName
      ,SourceSchemaName
      ,SourceEntityClassCode
      ,SourceAttributeCode
      ,SourceAttributeFilterOuterClusterType
      ,SourceAttributeFilterCondition
FROM MetadataZone.MetadataForGenerateSource2StageLayer
WHERE SourceEntityClassCode = 'Lieferant'
  AND SourceModelCode       = 'SourceModelA'
ORDER BY SourceAttributeOrder 

Check on filter - result in DAPI.

How to interpret the data?

The data pipeline only populates data from the source table which fulfill the filter conditions:

Take data where the number in column LieferantNummer is greater or equal 2015 and any text in LieferantLand is either 'DE' or 'US'. Ignore all other data.

How do you now assemble the selection from the source?

Let's have a look at the result set:

Filter conditions in the DAPI - Here: Take data where LieferantNummer is greater or equal 2015 and LieferantLand is either 'DE' or 'US'.

It doesn't matter right now in which order you assemble the filter. For now take first no. 1, then no. 2; vice versa is also fine. To select data from the source table just add the above shown filter conditions after the dummy condition 1 = 1 (so there is no need to care about how to use the column SourceAttributeFilterOuterClusterType) as shown in the following SQL statement:

SELECT *
FROM Lieferant
-- Dummy condition
WHERE 1 = 1

-- Add SourceAttributeFilterOuterClusterType of (1)
-- Add SourceAttributeFilterCondition of TargetAttributeCode LieferantNummer(1)
AND ([LieferantNummer] >= 2015)

-- Add SourceAttributeFilterOuterClusterType of (2)
-- Add SourceAttributeFilterCondition of TargetAttributeCode LieferantLand (2)
AND ([LieferantLand] IN ('DE','US'))

That's it. Nothing more to do. Just copy and paste the pieces together.

The following table shows an example of how the data would be filtered:

LieferantNummer  LieferantLand  SomeOtherColumns  Comment
7799             DE             [...]             Filter conditions are fulfilled.
1727             FR             [...]             Both filter conditions are not fulfilled.
7720             NR             [...]             Filter condition of LieferantLand is not fulfilled.
173              AC             [...]             Both filter conditions are not fulfilled.
7641             US             [...]             Filter conditions are fulfilled.
122              AC             [...]             Both filter conditions are not fulfilled.
1404             US             [...]             Filter condition of LieferantNummer is not fulfilled.
Considerations and consequences

It is possible to set one or more filter conditions on more than one column within the table. If used, make sure the load process step can handle more than one filter condition on each column within the table.

  • Stay with the pattern to concatenate the filter conditions together!
Posted 2 years ago by Dirk.Lerner

How to enable filter on a (source) table to reduce data loaded into a (target) table?

Purpose

If the data pipeline into the data warehouse loads a huge amount of data, e.g. full loads, but only a small part of data is needed, filtering the data before export from any source table is the way to reduce the amount of data which has to be populated.

Motivation

The motivation behind filtering is to keep the amount of data as small as possible within the scope. This will increase loading performance and reduce space for storing (not needed) data. The influence on the performance of the data warehouse through less, unused data should not be underestimated. Reducing data to the bare minimum has a positive effect on backup/restore, query speed, the IO subsystem and much more.

Applicability

The filter feature serves first of all to reduce the amount of data that the data pipeline loads into the data warehouse to the data really needed in the scope. This can mean that an unfiltered data stream (100%) can be reduced to a fraction (<5%).

Therefore the most common use case is to apply the filter to columns of the incoming interface.

Content

It is possible to set one or more filters for each column of a table individually. The easiest way to do this is to use the TEDAMOH ExMeX Companion App in the filter settings.

ExMeX Companion App Filter configuration Window Initial

Or with SQL by using the usp uspEntityClassAttributeFilter. For more details about the Stored Procedure uspEntityClassAttributeFilter go to section Framework Core In Depth -> Stored Procedures.

Example

Let's do a simple example of setting a filter on a table of the incoming interface SourceModelA:

Table: Lieferant
Column: LieferantNummer
Filter Condition: =>
Value: 2015

This means that only suppliers which have a supplier number higher than or equal to 2015 are loaded into the Data Warehouse:

LieferantNummer
7799  (loaded)
7720  (loaded)
7641  (loaded)
122   (not loaded)
173   (not loaded)
1404  (not loaded)
1727  (not loaded)

The following SQL shows how to execute the stored procedure to set the filter. For more information how to execute the stored procedure read about usp - Entity Class Attribute Filter in section Stored Procedure.

-- Table without filter
SELECT SourceModelCode
      ,SourceDatabaseName
      ,SourceSchemaName
      ,SourceEntityClassCode
      ,SourceAttributeCode
      ,SourceAttributeFilterOuterClusterType
      ,SourceAttributeFilterCondition
FROM MetadataZone.MetadataForGenerateSource2StageLayer
WHERE SourceEntityClassCode = 'Lieferant'
  AND SourceModelCode       = 'SourceModelA'
;

-- Set filter
DECLARE @EntityClassFilterCondition VARCHAR(2048);

EXEC MetadataZone.uspEntityClassAttributeFilter
  @ModelCode                           = 'SourceModelA'
 ,@EntityClassCode                     = 'Lieferant'
 ,@AttributeCode                       = 'LieferantNummer'
 ,@EntityClassAttributeFilterCondition = '=>'
 ,@EntityClassAttributeFilterValue     = '2015'
 ,@EntityClassAttributeFilterDatatype  = 'INT'
 ,@EntityClassFilterCondition          = @EntityClassFilterCondition OUTPUT
;

SELECT @EntityClassFilterCondition;

Later, let's add another filter with the ExMeX Companion App:

ExMeX Companion App Filter configuration Window LIKE condition

Let's check the filter in the DAPI MetadataForGenerateSource2StageLayer:

-- Table with new filter
SELECT SourceModelCode
      ,SourceDatabaseName
      ,SourceSchemaName
      ,SourceEntityClassCode
      ,SourceAttributeCode
      ,SourceAttributeFilterOuterClusterType
      ,SourceAttributeFilterCondition
FROM MetadataZone.MetadataForGenerateSource2StageLayer
WHERE SourceEntityClassCode = 'Lieferant'
  AND SourceModelCode       = 'SourceModelA'
ORDER BY SourceAttributeOrder

How to set a filter - result in DAPI.

Considerations and consequences

It is possible to set one or more filter conditions on more than one column within the table. If used, make sure the load process step can handle more than one filter condition on each column within the table.


For how to use and implement the set filter conditions, see the next How-To: How to use DAPI information to filter data within Load Process Step (LPS)?

 

Posted 2 years ago by Dirk.Lerner

Tips, Tricks And Fun

In the topic Tips, Tricks And Fun all questions about other topics are covered that are not related to the ExMeX Framework but are interesting in conjunction with it. Hopefully some of them are useful in your daily work.

How to Copy & Paste Load Process Step from SSMS?


Issue

The character length of 65k characters for the Data Vault load scripts to be generated is too small.

Test

A test with a satellite with more than 400 columns, where the columns are 42 characters long on average, showed that the previous length of 65k characters is sufficient to generate the scripts.

Cause

The problem is not the generation itself, but that SSMS "only" displays 65k characters per cell in the result set and cuts off the rest (see picture).

Solution

After loading the metadata, all generated Load Process Steps are stored in a database object: MetadataZone.BatchExecuteLoadProcessStep. These Load Process Steps can be extracted from the table using a command line. Execute the following command in the CMD or PowerShell:

bcp "SELECT * FROM [TEDAMOH-ExMeX-Develop].MetadataZone.BatchExecuteLoadProcessStep" QUERYOUT "C:\output.txt" -T -c -S .\SQLEXPRESS

More information about the bcp Utility in Microsoft Docs.

This writes the entire contents of the table to the output.txt file. Et voila, there is the complete statement.
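If only a single Load Process Step is needed, the query passed to bcp can be narrowed with a WHERE clause; the filter column used here is an assumption, check the table definition for the actual key column:

bcp "SELECT * FROM [TEDAMOH-ExMeX-Develop].MetadataZone.BatchExecuteLoadProcessStep WHERE LoadProcessStepId = 1256" QUERYOUT "C:\output_1256.txt" -T -c -S .\SQLEXPRESS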

Pitfall getting generated SQL for LPS out of table with SSMS.

Posted 2 years agoby Dirk.Lerner