Housekeeping
In the topic Housekeeping all questions about housekeeping are covered. Housekeeping is introduced with version 2.10.0 of the TEDAMOH ExMeX Framework.
How to set the amount of data kept in history tables, aka housekeeping?
Purpose
If the data pipeline into the data warehouse stores data in the History Zone as simple historicized tables, this data should only be stored for a limited period of time or number of loading runs.
Motivation
The motivation behind the so-called housekeeping is to keep the amount of data in the History Zone to a minimum. If it is necessary to store data for a longer period of time, then an Active Archive Zone is a good choice.
Applicability
The housekeeping is designed to keep only a certain amount of data (days, months or loading slices) in a table. In this way, housekeeping enables the amount of data to be kept as small as possible but as large as necessary, e.g. in the Hist-tables or a PSA.
Content
Housekeeping is a two-stage procedure.
First Stage - Default in Framework
A generally valid default for housekeeping is to be made in the global variable configuration settings of the ExMeX Framework Core. For more information on how to change global variables go to section Getting Started -> Configuration.
The possible values and global variables are listed in the following table. You can get the current settings of these variables with the shown DML.
SELECT * FROM MetadataZone.ConfigExMeXFramework WHERE VariableName IN ( 'HousekeepingHiZUnit' ,'HousekeepingHiZAmount' ,'HousekeepingHiZAttributeCode' )
Variable Name | Value |
HousekeepingHiZUnit |
Possible valid units are: 'DAY' |
HousekeepingHiZAmount |
Possible values are positive integer numbers: = 0: keep history to infinity. |
HousekeepingHiZAttributeCode | On which global available column the housekeeping should be done. |
Example
Configuration
HousekeepingHiZAmount: 3
HousekeepingHiZUnit: ProcessStepRun
HousekeepingHiZAttributeCode: AuditTrailId
This means that
Keep last 3 load slices (aka Load Process Steps) and use column AuditTrailId. Delete all older load slices:
AuditTrailId
17799
17720
17641
17562
17483
17404
17327
Second Stage - Ovveride defaults
It is possible to overwrite the default housekeeping settings for a table individually. The easiest way to do this is to use the TEDAMOH ExMeX Companion App in the filter settings.
Or with SQL by using the usp uspEntityClassAttributeFilter. For more details about the Stored Procedure uspEntityClassAttributeFilter go to section Framework Core In Depth -> Stored Procedures.
Example
Configuration
Amount: 5 (Override existing default value)
Unit: ProcessStepRun
Table: StgCleverHiZ
Column: AuditTrailId
SQL to call usp
-- Default (old) setting SELECT TargetModelCode ,TargetDatabaseName ,TargetSchemaName ,TargetEntityClassCode ,TargetAttributeCode ,HousekeepingAmount ,HousekeepingUnit FROM MetadataZone.MetadataForGenerateSource2StageLayer WHERE TargetEntityClassCode = 'StgCleverHiZ' ;
-- Override settings
DECLARE @EntityClassFilterCondition VARCHAR(2048); EXEC MetadataZone.uspEntityClassAttributeFilter @ModelCode = 'StageLayerHiZ' ,@EntityClassCode = 'StgCleverHiZ' ,@AttributeCode = 'AuditTrailId' ,@EntityClassAttributeFilterCondition = 'LAST' ,@EntityClassAttributeFilterUnit = 'ProcessStepRun' ,@EntityClassAttributeFilterValue = '5' ,@IsHousekeeping = 1 ,@EntityClassFilterCondition = @EntityClassFilterCondition OUTPUT ;
SELECT @EntityClassFilterCondition;
-- New setting SELECT TargetModelCode ,TargetDatabaseName ,TargetSchemaName ,TargetEntityClassCode ,TargetAttributeCode ,HousekeepingAmount ,HousekeepingUnit FROM MetadataZone.MetadataForGenerateSource2StageLayer WHERE TargetEntityClassCode = 'StgCleverHiZ'
This means that
Keep last 5 load slices (aka Load Process Steps) in table StgCleverHiZ and use column AuditTrailId. Delete all older load slices:
AuditTrailId
17799
17720
17641
17562
17483
17404
17327
Considerations and consequences
It is possible to set housekeeping definitions on more than one column. This ist NOT recommended. But if used, than make sure the housekeeping process can handle more than one column for housekeeping.
How to use and implement the housekeeping definitions, see the next How-To.
How to use DAPI information for a housekeeping process?
See also previous How-To: How to set the amount of data kept in history tables, aka housekeeping?
Purpose
If the data pipeline into the data warehouse stores data in the History Zone as simple historicized tables, this data should only be stored for a limited period of time or number of loading runs.
Motivation
The motivation behind the so-called housekeeping is to keep the amount of data in the History Zone to a minimum. If it is necessary to store data for a longer period of time, then an Active Archive Zone is a good choice.
Applicability
The housekeeping is designed to keep only a certain amount of data (days, months or loading slices) in a table. In this way, housekeeping enables the amount of data to be kept as small as possible but as large as necessary, e.g. in the Hist-tables or a PSA.
This question focus on how to use the information of housekeeping within a housekeeping process.
Content
To use information for housekeeping the DAPI MetadataForGenerateSource2StageLayer in Schema MetadataZone has to be used. All information about which and how many data to keep are stored in two additional columns of the DAPI:
- HousekeepingAmount - How many data to keep
- HousekeepingUnit - Unit of data to keep
HousekeepingUnit |
Possible valid units are: 'DAY' |
HousekeepingAmount |
Possible values are positive integer numbers: = 0: keep history to infinity. |
The values are associated to the tablecolumn which drives the housekeeping.
Example (ProcessStepRun)
Doing housekeeping for
Table: StgCleverHiZ
Column: AuditTrailId
Unit: ProcessStepRun
SQL to lookup on DAPI
-- Default (old) setting SELECT TargetModelCode ,TargetDatabaseName ,TargetSchemaName ,TargetEntityClassCode ,TargetAttributeCode ,HousekeepingAmount ,HousekeepingUnit FROM MetadataZone.MetadataForGenerateSource2StageLayer WHERE TargetEntityClassCode = 'StgCleverHiZ' ;
Information for Housekeeping - Here: Keep the last 3 ProcessStepRuns, aka load slices.
This means that
Keep last 3 load slices (aka Load Process Steps) in table StgCleverHiZ and use column AuditTrailId. Delete all older load slices:
AuditTrailId
17799
17720
17641
17562
17483
17404
17327
Example (DAY)
Doing housekeeping for
Table: StgCleverHiZ
Column: AuditTrailId (contains now a timestamp)
Unit: DAY
SQL to lookup on DAPI
-- Default (old) setting SELECT TargetModelCode ,TargetDatabaseName ,TargetSchemaName ,TargetEntityClassCode ,TargetAttributeCode ,HousekeepingAmount ,HousekeepingUnit FROM MetadataZone.MetadataForGenerateSource2StageLayer WHERE TargetEntityClassCode = 'StgCleverHiZ' ;
Information for Housekeeping - Here: Keep the last 10 days.
This means that
Keep last 10 days in table StgCleverHiZ and use column AuditTrailId, which contains a timestamp, to identify data for housekeeping. Delete all older dates:
The following applies NOW() = 2020-07-30
AuditTrailId
2020-07-30
2020-07-29
2020-07-[...]
2020-07-22
2020-07-21
2020-07-20
2020-07-19
2020-07-18
2020-07-[...]
Considerations and consequences
It is possible to set housekeeping definitions on more than one column. This is NOT recommended. But if used, than make sure the housekeeping process can handle more than one column for housekeeping.