Filtering data items

In the topic Filtering data items all questions about filtering source tables for Load Process Steps are covered. Filtering is introduced with version 2.10.0 of the ExMeX Framework Core. We highly recommend to use the ExMeX Companion App to apply filters.

How to use DAPI information to filter data within Load Process Step (LPS)?

See also previous How-To: How to use DAPI information to filter data within Load Process Step (LPS)? Purpose If the…

See also previous How-To: How to use DAPI information to filter data within Load Process Step (LPS)?

Purpose

If the data pipeline into the data warehouse loads a huge amount of data, e.g. full loads, but only a small part of data is needed, filtering the data before export from any source table is the way to reduce the amount of data which has to be populated.

Motivation

The motivation behind filtering is to keep the amount of data as small as possible within the scope. This will increase loading performance and reduce space for storing (not needed) data. The influence on the performance of the data warehouse through less, unused data should not be underestimated. Reducing data to the bare minimum has a positive effect on backup/restore, query speed, the IO subsystem and much more.

Applicability

The filter feature serves first of all to reduce the amount of data that the data pipeline loads into the data warehouse to the data really needed in the scope. This can mean that an unfiltered data stream (100%) can be reduced to a fraction (<5%).

Therefore the most common use case is to apply the filter to columns of the incoming interface.

Content

To use the filter conditions the DAPI MetadataForGenerateSource2StageLayer in Schema MetadataZone has to be used. All information about which columns and how to filter are stored in two additional columns of the DAPI:

  • SourceAttributeFilterCondition - The filter condition itself
  • SourceAttributeFilterOuterClusterType - How to "connect" logical to other column filter conditions
HousekeepingUnit

The filter condition on the associated column

SourceAttributeFilterOuterClusterType

Possible values are:

AND
OR

Currently only AND is possible. To use OR some more information are needed. This will be available in an upcoming release.

Example

There are set some filters in the previous How-To: How to use DAPI information to filter data within Load Process Step (LPS)? on table: Lieferant

Let's check the filter set on Lieferant in the DAPI MetadataForGenerateSource2StageLayer:

-- Table with new filter
SELECT SourceModelCode
      ,SourceDatabaseName
      ,SourceSchemaName
      ,SourceEntityClassCode
      ,SourceAttributeCode
      ,SourceAttributeFilterOuterClusterType
      ,SourceAttributeFilterCondition
FROM MetadataZone.MetadataForGenerateSource2StageLayer
WHERE SourceEntityClassCode = 'Lieferant'
  AND SourceModelCode       = 'SourceModelA'
ORDER BY SourceAttributeOrder 

HowToUseFilterCheck on filter - result in DAPI.

How to interpret the data?

The data pipeline only populates data from the source table which fulfill the filter conditions:

Take data where the number in column LieferantNummer is greater or equal 2015 and any text in LieferantLand is either 'DE' or 'US'. Ignore all other data.

How do you now assemble the selection from the source?

Let's have a look at the result set:

How To Use Filter DAPIFilter Conditions - Here: Take data where LieferantNummer is greater or equal 2015 and LieferantLand is either 'DE' or 'US'

It doesn't matter right now in which order you assemble the filter together. For now take first no. 1, than no. 2. Vice versa is also fine. To select data from the source table just add the above shown filter conditions after the dummy condition 1 = 1 (so there is no need to care about how to us the column SourceAttributeFilterOuterClusterType) as shown in the following SQL-statement:

SELECT *
FROM Lieferant
-- Dummy condition
WHERE 1 = 1

-- Add SourceAttributeFilterOuterClusterType of (1)
-- Add SourceAttributeFilterCondition of TargetAttributeCode LieferantNummer(1)
AND ([LieferantNummer] >= 2015)

-- Add SourceAttributeFilterOuterClusterType of (2)
-- Add SourceAttributeFilterCondition of TargetAttributeCode LieferantLand (2)
AND ([LieferantLand] IN ('DE','US'))

That's it. Nothing more to do. Just copy and paste the pieces together.

The following table shows an example of how the data would be filtered:

LieferantNummer LieferantLand SomeOtherColumns Comment
7799 DE [...] Filter conditions are fulfilled
1727 FR [...] Both filter condition are not fulfilled.
7720 NR [...] Filter condition of LieferantLand is not fulfilled.
173 AC [...] Both filter condition are not fulfilled.
7641 US [...]  Filter conditions are fulfilled
122 AC [...] Both filter condition are not fulfilled.
1404 US [...] Filter condition of LieferantLand is not fulfilled.
Considerations and consequences

It is possible to set one or more filter conditions on more than one column within the table. If used, make sure the load process step can handle more than one filter condition on each column within the table.

  • Stay with the pattern to concat the filter conditions together!
Posted 2 years agoby Dirk.Lerner

How to enable filter on a (source) table to reduce data loaded into a (target) table?

Purpose If the data pipeline into the data warehouse loads a huge amount of data, e.g. full loads, but only…
Purpose

If the data pipeline into the data warehouse loads a huge amount of data, e.g. full loads, but only a small part of data is needed, filtering the data before export from any source table is the way to reduce the amount of data which has to be populated.

Motivation

The motivation behind filtering is to keep the amount of data as small as possible within the scope. This will increase loading performance and reduce space for storing (not needed) data. The influence on the performance of the data warehouse through less, unused data should not be underestimated. Reducing data to the bare minimum has a positive effect on backup/restore, query speed, the IO subsystem and much more.

Applicability

The filter feature serves first of all to reduce the amount of data that the data pipeline loads into the data warehouse to the data really needed in the scope. This can mean that an unfiltered data stream (100%) can be reduced to a fraction (<5%).

Therefore the most common use case is to apply the filter to columns of the incoming interface.

Content

It is possible to set one or more filter for each column of a table individually. The easiest way to do this is to use the TEDAMOH ExMeX Companion App in the filter settings.

ExMeX Companion App Filter configuration Window Initial

Or with SQL by using the usp uspEntityClassAttributeFilter. For more details about the Stored Procedure uspEntityClassAttributeFilter go to section Framework Core In Depth -> Stored Procedures.

Example

Let's do a simple example of setting a filter on a table of the incoming interface SourceModelA:

Table: Lieferant
Column: LieferantNummer
Filter Condition: =>
Value: 2015

This means that only suppliers which has a supplier number higher or equal 2015 are loaded into the Data Warehouse:

AuditTrailId
7799
7720
7641
122
173
1404
1727

The following SQL shows how to execute the stored procedure to set the filter. For more information how to execute the stored procedure read about usp - Entity Class Attribute Filter in section Stored Procedure.

-- Table without filter
SELECT SourceModelCode
      ,SourceDatabaseName
      ,SourceSchemaName
      ,SourceEntityClassCode
      ,SourceAttributeCode
      ,SourceAttributeFilterOuterClusterType
,SourceAttributeFilterCondition FROM MetadataZone.MetadataForGenerateSource2StageLayer WHERE SourceEntityClassCode = 'Lieferant' AND SourceModelCode = 'SourceModelA'
; -- Set filter DECLARE @EntityClassFilterCondition VARCHAR(2048); EXEC MetadataZone.uspEntityClassAttributeFilter @ModelCode = 'SourceModelA' ,@EntityClassCode = 'Lieferant' ,@AttributeCode = 'LieferantNummer' ,@EntityClassAttributeFilterCondition = '=>' ,@EntityClassAttributeFilterValue = '2015' ,@EntityClassAttributeFilterDatatype = 'INT' ,@EntityClassFilterCondition = @EntityClassFilterCondition OUTPUT ; SELECT @EntityClassFilterCondition;

Later, let's add another filter with the ExMeX Companion App:

ExMeX Companion App Filter configuration Window LIKE condition

Let's check the filter in the DAPI MetadataForGenerateSource2StageLayer:

-- Table with new filter
SELECT SourceModelCode
,SourceDatabaseName ,SourceSchemaName ,SourceEntityClassCode ,SourceAttributeCode ,SourceAttributeFilterOuterClusterType ,SourceAttributeFilterCondition FROM MetadataZone.MetadataForGenerateSource2StageLayer WHERE SourceEntityClassCode = 'Lieferant' AND SourceModelCode = 'SourceModelA'
ORDER BY SourceAttributeOrder

HowToUseFilterHow to set a filter - result in DAPI.

Considerations and consequences

It is possible to set one or more filter conditions on more than one column within the table. If used, make sure the load process step can handle more than one filter condition on each column within the table.


How to use and implement the set filter conditions, see the next How-To: How to use DAPI information to filter data within Load Process Step (LPS)? .

 

Posted 2 years agoby Dirk.Lerner