Some time ago a customer asked me how to quickly and easily load some (test) data into their database XYZ (choose the one of your choice and replace XYZ) to test their newly developed Data Vault logistic processes.
The point was: they didn’t want all the ETL-tool and IT-process overhead just to run some small tests in their own environment. Is this a good idea from a data governance perspective? Well, that’s not part of this blog post. Just do this kind of thing only in your development environment.
All databases, as far as I know, come with their own loading utilities, e.g. FastLoad for Teradata or BULK INSERT for MS SQL Server. So before considering some weird loading practice like data to Excel to Access via ODBC to database XYZ (be sure, these techniques are out there), you can simply use the toolset provided by vendor XYZ.
Let me give you a small example using BULK INSERT with MS SQL Server.
Let’s say my (fictitious) customer FastChangeCo wants to load some data from its source system Fibonacci. The interface contains three attributes (ID, Fibonacci sequence [1] and Lucas sequence [2]), which should be loaded into a Hub and a Satellite.
First, create a simple table in your Staging Layer:
CREATE TABLE FibonacciSource (
     ID INTEGER
    ,FibonacciSequence BIGINT
    ,LucasSequence BIGINT
);
Then put the Fibonacci source file into a folder of your choice (btw. I provide all shown scripts via download); let’s assume c:\FibonacciSource.
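For reference, the source file could look like the snippet below: a header row (which is why the load script uses FIRSTROW = 2) followed by semicolon-delimited values. The numbers are simply the first few elements of the Fibonacci and Lucas sequences.
ID;FibonacciSequence;LucasSequence
1;1;1
2;1;3
3;2;4
4;3;7
5;5;11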
If you execute the following simple script, your data will be loaded. That’s it.
-- set date format
SET LANGUAGE British;
-- clear target table
DELETE FROM FibonacciSource;
BULK INSERT FibonacciSource
FROM 'c:\FibonacciSource\FibonacciSource.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ';', -- CSV field delimiter
ROWTERMINATOR = '\n', -- newline terminates each row
ERRORFILE = 'c:\FibonacciSource\FibonacciSourceError.csv',
TABLOCK
);
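-- check the loaded rows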
SELECT *
FROM FibonacciSource
;
Now FastChangeCo can easily test their Hub and Satellite data logistic processes.
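A first test of the Hub load could then look like the sketch below. Note that the hub table HubFibonacci, the MD5 hash key and the metadata columns are just my assumptions for illustration; adapt the pattern to your own Data Vault conventions.
-- Minimal sketch of a hub load from the staging table
-- (hub name, hash function and metadata columns are assumptions)
INSERT INTO HubFibonacci (FibonacciHashKey, LoadDate, RecordSource, ID)
SELECT
     HASHBYTES('MD5', CAST(src.ID AS VARCHAR(20))) -- hash of the business key
    ,GETDATE()                                     -- load date
    ,'Fibonacci'                                   -- record source
    ,src.ID                                        -- business key
FROM FibonacciSource AS src
WHERE NOT EXISTS ( -- insert only business keys not yet in the hub
    SELECT 1
    FROM HubFibonacci AS hub
    WHERE hub.FibonacciHashKey = HASHBYTES('MD5', CAST(src.ID AS VARCHAR(20)))
);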
So long,
Dirk