Several times I had the need for some large data sets to do some Data Vault tests at customer site, writing a blogpost, doing a demo or a webinar and many more. And sometimes I need data to do performance or data usage tests on different databases. Due to my work together with EXASOL I focused on the TPC-H tool DBGen to generate gigabytes of data.

To share my experience with DBGen generating large data sets I wrote this blogpost as a step by step instruction.

To compile the DBGen I’ve downloaded first the TPC-H sources and then Microsoft Visual Studio Community Edition and installed it.

After unzipping the tpc-h zip-file go to tpc-h-tool\tpch_2_17_0\dbgen and open the visual studio project file tpch.sln and build it.

TPC H Build DBGen

There will be some errors but everything seems to build fine. Executing DBGen with the option -h I get some help:

C:\tpc-h-tool\tpch_2_17_0\dbgen\Debug>dbgen -h
TPC-H Population Generator (Version 2.17.0 build 0)
Copyright Transaction Processing Performance Council 1994 - 2010
dbgen [-{vf}][-T {pcsoPSOL}]
        [-s <scale>][-C <procs>][-S <step>]
dbgen [-v] [-O m] [-s <scale>] [-U <updates>]

Basic Options
-C <n> -- separate data set into <n> chunks (requires -S, default: 1)
-f     -- force. Overwrite existing files
-h     -- display this message
-q     -- enable QUIET mode
-s <n> -- set Scale Factor (SF) to  <n> (default: 1)
-S <n> -- build the <n>th step of the data/update set (used with -C or -U)
-U <n> -- generate <n> update sets
-v     -- enable VERBOSE mode

Advanced Options
-b <s> -- load distributions for <s> (default: dists.dss)
-d <n> -- split deletes between <n> files (requires -U)
-i <n> -- split inserts between <n> files (requires -U)
-T c   -- generate cutomers ONLY
-T l   -- generate nation/region ONLY
-T L   -- generate lineitem ONLY
-T n   -- generate nation ONLY
-T o   -- generate orders/lineitem ONLY
-T O   -- generate orders ONLY
-T p   -- generate parts/partsupp ONLY
-T P   -- generate parts ONLY
-T r   -- generate region ONLY
-T s   -- generate suppliers ONLY
-T S   -- generate partsupp ONLY

To generate the SF=1 (1GB), validation database population, use:
        dbgen -vf -s 1

To generate updates for a SF=1 (1GB), use:
        dbgen -v -U 1 -s 1

But when I executed DBGen to generate data I got an error:

TPC-H Population Generator (Version 2.17.0)
Copyright Transaction Processing Performance Council 1994 - 2010
Open failed for .\dists.dss at c:\tpc-h-tool\tpch_2_17_0\dbgen\bm_utils.c:308

To solve this issue just copy the tpc-h-tool\tpch_2_17_0\dbgen\dists.dss file to tpc-h-tool\tpch_2_17_0\dbgen\Debug. To avoid other issues, execute DBGen in the Debug folder.

If you just run DBGen without any options, it will generate one GB of data shared over eight different tables:

  • Customers
  • Nation
  • Lineitem
  • Orders
  • Parts
  • Partsupp
  • Region
  • Supplier


Executing dbgen in verbose mode:

C:\tpc-h-tool\tpch_2_17_0\dbgen\Debug>dbgen -v
TPC-H Population Generator (Version 2.17.0)
Copyright Transaction Processing Performance Council 1994 - 2010
Generating data for suppliers table/
Preloading text ... 100%
Generating data for customers tabledone.
Generating data for orders/lineitem tablesdone.
Generating data for part/partsupplier tablesdone.
Generating data for nation tabledone.
Generating data for region tabledone.
TPC H Files

If you want to generate some more data use the option -s <n> (Scale Factor (SF)). By default, <n> is set to 1 and generates 1 GB of data. So -s 10 generates 10GB and -s 100 generates 100GB.

The data definition file (dss.ddl) and also the referential integrity file (dss.ri) are stored in the folder tpc-h-tool\tpch_2_17_0\dbgen.

For your convenience I’ve created a data model and DDLs for different databases you can download right here:
TPC H Datamodel
Then simply load the data into the model and make your tests on it. Have fun!
Download scripts here:
And that’s it folks. There are some upcoming blogposts according to Data Vault on Exasol and other interesting performance topics. Stay tuned.
Please leave some comments if you need more information or examples for other databases.
So long,

3 Kommentare

  • I ran into a similar error while trying to execute dbgen and I couldn't find a Debug folder. Any help would be appreciated.

    • Hi Rohit,

      I'll take a look at the problem you're having. It's been a while, but a good opportunity to update the blogpost.


    • Hi Rohit,

      I have tested the process described in the article with the current dbgen version (3.0.1) and the current version of Visual Studio Community (17.5). Everything worked as described. Did you check if the build (Ctrl-B) in Visual Studio was error free?


Kommentar hinterlassen

Als Antwort auf Some User