Several times I had the need for some large data sets to do some Data Vault tests at customer site, writing a blogpost, doing a demo or a webinar and many more. And sometimes I need data to do performance or data usage tests on different databases. Due to my work together with EXASOL I focused on the TPC-H tool DBGen to generate gigabytes of data.
To share my experience with DBGen generating large data sets I wrote this blogpost as a step by step instruction.
To compile the DBGen I’ve downloaded first the TPC-H sources and then Microsoft Visual Studio Community Edition and installed it.
After unzipping the tpc-h zip-file go to tpc-h-tool\tpch_2_17_0\dbgen and open the visual studio project file tpch.sln and build it.
There will be some errors but everything seems to build fine. Executing DBGen with the option -h I get some help:
C:\tpc-h-tool\tpch_2_17_0\dbgen\Debug>dbgen -h TPC-H Population Generator (Version 2.17.0 build 0) Copyright Transaction Processing Performance Council 1994 - 2010 USAGE: dbgen [-{vf}][-T {pcsoPSOL}] [-s <scale>][-C <procs>][-S <step>] dbgen [-v] [-O m] [-s <scale>] [-U <updates>] Basic Options =========================== -C <n> -- separate data set into <n> chunks (requires -S, default: 1) -f -- force. Overwrite existing files -h -- display this message -q -- enable QUIET mode -s <n> -- set Scale Factor (SF) to <n> (default: 1) -S <n> -- build the <n>th step of the data/update set (used with -C or -U) -U <n> -- generate <n> update sets -v -- enable VERBOSE mode Advanced Options =========================== -b <s> -- load distributions for <s> (default: dists.dss) -d <n> -- split deletes between <n> files (requires -U) -i <n> -- split inserts between <n> files (requires -U) -T c -- generate cutomers ONLY -T l -- generate nation/region ONLY -T L -- generate lineitem ONLY -T n -- generate nation ONLY -T o -- generate orders/lineitem ONLY -T O -- generate orders ONLY -T p -- generate parts/partsupp ONLY -T P -- generate parts ONLY -T r -- generate region ONLY -T s -- generate suppliers ONLY -T S -- generate partsupp ONLY To generate the SF=1 (1GB), validation database population, use: dbgen -vf -s 1 To generate updates for a SF=1 (1GB), use: dbgen -v -U 1 -s 1
But when I executed DBGen to generate data I got an error:
C:\tpc-h-tool\tpch_2_17_0\dbgen\Debug>dbgen
TPC-H Population Generator (Version 2.17.0)
Copyright Transaction Processing Performance Council 1994 - 2010
Open failed for .\dists.dss at c:\tpc-h-tool\tpch_2_17_0\dbgen\bm_utils.c:308
To solve this issue just copy the tpc-h-tool\tpch_2_17_0\dbgen\dists.dss file to tpc-h-tool\tpch_2_17_0\dbgen\Debug. To avoid other issues, execute DBGen in the Debug folder.
If you just run DBGen without any options, it will generate one GB of data shared over eight different tables:
- Customers
- Nation
- Lineitem
- Orders
- Parts
- Partsupp
- Region
- Supplier
Executing dbgen in verbose mode:
C:\tpc-h-tool\tpch_2_17_0\dbgen\Debug>dbgen -v
TPC-H Population Generator (Version 2.17.0)
Copyright Transaction Processing Performance Council 1994 - 2010
Generating data for suppliers table/
Preloading text ... 100%
done.
Generating data for customers tabledone.
Generating data for orders/lineitem tablesdone.
Generating data for part/partsupplier tablesdone.
Generating data for nation tabledone.
Generating data for region tabledone.

If you want to generate some more data use the option -s <n> (Scale Factor (SF)). By default, <n> is set to 1 and generates 1 GB of data. So -s 10 generates 10GB and -s 100 generates 100GB.
The data definition file (dss.ddl) and also the referential integrity file (dss.ri) are stored in the folder tpc-h-tool\tpch_2_17_0\dbgen.

Then simply load the data into the model and make your tests on it. Have fun!
I ran into a similar error while trying to execute dbgen and I couldn't find a Debug folder. Any help would be appreciated.
Hi Rohit,
I'll take a look at the problem you're having. It's been a while, but a good opportunity to update the blogpost.
Dirk
Hi Rohit,
I have tested the process described in the article with the current dbgen version (3.0.1) and the current version of Visual Studio Community (17.5). Everything worked as described. Did you check if the build (Ctrl-B) in Visual Studio was error free?
Cheers,
Dirk