World Wide Data Vault Consortium 2015

Published: 3 June 2015; last updated: 15 July 2016

Last week it happened again: a meeting of Data Vault geeks and people interested in Data Vault! And again in the wonderful area of Vermont, USA. But this year we had +30°C compared to last year’s -20°C. The conference was held in the wonderful Trapp Family Lodge in Stowe, Vermont. To give you some more insights, I’ll embed some of the tweets. If you want to read the full timeline, go here!

Arrived at the hotel Trapp Family Lodge at Stowe, Vermont. Relaxing a bit... #WWDVC #DataVault
– Dirk Lerner (@DV_Modeling), 26 May 2015

Right before the conference some of us did a “pre-workshop”, talking with Dan about special topics and some brand-new parts of Data Vault 2.0. One of these topics was a variation for dealing with ghost records or orphans in SAT(ellites) and equi-joins. Others were Point in Time tables in the Business Vault, teaming in the Data Vault 2.0 environment and Managed Self Service BI (M.SS.BI).

@DV_Modeling says you have to think about the data architecture before you start the ETL programming. Yup. #WWDVC
– Kent Graziano (@KentGraziano), 27 May 2015

On Day 1 Claudia Imhoff (founder of the Boulder BI Brain Trust, or #BBBT) spoke in her keynote “Unleash the power of analytics” about traditional warehousing and about extending data warehouse architectures with new “modules” like

A real-time analytics engine as a kind of source system or
Seamlessly included external data or
Data provisioning as a data refinery (refining raw data in a data lake into valuable data)

@Claudia_Imhoff kicks us off with her keynote at #WWDVC
– Kent Graziano (@KentGraziano), 28 May 2015

Next, Dan Linstedt (founder of the Data Vault methodology) talked about Data Vault 2.0 in his presentation “Big Data, NoSQL and Modeling”. The key points were

ETL (in the sense of tools) is dead, not needed anymore,
Education is essential and
Performance issues with Hadoop.

Furthermore, Dan presented an outstanding client case: a customer running a really huge Data Vault 2.0 installation on Teradata Down Under.

Kent Graziano (Data Warrior, Oracle ACE Director) showed us in his famous way how to implement Data Vault at a customer without being allowed to either do or talk about Data Vault. A great presentation about real-world data warehousing, handling politics and finding new names for common Data Vault patterns.

Then it was my turn, with a talk about Temporal data warehouse and Data Vault. I’ll write a blog post about it later.

@DV_Modeling talks about #Temporal #DataWarehouse #DataVault and the advantages of #TeraData at #WWDVC @ITGAIN_GmbH
– Kwitschi (@Kwitschi), 28 May 2015

Thanks to Dirk Schittko for his awesome insight into the German social system and its charity organisations, and into the challenges of building an easy-to-maintain, low-cost data warehouse with Data Vault 2.0 – “Business intelligence beyond graphs and tables”. Dirk solved this problem in one fifth of the time that big companies had estimated.

#wwdvc Dirk Schittko said: "We ingested a new system in 10 days with #dv2, compared to big consultancy saying: 50 days for the same work"
– Daniel Linstedt (@dlinstedt), 28 May 2015

Roelant Vos gave an amazing presentation and demo in his two talks, “Allianz Global Assistance – Data Vault Case Study” and “New frontiers: Virtualize your EDW”.
It is incredible how Roelant virtualizes a Data Vault out of a persistent staging area using metadata. Great. Roelant, I have to do it too!

Between the presentations (and in the evenings) we had enough time to network. That is what makes the WWDVC different from other conferences. You can talk to everyone, including all these cutting-edge data geeks, and everyone is welcome. As Sam Benedict wrote on LinkedIn in “WWDVC: Surrogate keys are not a solution to infertile Parent Keys”:

“In spite of being on the IT leadership side of the equation, I always feel welcome and free to ask any question no matter how ‘Data Vault 101’ it may be – just an impressive bunch of people, all ready to share their knowledge and experiences.”

@dlinstedt people from all over the world at #WWDVC 2015
– Oliver Cramer (@proximaastra), 28 May 2015

Day 2 brought us some vendor presentations by

Ultimate Software – Event-driven real-time EDW in the cloud,
WhereScape – WhereScape solution for Data Vaults,
MID – Model driven DV2 data warehouse – complete example and
AnalyticsDS – Mapping Manager.

It is always interesting to see how and why vendors implement Data Vault 2.0 in their tools.

Sanjay Pande stated in his talk “Agile Big Data warehousing with Data Vault 2.0”:

Agile is about continuous improvement, not just being fast!

He also spoke about performance issues with Hive and other tools in the Hadoop universe and about how best to extract data out of Hadoop, and he recommended several tools. Finally, he gave us a gift: the new book he is currently writing.

Besides Claudia Imhoff, I also met Scott W. Ambler (father of Agile Modeling and Disciplined Agile Delivery, DAD) in person for the first time. He gave an amazing presentation about why to be agile and work the agile way, about the cultural gap between development and data folks in terms of maturity in agile development, and about database refactoring.
Database refactoring: evolve the database schema through continuous development. Mark old parts of the schema as deprecated and drop them some time later. Very interesting stuff that everyone should consider when doing agile!
Agile data modelling does not mean no modelling. It is evolutionary data modelling: model when you know what to model and when you need it, in line with the Agile Manifesto. This is similar to my last blog post, Data Vault KISS - Keep it Small and Simple.
Summary of Scott Ambler’s talk: Data folks, think outside the box!
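To make the deprecate-then-drop idea behind database refactoring a bit more concrete, here is a minimal sketch in Python with SQLite. It is not taken from Scott's talk. The table and column names (customer, customer_v2, fname, first_name), the deprecation bookkeeping table and the two helper functions are all assumptions made up for this illustration.

```python
import sqlite3

def start_refactoring(conn, drop_after):
    """Introduce the new schema while keeping the old one readable."""
    conn.executescript("""
        -- New table with better names (hypothetical target schema).
        CREATE TABLE customer_v2 (customer_id INTEGER PRIMARY KEY, first_name TEXT);
        INSERT INTO customer_v2 SELECT id, fname FROM customer;
        DROP TABLE customer;
        -- Compatibility view: old table name and old column names keep working.
        CREATE VIEW customer AS
            SELECT customer_id AS id, first_name AS fname FROM customer_v2;
        -- Bookkeeping table (an assumption): which objects are deprecated, and until when.
        CREATE TABLE IF NOT EXISTS deprecation (object_name TEXT PRIMARY KEY, drop_after TEXT);
    """)
    conn.execute("INSERT INTO deprecation VALUES ('customer', ?)", (drop_after,))

def finish_refactoring(conn, today):
    """Drop deprecated views whose transition period has ended; return their names."""
    rows = conn.execute(
        "SELECT object_name FROM deprecation WHERE drop_after <= ?", (today,)
    ).fetchall()
    for (name,) in rows:
        conn.execute(f'DROP VIEW IF EXISTS "{name}"')
        conn.execute("DELETE FROM deprecation WHERE object_name = ?", (name,))
    return [name for (name,) in rows]
```

Existing apps keep reading the old customer structure through the view during the transition period, while new code already uses customer_v2. Dates are plain ISO strings here, so a simple string comparison is enough to decide when the deprecated view may be dropped.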

The second day’s dinner was sponsored by AnalyticsDS. Thanks to Sam Benedict for the awesome evening!

First crazy shirt picture of today #WWDVC
– Dirk Lerner (@DV_Modeling), 30 May 2015

The last day of the WWDVC was all about crazy shirts. Have a look at the amazing tweets. It was a mixture of Big Bang Theory and Magnum. What a funny idea.

But there were not only crazy shirts. We listened to Kent Graziano’s “Extreme BI: Creating virtual dimensions” and Christian Hädrich’s “Temporality in Data Vault”.

Kent pointed out why we should go virtual:

Supports agility
Eliminates ETL bottlenecks
No need for backups

Another interesting key point Kent added on top:

All presentations that included virtualisation use more or less the same SQL. Why? Because Data Vault is pattern based.

Think about that.
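A small sketch of what “pattern based” can mean in practice: because every hub and satellite looks the same, the SQL for a virtual dimension can be generated from a few pieces of metadata. This is only an illustration, not Kent’s or Roelant’s code. The naming conventions (hub_, sat_ and dim_ prefixes, an _hk hash key column, a load_date column) and the function name are assumptions.

```python
def dimension_view_sql(entity, business_key, attributes):
    """Render a CREATE VIEW statement for a virtual dimension from metadata only.

    Assumed conventions: hub_<entity> and sat_<entity> share the key column
    <entity>_hk, and the satellite is insert only with a load_date column.
    """
    hub, sat, hk = f"hub_{entity}", f"sat_{entity}", f"{entity}_hk"
    columns = "".join(f",\n       s.{a}" for a in attributes)
    return (
        f"CREATE VIEW dim_{entity} AS\n"
        f"SELECT h.{hk} AS {entity}_key,\n"
        f"       h.{business_key}{columns}\n"
        f"FROM {hub} h\n"
        f"JOIN {sat} s ON s.{hk} = h.{hk}\n"
        # Current row = latest load date per key, so no end dating is needed.
        f"WHERE s.load_date = (\n"
        f"    SELECT MAX(s2.load_date) FROM {sat} s2\n"
        f"    WHERE s2.{hk} = s.{hk})"
    )
```

Calling dimension_view_sql("customer", "customer_no", ["name", "city"]) and dimension_view_sql("product", "product_no", ["label"]) yields two statements that differ only in the names taken from the metadata. The pattern itself stays the same.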

Christian wrapped up how to use multi-active SAT(ellites) as the basis for bitemporality in Data Vault. By using additional error SAT(ellites) you can create well-organised bitemporal SAT(ellites) and correct timelines, in the sense of full, overlapping and condensed timelines, out of ugly temporal data.

All in all, it was a great event for me! Tons of new ideas stuck in my mind during these days, and there were great in-depth discussions with Kent, Roelant, Marcel and many more. Thanks to Dan for organizing this awesome event.

What remains are my personal “souvenirs” of the conference:

Relating to modelling techniques

Managed Self Service BI (M.SS.BI) means writing a lot of data back into the Business Vault.
Point in Time (PIT) tables are now part of the Business Vault.
INNER join in Data Vault 2.0 (new option): using Point in Time (PIT) tables to solve the INNER join challenges in Data Vault without using a full timeline. It is an alternative to full timelines when the volume of data is a performance issue.
With zero records, or ghost records, in SAT(ellites) you build full timeline rows in PITs only for the business keys that are necessary and required, which improves query speed on virtualized SCDs. With this technique only one ghost record is needed in each SAT(ellite), compared to full-timeline SAT(ellites), which need a ghost record for each business key.
Both techniques are valid options. Which one to choose depends on the concerns in the specific case.
Using virtualisation and insert-only SAT(ellites) simplifies partitioning and backups of your warehouse. It saves backing up a huge amount of data, because only new data has to be backed up.
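The ghost record and PIT technique from the list above can be sketched roughly like this in Python with SQLite. It is my own simplified illustration, not the official Data Vault 2.0 standard. The table and column names, the key value of the ghost record and its load date are all assumptions.

```python
import sqlite3

GHOST_HK = "0" * 32        # assumed key of the single ghost record
GHOST_LDTS = "1900-01-01"  # assumed load date of the ghost record

def build_demo(conn):
    """Create a hub, one satellite with exactly one ghost record, and an empty PIT."""
    conn.executescript(f"""
        CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_no TEXT);
        CREATE TABLE sat_customer_address (
            customer_hk TEXT, load_date TEXT, city TEXT,
            PRIMARY KEY (customer_hk, load_date));
        CREATE TABLE pit_customer (
            customer_hk TEXT, snapshot_date TEXT,
            sat_address_hk TEXT, sat_address_ldts TEXT);
        INSERT INTO hub_customer VALUES ('a1', 'C-1');
        INSERT INTO hub_customer VALUES ('b2', 'C-2');
        INSERT INTO sat_customer_address VALUES ('{GHOST_HK}', '{GHOST_LDTS}', NULL);
        INSERT INTO sat_customer_address VALUES ('a1', '2015-05-01', 'Stowe');
    """)

def load_pit(conn, snapshot_date):
    """One PIT row per business key; keys without satellite data point to the ghost record."""
    conn.execute("""
        INSERT INTO pit_customer
        SELECT h.customer_hk, :snap,
               CASE WHEN MAX(s.load_date) IS NULL THEN :ghk ELSE h.customer_hk END,
               COALESCE(MAX(s.load_date), :gldts)
        FROM hub_customer h
        LEFT JOIN sat_customer_address s
               ON s.customer_hk = h.customer_hk AND s.load_date <= :snap
        GROUP BY h.customer_hk
    """, {"snap": snapshot_date, "ghk": GHOST_HK, "gldts": GHOST_LDTS})

def query_snapshot(conn, snapshot_date):
    """Equi-joins only: every PIT row finds a satellite row, real or ghost."""
    return conn.execute("""
        SELECT h.customer_no, s.city
        FROM pit_customer p
        JOIN hub_customer h ON h.customer_hk = p.customer_hk
        JOIN sat_customer_address s
          ON s.customer_hk = p.sat_address_hk AND s.load_date = p.sat_address_ldts
        WHERE p.snapshot_date = ?
        ORDER BY h.customer_no
    """, (snapshot_date,)).fetchall()
```

The outer join is paid once when the PIT is loaded. Queries on top of the PIT can then use plain INNER joins and still return a row for every business key, because keys without satellite data resolve to the one ghost record.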

Relating to methodology

We, the data folks, are almost 20 years behind the software guys in adopting agile techniques. There is a lot of work to do.
Agile: not only time boxes. Doing agile in a mature way means delivering value to production continuously.
Deprecate “old” data model parts as part of change management, so there is no need to immediately refactor all dependent apps after a data model change.

Others

A crazy shirt contest makes a conference unique and easy-going
Write all my future blog posts in English

So long
Dirk

Some more impressions:

Getting preview from @dlinstedt of as yet unpublished DV 2.0 standard for dealing with missing data #WWDVC #DataVault
– Kent Graziano (@KentGraziano), 27 May 2015

#WWDVC @scottwambler This is so true for many things in life as well as #agile methods.
– Kent Graziano (@KentGraziano), 29 May 2015

Nice to meet @Claudia_Imhoff in real life at #WWDVC #BBBT
– Dirk Lerner (@DV_Modeling), 28 May 2015

@scottwambler and myself. Great to meet all this smart ppl in person and real life #WWDVC
– Dirk Lerner (@DV_Modeling), 29 May 2015