Data Modeling

Introduction

This is probably a key factor, technically speaking, of the future health of your BI solution. There is not perfect nor unique “Data Modelling” that will fit in ALL solutions, but there are some points that will power up your future DW and extend to the max. all the possibilities.

Data modeling it is so extensive concept that can’t be covered within one post neither one book, but we can provide some easy to understand “dos and don’ts” when creating your Staging Area and DW for reporting.

 

STAGING AREA

Prepare yourself for ANYTHING and get AS MUCH AS YOU CAN

This sentence is key if we are getting our data from the source to the staging area. Basically, we have to capture all the data from the source system, and keep record of the original modeling.

So it important to get all the Business Keys, Foreign Keys, nullable or non nullable fields attributes, etc. from the source system tables. And most important, record all the history at the same time. That means, every time we run our ETL, we need to bring all the data, and capture some audit columns in order to keep record of the history. So if at any point, there is a PK (or BK) that is changing in the source, we need to create another record in our Staging table with the CDC info, i.e.: INSERT/UPDATE/DELETE

That will create a database with the data as it is in the source system, adding some audit fields and history, so at any point in time we can recreate our DW using its history and we won’t miss anything out.

Dos:

  • * Raw data in Stage, respecting the DDL (datatypes, fields, etc.)
  • * Get any table that might be a potential for your interest. The DB space/resources are limited, so I won’t say get EVERY single tables no matter what, but if possible, that is not a bad idea.
  • * Keep history, so you can recreate your DW and Datamarts easily in the future.

Don’ts:

  • * Don’t change data, like computing or merging two fields (i.e.: name + surname = full name)
  • * Don’t crop or truncate any fields. If you thing a datatype of VARCHAR(2000) is too big, keep that length in your staging or even round it up to 4000 or MAX.
  • * Don’t start modeling or “fixing” the data model in your Staging Area, even if it looks wrong, you should respect this DDL at this stage, that is something you’ll start working on the next phase, while moving data to DW or even better, to your presentation layer.

 

DATA WAREHOUSE

For our Data Warehouse I can’t recommend a unique model, as every solution/business is different, and the purpose of this DW could be Archiving, reporting, build a base for OLAP/Tabular cubes and so on.

The most extended and well known models are:

 

3NF

Pro

  • save most storage of all modelling techniques
  • simple to understand

Con

  • bad performance with multiple joins
  • difficult to load (ETL)
  • changes are very difficult to handle

 

* Star Schema

Pro

  • Simple design
  • fast queries
  • fits very well to olap models
  • many DBMS are optimized for queries on star schemas

Con

  • usually centralized on one fact table
  • higher storage usage due to denormalization

 

My suggestion is, to use “Data Vault” http://en.wikipedia.org/wiki/Data_Vault_Modeling in the DW core and from that point you can build star schemas in data marts. To explain the benefits of DV would take more than 1 post, so if you are really interested, just Google it or have a look at the Wikipedia link I provided.

Cheers!

 

 


Did you like this? Share it:

Leave a Reply

*