Aiming for the real time with Tabular Cubes

The purpose of this post is to clarify some basics on the SSAS Tabular Cubes and how we can apply Incremental updates in order to achieve RealTime and its best practice.

 

First of all it is important to know about the basic processing options on Tabular cubes:

 

Object Available Processing Options
Database Process Clear, Process Default, Process Defrag, Process Full, Process Recalc
Table Process Clear, Process Data, Process Default, Process Defrag, Process Full
Partition Process Add, Process Clear, Process Data, Process Default, Process Full

 

And the meaning of these processing options:

 

Processing Option Description
Process Add Adds new rows to a partition. Any affected calculated columns, relationships, user hierarchies, or internal engine structures (except table dictionaries) are recalculated.
Process Clear Drops all the data in a database, table, or partition.
Process Data Loads data into a partition or table.
Process Default Loads data into unprocessed partitions or tables. Any affected calculated columns, relationships, user hierarchies, or internal engine structures (except table dictionaries) are recalculated.
Process Defrag Optimizes the table dictionary (an internal engine structure) for a given table or for all tables in the database*. This operation removes all dictionary entries that no longer exist in the data set and rebuilds the partition data based on the new dictionaries.
Process Full Loads data into all selected partitions or tables. Any affected calculated columns, relationships, user hierarchies, or internal engine structures (except table dictionaries) are recalculated.
Process Recalc For all tables in the database, recalculates calculated columns,  rebuilds relationships. rebuilds user hierarchies, and rebuilds other internal engine structures. Table dictionaries are not affected.

 

Once we have a clear idea of all the options we can apply in our BI Lifecycle we must differentiate between the concept that update or refresh your database doesn’t mean real update DWH:

 

  • Recalculate (Process Recalc) is not going to add new info, it is going just to recalculate all the measures even if you don’t insert new data in your cube.

 

  • Partitioning is a very common and old practice used mainly in the big solutions (huge amount of rows) where you need to process some old part of your BI solution only once and that’s not going to change. i.e.: sales 2005, 2004, 2003… In order to define your partitions you are flexible enough to specify a query that will bring only part of the table to these partitions, like “WHERE ID_Dim_Date >= 20010101 and ID_Dim_Date <= 20021231” where your current partition will have something like “WHERE ID_Dim_Date > 20130101” . That could be an option to only process part of your data, but it is not the best practice because you have to define your partitions which they’ll be at the Year or Month level, that means, if you have to re-process your 2013-11 partition every 5 minutes to aim realtime data.

 

  • Incremental processing (Process Add) is the one to achieve your beloved real-time. It will do the deltas by itself, adding or updating whatever just changed.¬†Sadly, you can’t specify a query to apply the deltas only for an specific period.

 

Best Practice:

The ideal scenario is to use the process Add but not using the whole table (reducing the amount of data to process). Hence we can use a Partition, let’s say for 2013-11 and on top of that, ProcessAdd, which will keep and process very quickly the deltas. Then you can re process the cube as many times as you need and getting closer to what we call “real time”.


Did you like this? Share it:

Leave a Reply

*