Calvin Fern's Blog

SSAS Tabular, PowerPivot and conformed snowflakes

Many of you will be aware of the Kimball concept of conformed dimensions.  This is the principle of joining multiple fact tables, often from multiple source systems, into the same dimension table.  Using this shared dimension the many fact tables can be analysed by the single view of the dimensional data and deliver powerful cross dataset analysis. 

In many cases the granularity of the differing source systems is different, a typical example of this is shown in the Adventureworks where sales data is gathered against an individual product but survey data is gathered against Product Category and SubCategory.  In order to facilitate easy key generation and lookups the Product Category and SubCategory are then separated from the main product table, this is described as a Snowflake design pattern as shown below. 

image

The first SSAS Tabular consideration here is that if you need to join in at a less granular level than the primary dimension table is is essential to present a snowflake to SSAS tabular, it cannot join in at a higher grain like SSAS Multidimensional can.  This is due to its need for a distinct list on the dimension side of the join key. 

The relationships in SSAS Tabular or Powerpivot can then be set up as shown below to join other data in at a higher grain such as the FactSurveyResponse below

image

The next step is to browse this data and check that we can indeed view all our data by Product SubCategory, using the ‘Analyze in Excel’ button we can browse the development cube and set up the pivot table below.  Allowing the slicing of disparate data by common dimensions in this fashion is one of the biggest business benefits of data warehousing. 

image

The solution thus far works but, in Excel and other add hoc tools, the three product tables and large number of attributes are confusing to the end user.  As such we should look at hiding unused and unwanted columns and using the tabular cube to denormalise or ‘flatten’ the options presented to the business user.  Unwanted columns are removed by using the hide from client tools options or removing unwanted columns from the cube all together (ideally using SQL Views).  To reduce the number of tables presented to the user we will hide DimProductCategory and DimProductSubCategory entirely from the user while adding derived columns holding this information to DimProduct.  To achieve this we use the RELATED keyword in DAX to add a few new columns.  Adding a ProductSubCategory Column to DimProduct we use the following expression:

=RELATED(DimProductSubcategory[EnglishProductSubcategoryName])

Once the above column is available, refreshing the Pivot table used earlier with the new column gives in the following result:

image

This incorrect result is the impact of the fact that relationships are only propagated towards the fact table automatically in Tabular and Powerpivot.  We need to explicitly specify that the measure we are using relates to the DimProduct table through ProductSubCategory and the context of this table needs to be used.  To do this we need to alter our measure calculation to take into account the related table DimProduct, unsurprisingly  this is achieved with the RELATEDTABLE function.  We use the CALCULATE function in combination to aggregate our data as appropriate.  The measure DAX formula can then be defined as follows:

=CALCULATE(
    DISTINCTCOUNT([CustomerKey]),
        RELATEDTABLE(DimProduct)
        ,VALUES(DimDate),
        VALUES(DimCustomer)
    )

A values statement is needed to allow the current filter context to be used for all other dimensions on the fact, this doesn’t seem very elegant and I am open to a better way of doing it. The measure as defined above can then be used in excel to give the same result as we first achieved while giving a much cleaner user interface with a single product table for users to find all their information.

Any question or improvements get in touch here or @CalvinFerns

Thanks for reading