Jeremy Kashel's Blog

February 2010 - Posts

Allocations in PowerPivot Using DAX

Although I didn't mention it, the inspiration for my last post was the excellent MDX Solutions 2nd Edition. Whilst flicking through the book a while back, I though it would be interesting to see how DAX and PowerPivot could handle the so called 'Common Calculations and Selections in MDX'.

This post continues that theme, focusing on one calculation in particular that's in the aforementioned MDX book, namely 'Unweighted Allocations down the Hierarchy'. Essentially this business problem is dealing with allocating data that is entered at a higher level (e.g. Quarter) down to a lower level (e.g. Day). It varies from business to business what the users actually want to see at the day level in this situation. Some expect to see the Quarter amount, some want to see a blank value and then some users want the amount in the quarters allocated down to the day level.

If the expectation is that data should be allocated, then one way of doing the allocation is to use a ratio. In the MDX book, this is achieved by the following MDX:

1.0 / 
Descendants(
            Ancestor(
                    [Date].[Calendar].CurrentMember,
                    [Date].[Calendar].[Calendar Quarter]
                    ),
            [Date].[Calendar].CurrentMember.Level,
            SELF
            ).Count

If we put this into a query and run it against the Calendar hierarchy in AdventureWorksDW we get the following results:

image

Therefore our quarter-entered value can simply be multiplied by this ratio to get the correct value at each level in the Calendar hierarchy.

PowerPivot

So the challenge is how to be able to do this in PowerPivot. I started off by looking for some data in AdventureWorksDW that would fit the bill. A good candidate seems to be the FactSalesQuota table, which I imported into PowerPivot along with its related dimensions:

image

Once in the pivot table, I wanted to be able to calculate the correct ratio at the daily and monthly levels. For the daily level this is 1 / 90, as there are 90 days in the Q1 that year, and for the month level it is 1/3 as there are 3 months in the quarter.

Given that there's no MDX style Ancestor() function in DAX, I ended up having to have a different calculation at different levels. Something that I learnt from this forum post is that you need to use IF() in order to isolate a calculation at a different level. Therefore the DAX that I came up with was:

=IF( COUNTROWS(VALUES('DimDate'[FullDateAlternateKey])) = 1,

    1 / CALCULATE( COUNTROWS('DimDate'), ALL('DimDate'[FullDateAlternateKey], 'DimDate'[EnglishMonthName]) ),
    1 / CALCULATE( COUNTROWS( DISTINCT( 'DimDate'[EnglishMonthName]) ), ALL('DimDate'[EnglishMonthName]) )
   )

The COUNTROWS(VALUES('DimDate'[FullDateAlternateKey])) = 1 allows me to check that we're at the day level, and if we are at the day level, then count the number of days in the quarter. Otherwise, if we're at the month level we calculate the number of months in the current quarter. When this DAX is put into a new measure we get the following results:

image

This produces the same results as the MDX statement. The next step was then to apply this ratio to the Sales Amount Quota for the current quarter. This was achieved using the following DAX:

=IF(
      COUNTROWS( VALUES('DimDate'[FullDateAlternateKey]) ) < 90,
      CALCULATE( Sum('FactSalesQuota'[SalesAmountQuota]), 
      All('DimDate'[FullDateAlternateKey], 'DimDate'[EnglishMonthName]) ) * 'FactSalesQuota'[Ratio],
      Sum('FactSalesQuota'[SalesAmountQuota])
   )

The < 90 at the start of the statement allows me to check if we're at the day level or the month level. If we're at either of these levels, then we want to pick up the Quarter-level SalesAmountQuota and multiply by the ratio. Otherwise, at the Quarter and Year levels, we just apply the original SalesAmountQuota value. This produces the following results:

image

Conclusion

The use of the MDX Scope statement would have been nice to pinpoint exactly where I wanted the calculation to occur, but then again, the use of IF() keeps it quite simple, which is better for an Excel power user. Whilst MDX is very very powerful, can you imagine a business user getting to grips with MDX statements such as Scope or Freeze?

The bottom line is that the DAX calculation produces exactly what I want - the new AllocatedSalesQuota measure is correct at all levels in the Calendar hierarchy. Even with an IF() statement, like everything else in PowerPivot, the calculation performs very quickly.

DAX Closing Balances

One of the often required calculations for users in reports is to be able to calculate a point-in-time closing balance, which is especially relevant when dealing with stock levels or financial data.

In the Analysis Services world, at least in the Enterprise Edition, we're lucky that we have a variety of Aggregation Functions that can deal with semi-additive measures such as a closing balance. Or we can always turn to MDX if we need to which will achieve the same thing.

In PowerPivot, we don't have Aggregation Functions, but there is DAX, which has a variety of time intelligence functions up its sleeve. The idea behind this post is to see how PowerPivot would deal with the requirement of reporting a closing balance measure.

DAX Time Intelligence Functions

One of the DAX functions that we can use for our closing balance is called ClosingBalanceMonth(), which will simply evaluate an expression at the end of a given month. There's also the similar ClosingBalanceQuarter() and ClosingBalanceYear().

Having noticed these functions within PowerPivot, my first port of call was to set up a simple PowerPivot model by taking data from the FactInternetSales table in AdventureWorksDW, plus all it's related dimensions of course. What I ended up with was the following simple pivot table in Excel 2010:

image

Sales Amount obviously isn't a semi-additive measure, but I'm treating it as one in this example. My first port of call was to use ClosingBalanceMonth() to try and get the value for the month to be equal to the value for last day in the month. I managed to come up with the following formula:

=CLOSINGBALANCEMONTH(Sum([SalesAmount]), 'DimDate'[FullDateAlternateKey], ALL('DimDate'[FullDateAlternateKey]))

This produces the results that are shown below, in the measure called ClosingBalanceFormula:

image

It does a good job of calculating a closing balance for each month, but it presents that closing monthly balance on each day. Therefore, I started looking for an alternative approach and found the LastDate() function. It returns the last date that's in context for the passed date column. The last date that's in context at the month level will be the last date in the month, but thankfully the last date that's in context at the day level will be the current day. Therefore I came up with the following formula:

='FactInternetSales'[Sum of SalesAmount](LASTDATE('DimDate'[FullDateAlternateKey]))

This produced the results that I wanted - the daily closing balance at the day level and the monthly closing balance at the month level:

image 

A bit more digging lead me to find out that a combination of the Calculate() and LastDate() functions gets to the same result, as an alternative. That formula is:

=CALCULATE(Sum('FactInternetSales'[SalesAmount]), LASTDATE('DimDate'[FullDateAlternateKey]))

Conclusion

It's a shame that the ClosingBalanceMonth() formula didn't quite work, but perhaps getting that approach to work will come out in the wash with more DAX experience. However, as with MDX or any kind of expression language, there's always going to be a load of different ways to get to the same result. It's much more efficient to work knowing that you've got several ways around a particular problem, rather than being forced down a particular path, and in fact, I think the range of calculation functions available shows the richness of the DAX language.

Technorati Tags: ,,
PowerPivot and Data Warehousing

A consultant that I know has recently been through the process of reviewing his company's management information requirements, and also the process of evaluating what technology platform can deliver those requirements.

Naturally, a data warehouse was one of the options. A data warehouse, in my opinion, will provide the enterprise-level platform for any organisation to deliver their management information.

In this case though, due diligence flagged up some alternative approaches to a data warehouse, namely the raft of self service BI tools that are out there, including PowerPivot. Other tools that can take data in from a variety of data sources are Tableau (which is also a good front end for Analysis Services) and Spotfire for example.

I think that all of these tools have their merits. PowerPivot is incredibly fast and easy to use, whereas Tableau and Spotfire for example have some handy data visualisation capabilities. Some vendors actually claim that self service tools can replace a data warehouse. Can this really be the case? Whereas these tools do a good job of presenting data in a dimensional model, my personal view is that a data warehouse of course also carries this out, but in addition delivers the following benefits to name a few:

  • Data Cleansing - The power of SQL in the backend and also features like Fuzzy Matching in SSIS mean that you can address the huge problem of data quality. The bonus here is that all data quality issues could be logged, meaning that you can go back to the owners of source systems and let them know that their data is in a mess.
  • Control - If you do encounter data quality issues, you may decide that they are actually too bad to load into the data warehouse and present to your users.
  • Transparency - if you want, you can choose to expose the data quality level of your facts to users, via the Kimball-style audit dimension. E.g. we may have noticed that within our retail data, one store has abnormally low sales. Therefore, we can assign this fact a lower data quality score than the other sales by store for the day, which lets the users know that the data quality steward is aware of the issue, and is looking into it.
  • Slowly Changing Dimensions - A lot of source systems don't behave. E.g. if we're tracking a customer's Marital Status of 'single' or 'married', you often find that source systems only record the latest change to a given field. Without addressing slowly changing dimension issues in a data warehouse, we may never know that Customer X is now single, but once was married.
  • Conformed Dimensions - When we have multiple data sources for a single dimension, the goal is to ensure that each dimension available to end users represents a single, consistent view of that dimension. Due to the often complex matching and de-duplication that's required, it's difficult to see how this would be possible without a data warehouse.
  • Late Arriving Dimension Rows - Going back to the 'Marital Status' attribute, we can get the situation where a source system only updates a particular field a long time after the event actually occurred. E.g, if we think that a customer called John Doe is single, but then today we're told he's married, but has actually been married for the last two years. If Marital Status is an important piece of management information to the business, then we need to update all data associated with John Doe over the last two years, in order to reflect his correct Marital Status over the last two years.
  • Complex Transformations - I recently encountered a source system where one of the columns in a SQL table held data in a varying length format of ‘#Product Type#Currency#Value, #Product Type#Currency#Value’. So what you basically have here is potentially say 25 records held in one row in a SQL table, whereas another table might have just 7 records within one row. Rows could also be badly formed! We ended up using a bit of C# within SSIS to split this data out so that the sub-elements could be analysed correctly.

I could go on, but I think as a few simple examples the above will do.

Whilst I'm an advocate of data warehousing, I also think that the self service tools, such as PowerPivot, can be complementary to a data warehouse. You'll often find, for example, that there's some data that's not in the data warehouse yet - after all some organisations have a lot of data, and it will take time before all the business analysis etc can be carried out to get this data into the warehouse. Another example is proto-typing, analysts can very quickly build a proto-type model, which can then be production-ised at a later stage. I think this video from Chris Webb is really worth watching - it covers these examples and a few more.

In summary there's always going to be a need for users to go digging for data and producing their own reports - the data warehouse can't always deliver everything immediately. PowerPivot is going to be great in this situation, but it will work best when it's used alongside cleaned and conformed data, which is exactly what a data warehouse provides.