Jeremy Kashel's Blog

Prototyping Analysis Services Cubes

After running a modelling workshop with end users, my next activity on a project is generally to produce a prototype model with some test data, in order to check with users that the planned star schema design will work. My tool of choice for this job tends to be Power Pivot, but I was presented with a slightly different solution when working with a client recently. This got me thinking it would be worth contrasting the two approaches….

Power Pivot

First of all, let's start with Power Pivot. The goal here is to quickly produce a prototype to verify the star schema design. I tend to just ask a business user to bring along an Excel workbook with samples of each dimension and fact pre-extracted into Excel, which can then be loaded into Power Pivot. I find that this saves time, although as an alternative Power Pivot could of course extract directly from SQL Server, Oracle etc. if needed.

As an example, we can quickly walk through building a very simple star schema model:

  • I’ve copied the data that represents my single fact table and related dimensions into Excel. In this case it’s to model a simple Sales Order Transactions fact, so my Excel workbook has dimensions such as Product, Customer, Sales Territory etc., as well as the data that represents the fact table:

image

  • After completing and closing the workbook, we now create a new workbook and open the Power Pivot window.
  • Within the Power Pivot window, the next step is to import the data from the Excel workbook created in step 1. This can be done by choosing From Other Sources->Excel workbook and then browsing to the file.
  • Power Pivot will now present us with all of the sheets that it finds in the workbook, so we select the sheets that represent the dimensions and facts:

image

  • The next step is to set up relationships between the various tables. Once this is done, we have a simple prototype data model, as shown below:

image

After a bit of tidying up (e.g. creating some hierarchies and removing unwanted columns), we can now connect a Pivot Table in Excel to the model, which will help verify the star schema:

image

The key point here is that we have rapidly built a prototype, without the need for any ETL. This means we can quickly work through the design issues with the users, but this time with their actual data, rather than just on a whiteboard.

Analysis Services

The alternative approach that was presented to me recently is to develop the Analysis Services cube/Tabular model before carrying out the ETL. Essentially this means that the users connect to the cube as if it were a production system, unaware that under the hood the ETL is not yet complete.

How is this achieved? Essentially by putting logic in the SQL view layer. It’s a well-known Analysis Services best practice to bind your SSAS objects (e.g. dimensions, measure groups) to SQL Server views, rather than using objects within the DSV. Rather than each view pulling its data from a completed dimension or fact table, each view would instead pull its data from a staging area or a copy of the source database. The idea is that, over time, the contents of each view would be updated to point to the actual dimensions and facts, once they are built.

Therefore a normal view (in this case for a product dimension) that feeds Analysis Services might look like:

SELECT          ProductKey,
                ProductName,
                ProductCategory,
                ProductSubCategory
FROM            Dim.Product

With the ‘No ETL’ approach, the view looks very different, as any transformation and cleaning will have to be carried out within the view:

SELECT          ROW_NUMBER() OVER(ORDER BY PROD.ProductID) AS ProductKey,
                PROD.Name AS ProductName,
                CAST(ISNULL(SUB.Name, 'Unknown') AS VARCHAR(100)) AS ProductSubCategory,
                CAST(ISNULL(CAT.Name, 'Unknown') AS VARCHAR(100)) AS ProductCategory
FROM            [$(AdventureWorks)].Production.Product PROD
LEFT JOIN       [$(AdventureWorks)].Production.ProductSubcategory SUB ON SUB.ProductSubcategoryID = PROD.ProductSubcategoryID
LEFT JOIN       [$(AdventureWorks)].Production.ProductCategory CAT ON CAT.ProductCategoryID = SUB.ProductCategoryID

This therefore incurs some technical debt, as the cube gets built before the ETL or even the physical dimension table. But the idea is that you can get the users using the cube, on production even, then gradually replace the views with proper ETL and dimension tables once you have some more feedback.
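
To make the swap-over concrete, here’s a minimal sketch (the dbo.DimProduct view name is my own, hypothetical one): once the real Dim.Product table has been built and loaded by the ETL, the view that the cube is bound to is simply re-pointed, and the cube itself needs no changes:

ALTER VIEW dbo.DimProduct   -- hypothetical view name that the SSAS dimension is bound to
AS
SELECT          ProductKey,
                ProductName,
                ProductCategory,
                ProductSubCategory
FROM            Dim.Product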

Conclusion

The Power Pivot method is tried and tested for me many times – I find that it’s very quick to get up and running. The Analysis Services approach is not quite as quick to get up and running, but offers the benefit that the cube can continually evolve in a fairly agile manner. It’s early days for me using this approach, but a disadvantage is that the views can get quite complex, impacting performance depending on the data volumes.

Both methods probably have their place. If I wanted to validate my design, I think I would continue to produce a quick Power Pivot model. On the other hand, if quick delivery to production is a must, and the complexity/data volumes are not an issue, then I would use the view approach.

Tabular Analysis Services Perspectives and Measure Groups

I’ve always liked how Excel and other client tools deal with measure groups in a multi dimensional SSAS cube. Once you connect to the cube, you get a drop down in the Pivot Table fields pane that lets you choose which measure group you want. From there, you get a filtered list of measures and dimensions. For example, if I connect to the Adventure Works 2012 sample cube, I get the following:

image

By picking Internet Orders, I will see only the measures and dimensions that relate to Internet Orders, which provides me with a good way of navigating a large cube.

Tabular

Unfortunately, if I connect to a Tabular model (e.g. the sample Adventure Works 2012 model that I got from CodePlex) and click the same drop down, I will see every table in the entire model, whether the table is a dimension, a fact or something else. For example, for a tabular model that contains 3 fact tables, but 15 tables in total, I see all 15 tables:

image

This isn’t ideal in my opinion, as I’d much rather that users had the ability to quickly jump to an area of interest, which measure groups achieve fairly well. As I have 3 fact tables, I would expect to see those 3 fact tables/measure groups in the drop down.

I did accept this as a quirk of tabular until I stumbled across something recently. If you connect to a tabular perspective, then Excel behaves exactly as Multi Dimensional does, i.e. it shows you only measure groups in the drop down. To illustrate I’ve added a new perspective to the model that contains all tables and I’ve called this perspective ‘Adventure Works’:

image

Now I can connect to this new perspective via Excel. Remember the perspective contains all my tables, so it shouldn’t be any different from connecting to the model itself:

image

The perspective does give a different result though – now only the actual measure groups are displayed to me in the drop down, which is much more user friendly:

image

Summary

This is completely different to the way that perspectives work in multi dimensional. When making MD cubes, I wouldn’t always need perspectives, just because the measure groups provided a ‘natural’ way for users to pick the subset of the cube that was of interest to them. Now with tabular, it seems that using a perspective will actually improve the user experience.

Excel Fuzzy Lookup Add-In and SSIS

Although it’s been out for a while, I thought I’d mention, for those who don’t know, that you can download a Fuzzy Lookup Add-In for Microsoft Excel.

The Add-In allows you to perform a fuzzy lookup of Excel data against another Excel dataset. I won’t go into too much detail, as this has already been covered here, but essentially the Add-In requires you to set up two datasets, namely a “left” and a “right” table. Once you carry out some configuration/mappings, the fuzzy match will return all of the rows in your left table, with the appropriate values looked up from your right table. As an example I’ve chosen to do my lookup based on reseller names and an address:

image

Part of the reason for this post is that I think this add-in could be useful from a self-service perspective for power users to assist in MDM/Matching/DW projects. There’s often a lot of trial and error in trying to get the right balance for matching rules, and, given that the Fuzzy Add-In gives similar results to SSIS, it may well be a good place to start with matching a dataset as part of an MDM project, for example. Alternatively, it could be used to carry out pre-matching/clean-up on data before an MDM project starts.

SSIS

I also wanted to contrast this Add-in to the Fuzzy Lookup transformation that you get in SSIS. Both the Fuzzy Lookup Add-In and the SSIS variant do the same thing – they both match one dataset against another in a fuzzy manner, returning you 1-n matches per row. In fact, according to this paper, they use the same Microsoft Fuzzy algorithm.

With the default settings, at least with my small test datasets, I get similar results in Excel and SSIS, albeit with different similarity numbers. These are the key differences that I’ve noticed:

  • Excel returns only overall row similarity, whereas SSIS returns similarity per column also.
  • Excel exposes an additional set of tuning options. As an example, you can set the Containment Bias, whereby you decide on the penalty for tokens in the right record that are not in the left record. You can also edit a number of other settings in an XML file.
  • SSIS outputs both the confidence and similarity, whereas Excel outputs only the similarity.
  • Excel will output the similarity XML per row, whereby you can see how it has split up a string into differently weighted tokens.
  • As you might expect, SSIS outperforms Excel.

As a quick test, I started working with Adventure Works resellers to see if I could get the same match results from both products. To highlight this, I’ve filtered things down to just 1 made-up reseller (“Bikes World Comp. Store”) and looked at the top 5 resellers that each product returns. SSIS, with its default settings, gives the following when matching just on the reseller name:

image

Excel, using the Fuzzy Add-In, gives the following with its default settings:

image

So overall some very good matches out of the box from the Excel Add-In, certainly comparable to SSIS. What’s interesting about this result set is that “Bike World” is ranked higher in Excel than “World Bike Discount Store”. We can align the behaviour of the two products, in this case anyway, by changing the Containment Bias from its default of 0.8 to 0. This will assign a full penalty to tokens in the right record that are not in the left record. So, as the left record is “Bikes World Comp. Store”, and the right record is “Bike World”, we are by default being lenient on the fact that it’s missing tokens. Increasing the penalty for missing tokens gives the following, with the top record now matching SSIS:

image

Of course, you would normally match on more than just the name, plus what’s worked for me here might not work in other scenarios. But the bottom line is that the Excel Fuzzy Lookup Add-In gives very good matches out-of-the-box, is easy to use for the Excel user and therefore could well be of some use as part of data matching projects.

Data Quality Services - Kimball ETL Screening

Although it’s arguably not the most popular in the Kimball series, The Data Warehouse ETL Toolkit contains a huge amount of vital information that’s focussed on the ETL side of data warehousing. In particular I’ve found the data quality screens design to be useful, so I thought it might be worth exploring if this concept can be applied to Data Quality Services.

What is a Kimball Data Quality Screen?

A screen is simply a check that is made against data that comes into the ETL pipeline and is used to measure data quality. Physically, the Kimball screen design is typically implemented as a table called Screen, which contains, as rows, the actual checks to perform against staged data. In addition, the screen table will contain metadata to aid data quality measurement, such as screen severity, category, source system and possibly exception actions.

Each time that a screen yields some results (i.e. data quality issues), another table, the Error Event Fact, is populated with the results. Therefore, by using the data generated by the screening process, the Error Event Fact becomes the place to look for an overall measurement of data quality. The schema for this design from the Kimball book (with a few modifications) is as follows:

image

The check performed by each screen is typically a snippet of SQL that checks for a particular condition, e.g. Is the customer’s postcode missing, or is the tax amount within permitted ranges?
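
As a rough sketch of how this design can be physically implemented (the table and column names below are my own illustrations, not taken from the book), the Screen table holds one row per check with its metadata, and the Error Event Fact records each failure:

-- Illustrative only: a cut-down Screen table and Error Event Fact
CREATE TABLE dbo.Screen
(
    ScreenId        INT IDENTITY(1,1) PRIMARY KEY,
    ScreenName      VARCHAR(100) NOT NULL,
    ScreenCategory  VARCHAR(50)  NOT NULL,  -- e.g. Completeness, Validity
    Severity        TINYINT      NOT NULL,  -- drives the exception action
    SourceSystem    VARCHAR(50)  NOT NULL,
    ScreenSQL       VARCHAR(MAX) NOT NULL   -- the SQL snippet that performs the check
)

CREATE TABLE dbo.ErrorEventFact
(
    ErrorEventId    INT IDENTITY(1,1) PRIMARY KEY,
    ScreenId        INT          NOT NULL REFERENCES dbo.Screen (ScreenId),
    ErrorDateKey    INT          NOT NULL,  -- link to the date dimension
    SourceRowId     VARCHAR(50)  NOT NULL   -- identifies the offending staged row
)

-- An example of the sort of snippet that would sit in ScreenSQL:
-- SELECT CustomerId FROM Staging.Customer WHERE Postcode IS NULL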

Can Data Quality Services assist with ETL screening?

DQS is capable of auditing and checking data, as each domain within a knowledge base can have its own domain rules. In order to see how this might work as Kimball screens, I’ve set up a simple knowledge base around purchase orders. Here’s the domain management window from the DQS client:

image

Within the Delivery Charge domain, I’ve setup a domain rule called ‘Delivery Charge Negative’ in order to ensure that the delivery charge cannot be negative:

image

I’ve done something similar with the PO Date domain, setting up a rule to state that the PO date must be greater than 01/01/2012 in this case. Data run against the PO knowledge base will now be checked against these two rules, so these are effectively my two Kimball-style screens. To recap, DQS cleaning can be run either from the Data Quality Client or from SSIS.

Populating the Error Event Fact with SSIS

I’m going to be running my DQS domain rules via SSIS, as my end goal is to populate the Error Event Fact. Remember, if any DQ issues occur, I want the Error Event Fact to know about them.

In this scenario, the data from the source system has already been staged to a SQL table, which gives the following results in the SSIS source adaptor preview:

image

This data is passed to the DQS Cleansing SSIS task, with the data mapped to the DQS Knowledge Base in the following way:

image

Once the package is run, the data will be passed to DQS, which will run the data against its knowledge base, including running the domain rules. The data that is returned from DQS is shown below. Note that the PO55 row has failed both rules, whereas the other two rows have failed one rule each.

image

Now we know exactly which rows have failed each data quality screen. The only trouble is that the data is in the wrong format for our Error Event Fact. We want a total of 4 rows in the Error Event Fact, as PO Id 5 has failed 2 screens, whereas the other two rows have failed 1 screen each. Therefore, we carry out an unpivot before inserting into the Error Event Fact. For this blog post my ScreenId is hard coded, but in reality you would have to look up an appropriate ID for the DQS screen. The full SSIS package is as follows:

image
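
For anyone who prefers to see it in T-SQL, the Unpivot step in the package is doing the equivalent of something like the query below. The table name and the DQS status column names are assumptions based on my knowledge base (DQS appends a status column per domain), and as noted above the ScreenId mapping is omitted:

SELECT      u.[PO Id],
            u.ScreenName,
            u.RuleStatus
FROM        Staging.DQSCleansingOutput          -- hypothetical table holding the DQS output
UNPIVOT
(
    RuleStatus FOR ScreenName IN ([Delivery Charge_Status], [PO Date_Status])
) AS u
WHERE       u.RuleStatus = 'Invalid'            -- one row per failed screen, per PO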

Data Quality Reporting

With the Error Event Fact populated, a variety of reports can be produced. In this case I’ve made a very simple Analysis Services cube and built a quick report in Excel 2013, using slicers, sparklines and also the new Timeline feature:

image

From this report I can see the screen that’s failed the most rows, see a trend of data quality and filter on specific issues.

Summary

With a bit of work, DQS is capable of producing the rows needed for the Kimball Error Event Fact, but how good is it? I see the advantages of using DQS as follows:

  • Front end user interface in order to allow data stewards to set up their own rules to screen the data – much better than holding rules in SSIS or manually in a table;
  • DQS becomes a one stop shop for all data quality, as, in addition to holding the screens as domain rules, DQS will clean your data for you, according to the valid/invalid domain values that you set up in the knowledge base.

There are a few disadvantages to it also:

  • The domain rules are not as flexible as pure SQL or SSIS, meaning that you may not be able to express all the screening rules that you had in mind;
  • DQS can't maintain all the metadata that you need to implement Kimball screening, so you’ll need to supplement it with some SQL tables to hold things like screen severity, screen type etc;
  • Even though the performance has been improved in CU1, it’s still not suitable for large amounts of data. Here’s an overview of how long you can expect the SSIS cleansing to take.

So my view is that if you’re already using DQS, data stewards like using it to manage data quality, and your data volumes are not too big, then implement the classic Kimball screening design using DQS. But there are more cons than pros unfortunately, so if you’ve not already made an investment in DQS, and you have the time to invest in your own Kimball-style screening solution, then I think you can come up with something that’s a bit more flexible and scalable.

Matching with Master Data Services, DQS and SSIS

If you have an MDM requirement to consolidate multiple sources of Master Data together into a single golden record, then you have a few different ways to achieve this on the Microsoft platform. This blog post gives an overview of the different ways that various matching methods can be used in conjunction with Master Data Services for consolidation in Master Data Management, outlining the pros and cons of each option. In summary, the options are:

  • Data Quality Services (DQS)
  • SQL Server Integration Services (SSIS)
  • Plus Master Data Services itself has a few (not that well known) matching capabilities

Data Quality Services

A key requirement in all but the simplest MDM solutions is that the matching/consolidation must be carried out in an automated manner, with a Data Steward alerted to take action if needed (e.g. the matching engine incorrectly matches two customers, the Data Steward opens MDS and corrects this). This scenario would be hard to achieve with DQS, as it’s possible to automate the DQS cleaning, but it’s not possible to automate the DQS matching. This is something that I’ve raised connect issues about, here and here.

If your data to be matched into MDS is not coming in on a daily basis, and you therefore need to do more ad-hoc matching to produce your golden Master Data records, then DQS could be for you. The MDS Excel Add-in will give you the capability of matching data that you import into Excel with MDS members, harnessing the power of DQS. An overview of how this works is available here.

Integration Services

SSIS has been around for a long time now and, as many of you will know, contains fuzzy matching components. With the right MDS model design, it’s possible to carry out a batch-based fuzzy match against your master records and end up with a mapping between the records that exist in your source systems and your MDS golden records. The rough conceptual steps to do this are:

  1. Load the new and changed records from the source systems into a staging area.
  2. Clean and standardise your data. This is actually something that DQS cleaning can help with.
  3. Query your staging area to get the new records that you want to insert/update into Master Data Services.
  4. Now the question arises: do we have an exact or close match for these records already in MDS? While the exact matches are easy to deal with, use the SSIS Fuzzy Lookup component to establish whether there are any approximate matches (a rough SQL sketch of this split follows the list).
  5. Link the source records to master records (if the match is high enough) using MDS Domain Attributes.
  6. Carry out appropriate inserts and updates into MDS using the MDS staging tables.
  7. Ensure that a Data Steward is alerted in some way if necessary (e.g. if the match threshold is below x% confidence). This can be done with Email or MDS Notifications, for example.
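
To make step 4 a little more concrete, the sketch below splits out the exact matches, which can be linked straight away, from the remainder that need to go through the SSIS Fuzzy Lookup. The staging table and MDS subscription view names are hypothetical:

-- Exact matches: link source records straight to existing MDS golden records
SELECT      stg.SourceCustomerId,
            mds.Code AS MasterRecordCode
FROM        Staging.Customer stg
INNER JOIN  mdm.CustomerSubscriptionView mds ON mds.Name = stg.CustomerName

-- Everything else gets routed to the Fuzzy Lookup to find approximate matches
SELECT      stg.SourceCustomerId,
            stg.CustomerName
FROM        Staging.Customer stg
WHERE       NOT EXISTS ( SELECT 1 FROM mdm.CustomerSubscriptionView mds
                         WHERE  mds.Name = stg.CustomerName )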

This process can run in batch overnight, with the Data Steward approving or rejecting the matches that SSIS has carried out the following morning. Whilst the above over-simplifies the process and the technical work required, hopefully it makes sense at a high level.

Master Data Services

Although you cannot feed MDS your source data and get it to automatically carry out matching for you, it does actually contain the raw components in order to do this. By this I mean the MDS database contains an assembly called Microsoft.MasterDataServices.DataQuality, which gives you a number of fuzzy matching T-SQL functions. These are called from the MDS front end when you carry out some filtering when viewing entities. Using them just for filtering in the front end really isn’t using the functions to their true capability, but thankfully you can use these functions in your own code.

You can use the MDS T-SQL functions in a similar way to the conceptual SSIS method outlined above, in order to match and eliminate duplicates. In addition, the MDS web API can also be used to carry out a fuzzy match, as mentioned in this forum post. Retrieving match candidates using a web service may be an attractive option if you’re trying to do real time MDM.
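
As a flavour of what calling the MDS functions directly might look like, here’s a sketch that scores staged customer names against existing MDS members using mdq.Similarity. Treat the details as assumptions: the staging table and subscription view names are hypothetical, and the parameters (the algorithm selector and containment bias in particular) should be checked against the documentation for your MDS version:

-- Sketch only: run from (or against) the MDS database, where the mdq functions live
SELECT      stg.CustomerName,
            mds.Name AS CandidateMasterRecord,
            mdq.Similarity(stg.CustomerName, mds.Name, 2, 0.0, 0.0) AS SimilarityScore
FROM        Staging.Customer stg
CROSS JOIN  mdm.CustomerSubscriptionView mds
WHERE       mdq.Similarity(stg.CustomerName, mds.Name, 2, 0.0, 0.0) >= 0.75  -- candidate threshold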

Conclusion

Essentially, until it’s possible to automate DQS matching, we have a choice between SSIS and the MDS matching functions. The following e-book gives a very detailed overview of the matches that both are capable of doing. The MDS T-SQL functions are more flexible than the SSIS fuzzy components as you can choose which fuzzy algorithm you want to use, but the SSIS components let you choose between Fuzzy Grouping and Fuzzy Lookup out of the box, without having to write SQL. Although I tend to find that both give very good matching results, the MDS T-SQL functions produce slightly better matches in my experience, plus give you the option of trying different algorithms to suit different scenarios.

It’s also worth mentioning that Profisee Maestro (full disclosure, we are a partner) integrates with MDS, offering its own matching algorithms. Maestro also has a front end with the ability to assist with survivorship and approval, which I think is a useful addition to MDS. Speaking of survivorship and approval, there are two options in MDS out of the box. The new Master Data Manager web front-end is much improved, but the MDS Excel Add-In potentially allows a bit more flexibility for survivorship carried out by a Data Steward, just due to its natural suitability for filling/copying/pasting/filtering.

So overall, due to the various improvements, Master Data Services is now capable of tackling more complex MDM scenarios than in the 2008 R2 version.

Analysis Services Member Properties and Excel

When working with users who are browsing a cube using Excel 2010, a common requirement that I find is that the users want to create tabular or grid based reports. E.g. “I have a list of currencies, I want to display the currency code in an adjacent column”. Or “Here are my account codes, I want to also see the account type”.

The common way that I see users attempt this is to stack attribute hierarchies on top of one another. For example, using Adventure Works, to get the currency names and codes together, I can stack the two attribute hierarchies:

image

This produces the following report:

image

Not exactly what we want – ideally we want the currency codes in separate columns, on the same line as the name.

An Alternative Approach

One well known way to achieve this layout is to right click on the relevant member and display its member properties, whereas another is to set the layout to Tabular Form. I tend to prefer the member properties route as I think it produces slightly better reports, so I’ll be covering a simple member properties design tip shortly. First of all, here’s the sort of report that I’m able to create using member properties:

image

I’m able to do this because the Adventure Works Source Currency Code Attribute Hierarchy has the Source Currency (names) as a member property. Therefore, I get the following option when I right click on the currency code:

image

Displaying member properties will only work if the attribute relationships are set up correctly. The reason that I’m able to see the member property ‘Source Currency’ is because of the following attribute relationship:

image

However, if we just display the currency name, using the Source Currency attribute hierarchy and then try and right click to see the currency code, we’ll unfortunately see nothing in the ‘Show Properties in Report’ submenu. This is because the currency name is a property of the code, rather than the other way around. I’d argue that for a user it’s intuitive to get the name from the code, or vice versa, as they sit at the same level. With a few simple changes we can achieve this, essentially by adding an extra attribute to the dimension…

Within the dimension editor in BIDS, the first step is to add the new attribute. In this case it should be based on the Currency Code column that is, of course, already in use. That doesn’t matter, we want to use it twice. Therefore, I’ve created a new attribute called Currency Code and have set AttributeHierarchyEnabled = False, as there’s no need for anyone to browse this attribute. Now on to the all-important attribute relationships. The attribute hierarchy that contains the names is called Source Currency. As it’s not possible to get the currency codes by right clicking on this attribute hierarchy, Source Currency is the attribute that needs to be related to the new Currency Code attribute:

image

Finally, the end result for the user is that they can now easily access the currency codes by right clicking on the names:

image

PowerPivot Settings for Power View

I’ve been using both PowerPivot and Power View quite a bit recently and, in addition to the post I did a while back, have made a few further observations on getting the two to play nicely together.

Building an Example PowerPivot Model

For this post I’ve created a very very simple PowerPivot model, based on the Adventure Works Internet Sales Fact table and its related tables. What this means is that after the model is built, I can build a variety of Excel reports that summarise the measures in the FactInternetSales Adventure Works fact table, such as Sales Amount, Tax Amount or Order Quantity. In the example below I’ve built a very basic report showing Order Quantity by year:

Excel report

On to Power View

Having already published the workbook to SharePoint, my task now is to create a Power View report, which can be done via the PowerPivot Gallery. If I wanted to try and create a visual version of the above report in Power View (e.g. a graph), my steps would be first of all to drag Calendar Year into the report, then Order Quantity. However, when doing this recently, the result wasn’t quite what I expected:

Table Without Sum

Order Quantity has not been summed at all, which is the opposite behaviour to that of PowerPivot. Consequently I can’t create a graph visualisation, because Power View thinks I don’t have any measures in my report:

Greyed Out Visualisations

Yet if we contrast this to measures such as Internet Sales Amount or Tax Amount, they all work fine. Upon looking at the field list for Internet Sales, we will see that Order Quantity is missing the sum symbol, as Power View has not interpreted it as a measure:

image

We can get around this in Power View with a bit of dragging and dropping, but we may as well look at why it’s happened in the first place. The reason for it is the way in which the PowerPivot import wizard interprets SQL data types. The OrderQuantity column in the FactInternetSales SQL table has a data type of Smallint, which will not be summarised by default in Power View. Decimal and Money types are fine, hence why Tax Amount and Sales Amount above look ok.

Rather than changing the data types, we can force Power View to interpret the column as a measure by going to the PowerPivot advanced tab and clicking on Summarize By. If the Summarize By value is changed from Default to Sum, then Power View will interpret the column as a measure:

image

If we republish the model and then go back to Power View, we will see quite a difference with the same actions as before. First of all the field list is looking correct:

Summarize By After

Secondly, only dragging Calendar Year and Order Quantity will now give the following correct results:

Table With Sum After

Now that we have a measure, it means that we can now change the table into a visualisation of our choice:

Visualisations Enabled

Power View is the kind of product that power users are hopefully going to find intuitive, but we can make things even easier for users by making small changes like this.

Master Data Services Excel Add-in

Master Data Services in SQL Server Denali now includes an Excel add-in that allows MDS users to add and update data that exists within Master Data Services. For those of you that haven’t had a chance to download the latest CTP, this post gives an overview of what’s possible in the Excel Add-in.

Excel Add-in Overview

Once you install the Excel add-in (available here for download) you will see an additional toolbar in the ribbon, as shown below:

image

At a high level, the following functionality is available within the MDS add-in:

  • Retrieve members from a master data entity
  • Add or Update members (including annotations) and publish the changes back to MDS
  • Create entities in MDS if you have sufficient permission
  • Run the business rules

I don’t want this post to go on for ever, so today I’m going to focus on retrieving and updating members.

Retrieving MDS Entity Members

The Excel Add-in comes with a sidebar called the Explorer that can be used to connect to an MDS model and retrieve data from an entity within the model. What I quite like is that there is a filter button that allows a user to specify how to filter the data before loading it. In the example below, I’ve connected to the sample customer model, and have filtered the customer entity to only show customers of type 1 and customers from Seattle:

image

On the same filtering window it’s possible to choose the attributes to display. Therefore in the above example, by clicking the ‘Load Data’ button, a filtered list of customer members will be shown:

image

Editing Members

Once the members have been retrieved, editing is just a simple case of typing into the field that you want to change. In the example below I’ve chosen to change the names of two of the customers, which has caused the cells to be highlighted, informing me of the changes that I’ve made:

image

Domain-based attributes are also picked up nicely by Excel. Not that I imagine it would make too much sense, but it’s possible to change the Sales District North Western US (WUSSL) to Central US (CEUS), for example. Excel handles this by rendering a drop down for this domain attribute:

image

As it is in Master Data Manager, within Excel it’s also possible to display a domain attribute’s name.

Regardless of the type of attribute that gets changed, the changes will just remain in the Excel sheet until I click the publish button:

image

Clicking the publish button will result in being prompted to make an optional annotation for each of the changes:

image

The annotations, as well as the member updates, will be committed to the Master Data Services database. The end result is that the transactions can be viewed as per 2008 R2 in Master Data Manager, or by right clicking in Excel and choosing ‘View Transactions’.

Summary

The Excel add-in doesn’t contain all the functionality available within Master Data Manager, but provides an interface that Excel-based data stewards will clearly be very comfortable with. It’s also much easier to do bulk updates in the Excel front-end, when compared to the web front-end.

That’s about it for now. There’s more to the Excel add-in, which I’m aiming to cover at SQLBits 9, plus Master Data Manager has been given a complete overhaul in Denali – hopefully I’ll find the time to cover that soon…

Adding a Calculated Field to an Excel Pivot Table With Analysis Services

One question that I get asked from time to time is how to add a calculated field to an Excel Pivot Table that’s connected to an Analysis Services cube. I’ve been aware of a workaround to do this for a while, but a quick Bing :-) revealed that the common answer given to this question is that it’s not possible.

It definitely is possible, it’s just not that obvious and is perhaps a little limited. Here’s how it can be done in Excel 2010:

Sample Cube

First of all, connect to a cube and produce a pivot table report. I’ve connected to the 2008 R2 version of the Adventure Works DW cube and have produced the following report, using the Internet Sales Amount measure:

image

Adding a Calculated Field

If you now open the Field List, you will see that Internet Sales Amount is selected of course. Here’s the trick – you can select the measure twice by dragging it into the Values area, which will give the following:

image

Now click the drop down next to Internet Sales Amount2, which will display the following window:

image

You can change the name of the field – I’ve called mine Percent Share. In the Show Values as drop-down, I’ve picked % of Grand Total, so that I can see the share of the sales in each country across all years. So now the report is as follows:

image

Summary

A better approach is to put as many calculations in the cube as possible, but as some calculations requested by users are report-specific, this feature of Pivot Tables allows users to produce the reports that they want.

The calculations that you can carry out are only those in the ‘Show values as’ drop-down. There’s quite a few in there, especially in Excel 2010, as mentioned here. If that’s not enough, then an alternative is to use the OLAP Pivot Table Extensions, which is available to download on Codeplex.

Allocations in PowerPivot Using DAX

Although I didn't mention it, the inspiration for my last post was the excellent MDX Solutions 2nd Edition. Whilst flicking through the book a while back, I thought it would be interesting to see how DAX and PowerPivot could handle the so-called 'Common Calculations and Selections in MDX'.

This post continues that theme, focusing on one calculation in particular that's in the aforementioned MDX book, namely 'Unweighted Allocations down the Hierarchy'. Essentially this business problem is dealing with allocating data that is entered at a higher level (e.g. Quarter) down to a lower level (e.g. Day). It varies from business to business what the users actually want to see at the day level in this situation. Some expect to see the Quarter amount, some want to see a blank value and then some users want the amount in the quarters allocated down to the day level.

If the expectation is that data should be allocated, then one way of doing the allocation is to use a ratio. In the MDX book, this is achieved by the following MDX:

1.0 / 
Descendants(
            Ancestor(
                    [Date].[Calendar].CurrentMember,
                    [Date].[Calendar].[Calendar Quarter]
                    ),
            [Date].[Calendar].CurrentMember.Level,
            SELF
            ).Count

If we put this into a query and run it against the Calendar hierarchy in AdventureWorksDW we get the following results:

image

Therefore our quarter-entered value can simply be multiplied by this ratio to get the correct value at each level in the Calendar hierarchy.

PowerPivot

So the challenge is how to be able to do this in PowerPivot. I started off by looking for some data in AdventureWorksDW that would fit the bill. A good candidate seems to be the FactSalesQuota table, which I imported into PowerPivot along with its related dimensions:

image

Once in the pivot table, I wanted to be able to calculate the correct ratio at the daily and monthly levels. For the daily level this is 1/90, as there are 90 days in Q1 of that year, and for the month level it is 1/3, as there are 3 months in the quarter.

Given that there's no MDX style Ancestor() function in DAX, I ended up having to have a different calculation at different levels. Something that I learnt from this forum post is that you need to use IF() in order to isolate a calculation at a different level. Therefore the DAX that I came up with was:

=IF( COUNTROWS(VALUES('DimDate'[FullDateAlternateKey])) = 1,
    1 / CALCULATE( COUNTROWS('DimDate'), ALL('DimDate'[FullDateAlternateKey], 'DimDate'[EnglishMonthName]) ),
    1 / CALCULATE( COUNTROWS( DISTINCT( 'DimDate'[EnglishMonthName]) ), ALL('DimDate'[EnglishMonthName]) )
   )

The COUNTROWS(VALUES('DimDate'[FullDateAlternateKey])) = 1 allows me to check that we're at the day level, and if we are at the day level, then count the number of days in the quarter. Otherwise, if we're at the month level we calculate the number of months in the current quarter. When this DAX is put into a new measure we get the following results:

image

This produces the same results as the MDX statement. The next step was then to apply this ratio to the Sales Amount Quota for the current quarter. This was achieved using the following DAX:

=IF(
      COUNTROWS( VALUES('DimDate'[FullDateAlternateKey]) ) < 90,
      CALCULATE( Sum('FactSalesQuota'[SalesAmountQuota]), 
      All('DimDate'[FullDateAlternateKey], 'DimDate'[EnglishMonthName]) ) * 'FactSalesQuota'[Ratio],
      Sum('FactSalesQuota'[SalesAmountQuota])
   )

The < 90 at the start of the statement allows me to check if we're at the day level or the month level. If we're at either of these levels, then we want to pick up the Quarter-level SalesAmountQuota and multiply by the ratio. Otherwise, at the Quarter and Year levels, we just apply the original SalesAmountQuota value. This produces the following results:

image

Conclusion

The use of the MDX Scope statement would have been nice to pinpoint exactly where I wanted the calculation to occur, but then again, the use of IF() keeps it quite simple, which is better for an Excel power user. Whilst MDX is very very powerful, can you imagine a business user getting to grips with MDX statements such as Scope or Freeze?

The bottom line is that the DAX calculation produces exactly what I want - the new AllocatedSalesQuota measure is correct at all levels in the Calendar hierarchy. Even with an IF() statement, like everything else in PowerPivot, the calculation performs very quickly.