Adatis BI Blogs

New Features in SQL Server 2016 – Part 2: Analysis Services

This blog solely focuses on the new Analysis Services features of SQL Server 2016 CTP2. For anyone who may have missed it – click here to view my opening blog on the Database Engine.

Although there have not been any major Analysis Services (SSAS) enhancements in CTP2, it is pleasing that Microsoft are still looking to improve this part of the BI stack. The majority of the enhancements seem to be geared towards Tabular and DAX, although further Multidimensional functionality is planned for future CTPs and beyond. There are five key enhancements for SSAS:

1. Tabular model partitions (Tabular).
2. Many to many dimensions (Tabular).
3. Analysis Services PowerPivot mode.
4. New DAX functions (Tabular).
5. Enable/disable attribute hierarchies.

Tabular Model Partitions

SQL Server SSAS 2016 CTP2 includes new parallel processing functionality for tables with two or more partitions, which increases processing performance. There are no configuration settings for this. More information on this feature:

1. In SSAS Tabular, partitioning big data marts helps drastically when processing new data. Instead of having to process all of the data, you can just process the relevant partitions.
2. In order to create, manage and process partitions in SQL Server Management Studio, you must have the appropriate Analysis Services permissions defined in a security role.
3. Parallel processing happens automatically when processing a table with more than one partition, although you can still choose to process an individual partition independently if required.

Many to Many Dimensions (Tabular)

There isn't too much to show or say here yet, apart from the fact that this feature is FINALLY HERE! Gone are the complications of workarounds (thanks to Marco Russo and Alberto Ferrari); we can now just use the many-to-many functionality in Tabular. Multidimensional already has this feature, and its absence is one of the reasons why developers have yet to buy in to Tabular. Unfortunately, Microsoft have yet to reveal much detail on how this feature will work. I just hope it performs as well as the current workarounds.

Analysis Services PowerPivot Mode

Installing PowerPivot for SharePoint is now a lot easier. If you already know how to configure SharePoint 2013, this will help you greatly. Taken from the MSDN website, you simply use the Install Wizard and do the following:

1. Select 'Analysis Services' from the standard instance features.
2. Choose 'Analysis Services server mode' and configure administrators (screenshot below).

New DAX Functions

There are many new DAX functions, which are all detailed on the MSDN website – click here for further detail. I have picked out five functions that will be highly beneficial when creating an SSAS Tabular cube. I am looking at this from a very BI-focused background, so I would recommend looking at all of the new functions to see if they benefit your needs.

1. TOPN
   a. This is not actually a new function, but it has been updated.
   b. It now allows TRUE/FALSE/ASC/DESC to specify the sorting direction.
2. CALENDAR
   a. Returns a table with a single column named "Date" that contains a contiguous set of dates. The range of dates is from the specified start date to the specified end date.
   b. See also CALENDARAUTO.
3. MEDIAN
   a. Very simple – returns the median of the numbers in a column.
   b. See also MEDIANX (which uses an expression).
4. NATURALINNERJOIN
   a. Performs an inner join of a table with another table. The tables are joined on common columns in the two tables. If the two tables have no common column names, an error is returned.
   b. See also NATURALLEFTOUTERJOIN.
5. SUMMARIZECOLUMNS
   a. Enables you to group data together and return a summary table.
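To make a couple of these concrete, below is a minimal sketch of SUMMARIZECOLUMNS and MEDIANX together in a single DAX query. The 'Product' and 'Sales' table and column names are purely illustrative assumptions, not from any shipped sample:

EVALUATE
SUMMARIZECOLUMNS (
    'Product'[Category],                                        -- column to group by
    "Total Sales", SUM ( 'Sales'[SalesAmount] ),                -- summary value per group
    "Median Sale", MEDIANX ( 'Sales', 'Sales'[SalesAmount] )    -- median via the new MEDIANX
)
ORDER BY 'Product'[Category]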
Enable/Disable Attribute Hierarchies

Again, this functionality is tailored towards SSAS Tabular. There is a new setting that ties back to the Tabular mode metadata:

1. ColumnUsage property
   a. DAXUsage – attribute hierarchies can only be used in measures.
   b. UnrestrictedUsage – can be set in the XMLA (in SSDT) or by using an ALTER statement against the Tabular model. An example taken from MSDN:

<Alter>
  <ObjectDefinition...>
    <Database>
      <Dimensions>
        <Dimension>
          <Attributes>
            <Attribute>
              <ddl500:ColumnUsage value="ddl500_500">
                DAXUsage | UnrestrictedUsage
              </ddl500:ColumnUsage>

Multidimensional Features

The full SQL Server 2016 release will include some key enhancements to Multidimensional SSAS. These include:

1. Netezza as a data source (Netezza Data Warehouse | IBM - NDM Technologies).
2. General performance improvements.
3. Unnatural hierarchies.
4. Distinct counts.
5. DBCC support – checks the logical and physical integrity of objects in the specified database.
6. Exposing on-premises Multidimensional cubes in the cloud with Power BI.

I will provide further information on the improvements as and when Microsoft announce them.

References

For more information on all of the new SSAS SQL Server 2016 features, the below resources/blogs are highly recommended.

- Official Microsoft page – https://msdn.microsoft.com/en-us/library/bb522628(v=sql.130).aspx
- Brew your own Business Intelligence blog – http://byobi.com/blog/2015/05/ssas-related-enhancements-in-sql-server-2016/
- Jorg Klein SQL Server blog – http://sqlblog.com/blogs/jorg_klein/archive/2015/05/22/bi-on-your-terms-with-sql-server-2016.aspx

Prototyping Analysis Services Cubes

After running a modelling workshop with end users, my next activity on a project is generally to produce a prototype model with some test data, in order to check with users that the planned star schema design will work. My tool of choice for this job tends to be Power Pivot, but I was presented with a slightly different solution when working with a client recently. This got me thinking it would be worth contrasting the two approaches.

Power Pivot

First of all, let's start off with Power Pivot. The goal here is to quickly produce a prototype to verify the star schema design. I tend to just request that a business user brings along an Excel workbook with samples of each dimension and fact pre-extracted into Excel, which can then be loaded into Power Pivot. I find that this saves time, although as an alternative Power Pivot could of course extract directly from SQL Server, Oracle etc. if needed. As an example, we can quickly walk through creating a very simple model for a star schema:

I've copied the data that represents my single fact table and related dimensions into Excel. In this case it's to model a simple Sales Order Transactions fact, therefore my Excel workbook has dimensions such as Product, Customer and Sales Territory, as well as the data that represents the fact table:

After completing and closing the workbook, we now create a new workbook and open the Power Pivot window. Within the Power Pivot window, the next step is to import the data from the Excel workbook created in step 1. This can be done by choosing From Other Sources -> Excel workbook and then browsing to the file. Power Pivot will now present us with all of the sheets that it finds in the workbook, so we select the sheets that represent the dimensions and facts:

The next step is to set up relationships between the various tables. Once this is done, we have a simple prototype data model, as shown below:

After a bit of tidying up (e.g. creating some hierarchies and removing unwanted columns), we can now connect to a Pivot Table in Excel that will help verify the star schema:

The key point here is that we have rapidly built a prototype, without the need for any ETL. This means we can quickly cover the design issues with the users, but this time with their actual data, rather than just on a whiteboard.

Analysis Services

The alternative approach that was presented to me recently is to develop the Analysis Services cube/Tabular model before carrying out the ETL. Essentially this means that the users connect to the cube, as a production system, unaware that under the hood the ETL is not yet complete. How is this achieved? Essentially by putting logic in the SQL view layer. It's a well-known Analysis Services best practice to bind your SSAS objects (e.g. dimensions, measure groups) to SQL Server views, rather than using objects within the DSV. Rather than each view pulling its data from a complete dimension or fact table, each view would instead pull its data from a staging area or a copy of the source database. The idea is that, over time, the contents of each view would be updated to point to the actual dimensions and facts, once they are built.
Therefore a normal view (in this case for a product dimension) that feeds Analysis Services might look like:

SELECT ProductKey,
       ProductName,
       ProductCategory,
       ProductSubCategory
FROM   Dim.Product

With the 'No ETL' approach, the view looks very different, as any transformation and cleaning has to be carried out within the view:

SELECT ROW_NUMBER() OVER(ORDER BY PROD.ProductID) AS ProductKey,
       PROD.Name AS ProductName,
       CAST(ISNULL(SUB.Name, 'Unknown') AS VARCHAR(100)) AS ProductSubCategory,
       CAST(ISNULL(CAT.Name, 'Unknown') AS VARCHAR(100)) AS ProductCategory
FROM   [$(AdventureWorks)].Production.Product PROD
       LEFT JOIN [$(AdventureWorks)].Production.ProductSubcategory SUB ON SUB.ProductSubcategoryID = PROD.ProductSubcategoryID
       LEFT JOIN [$(AdventureWorks)].Production.ProductCategory CAT ON CAT.ProductCategoryID = SUB.ProductCategoryID

This therefore incurs some technical debt, as the cube gets built before the ETL or even the physical dimension table. But the idea is that you can get the users using the cube, even on production, then gradually replace the views with proper ETL and dimension tables once you have some more feedback.

Conclusion

The Power Pivot method is tried and tested for me many times – I find that it's very quick to get up and running. The Analysis Services approach is not quite as quick to get up and running, but offers the benefit that the cube can continually evolve in a fairly agile manner. It's early days for me using this approach, but a disadvantage is that the views can get quite complex, impacting performance depending on the data volumes. Both methods probably have their place. If I wanted to validate my design, I think I would continue to produce a quick Power Pivot model. On the other hand, if quick delivery to production is a must, and the complexity/data volumes are not an issue, then I would use the view approach.

KPIs in Power View and DAX Queries for KPIs

I was recently approached by a client where we have assisted in implementing a Microsoft Analysis Services Tabular solution and a supporting management pack of SSRS reports. They were asking about utilising some of the KPIs defined in their Tabular cube in the SSRS reports using DAX queries. I was at a bit of a loss, and most web help seemed to suggest it couldn't be done: blog comment from 2012.

However, with a bit of time to investigate, it appears Power View does now support Tabular KPIs, and as such a DAX query should be able to pull them out. My KPI base measure is named "Highest Ranking" and I am able to view its value, status or goal in Power View. A little work with every SQL developer's best friend, Profiler, and I had what I was looking for. It appears that when you create a KPI, SSAS defines some measures in the background which do some of the work of the MDX KPI functions. Therefore the following DAX query against my cycling demo cube returns the expected results and could therefore be used in SSRS reporting:

EVALUATE
ADDCOLUMNS (
    VALUES ( 'DimRider'[Name] ),
    "Highest_Ranking", 'FactRanking'[Highest Ranking],
    "Highest_Ranking_Goal", 'FactRanking'[_Highest Ranking Goal],
    "Highest_Ranking_Status", 'FactRanking'[_Highest Ranking Status]
)
ORDER BY 'DimRider'[Name]

I also tested for the existence of other KPI functions such as Trend and Weight, but these do not appear to be present. It is also interesting that the use of a KPI over a measure does not change the measure name, but just groups it in the field list, and as such there is no need for a value function. For more info on KPIs in Power View there is additional documentation here. I am currently unsure of the required versions for this functionality, so if it is missing from your environment I would love to hear from you.

Power BI Preview – Mobile Integration

I have had the invite through to view the Power BI preview today (get yours here). The first thing I wanted to do was get some reports up and start displaying them through my Surface and its Power BI app.

The app comes pre-installed with samples, but to link it up with your own site you need to hit Browse from the app menu. From there you can add a location. What caught me out was: where do I navigate to in order to get my Power BI preview reports? Well, it is hidden away in the documentation here that you don't navigate directly to your Power BI site, as I did; instead you give the SharePoint site directly, and it will pick up the Power BI app from there. From there you can navigate to the reports you have uploaded to your preview site and favourite the ones you want to appear on the home screen. The couple I have on there can be seen here:

You can control which sheets get displayed by changing the browser view options of the workbook, as described here.

More to come on Power BI shortly…

Power View Default Field Set

Here's another quick tip when creating data models for use with Power View – Default Field Sets can be created in both Tabular SSAS and PowerPivot, giving the user a shortcut for automatically adding multiple fields to a report. As an example, here's a screenshot of the Default Field Set in PowerPivot – this will tell Power View that the following fields should be automatically selected when the table is added to a Power View report:

I'd seen this in the documentation a while back, but hadn't actually used it, nor could I see a way to select the table. In fact, although there's no checkbox, it's just a simple single click on the actual table name, as I've highlighted below:

When you click on the table name Product above, you will now get the following fields automatically added to a table visualisation:

So not exactly rocket science, but worth doing! Here's how to set up the Default Field Set for PowerPivot and Tabular Analysis Services.

PowerPivot Settings for Power View

I've been using both PowerPivot and Power View quite a bit recently and, in addition to the post I did a while back, have made a few further observations on getting the two to play nicely together.

Building an Example PowerPivot Model

For this post I've created a very simple PowerPivot model, based on the Adventure Works Internet Sales fact table and its related tables. What this means is that after the model is built, I can build a variety of Excel reports that summarise the measures in the FactInternetSales Adventure Works fact table, such as Sales Amount, Tax Amount or Order Quantity. In the example below I've built a very basic report showing Order Quantity by year:

On to Power View

Having already published the workbook to SharePoint, my task now is to create a Power View report, which can be done via the PowerPivot Gallery. If I wanted to try and create a visual version of the above report in Power View (e.g. a graph), my steps would be first of all to drag Calendar Year into the report, then Order Quantity. However, when doing this recently, the result wasn't quite what I expected:

Order Quantity has not been summed at all, which is the opposite behaviour to that of PowerPivot. Consequently I can't create a graph visualisation, because Power View thinks I don't have any measures in my report:

Yet if we contrast this to measures such as Internet Sales Amount or Tax Amount, they all work fine. Upon looking at the field list for Internet Sales, we will see that Order Quantity is missing the sum symbol, as Power View has not interpreted it as a measure:

We can get around this in Power View with a bit of dragging and dropping, but we may as well look at why it's happened in the first place. The reason is the way in which the PowerPivot import wizard interprets SQL data types. The OrderQuantity column in the FactInternetSales SQL table has a data type of smallint, which will not be summarised by default in Power View. Decimal and money types are fine, hence why Tax Amount and Sales Amount above look OK. Rather than changing the data types, we can force Power View to interpret the column as a measure by going to the PowerPivot Advanced tab and clicking on Summarize By. If the Summarize By value is changed from Default to Sum, then Power View will interpret the column as a measure:

If we republish the model and then go back to Power View, we will see quite a difference with the same actions as before. First of all the field list is looking correct:

Secondly, dragging only Calendar Year and Order Quantity will now give the following correct results:

Now that we have a measure, it means that we can change the table into a visualisation of our choice:

Power View is the kind of product that power users are hopefully going to find intuitive, but we can make things even easier for users by making small changes like this.
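As a footnote, another way to sidestep this issue entirely is to define an explicit measure, since Power View treats explicit measures as measures regardless of the underlying column's data type. A minimal sketch, reusing the FactInternetSales table from above (entered via the PowerPivot calculation area):

Total Order Quantity := SUM ( FactInternetSales[OrderQuantity] )    // explicit measure: always summable in Power View

The small cost is an extra field in the model, but it makes the intent explicit for report users.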

Working with Images in Power View

Power View includes several ways to create visualisations with images, ranging from a simple table to the scrollable tile visualisation. In this post I'm going to explain a bit about the tile visualisation, showing how it can be used in Power View, before showing a couple of tips for working with images in both PowerPivot and Analysis Services Tabular models.

Power View Tiles

The Power View Tiles visualisation allows you to display a scrollable series of images which, when clicked on, will display data of your choosing. The report below is an example of this:

What's nice about the view above is that the names of the products are automatically displayed underneath the images. This is a sensible place to display the product names, as it means the area below can focus on whatever product-level data you want, in this case Revenue by Month for the selected product. This works fine in the sample models (in my case the Tailspin Toys model that I think came with CTP3), but it won't work in vanilla PowerPivot/Tabular models unless you configure a few model settings. Trying to reproduce the report above without these settings will give the following tile visualisation, minus the product names:

PowerPivot Advanced Settings

To address this in PowerPivot, you first of all need to switch to Advanced Mode, which you can do by selecting this option from the File menu in PowerPivot:

This gives you the following Advanced tab:

Clicking on the highlighted Table Behaviour button in the ribbon gives you the following Table Behaviour window:

Here you can "change the default behaviour for different visualisation types and default grouping behaviour in client tools for this table". The following properties should be set in order to get the tiles working correctly:

Row Identifier – specifies a column that only contains unique values, which allows the column to be used as an internal grouping key in client tools.

Default Label – specifies which column gives a row-level display name, e.g. the product name in a Product table. This is key, as when a row-level image is used, this property specifies which value to display alongside the image in tiles and other image-based visualisations.

Default Image – specifies which column contains images representing the row-level data, e.g. pictures of products.

The full details for all these properties can be found here.

Tabular Models

Tabular models contain the same properties, which can be edited when you open a Tabular project in the new SQL Server Data Tools. The image below shows how the properties have been configured for the column in my model called Image:

Back to Power View

If we now build a tile visualisation using the image column, we will see that we get the names of the products returned along with the image. It's also worth noting that the Image and Product Name columns now have an icon next to them, indicating that the field exhibits "row level" behaviour:

Finally, for some examples of what's new in Power View in SQL Server 2012 RC0, take a look at this post on the Reporting Services Team Blog.

HP Business Decision Appliance – PowerPivot in a box!

Further to my blog post about SQL Server appliances, Microsoft and HP also offer the very exciting Business Decision Appliance. This contains a preconfigured environment with:

- Microsoft Windows Server 2008 R2 Enterprise Edition
- Microsoft SQL Server 2008 R2 Enterprise Edition with PowerPivot integration for SharePoint
- Microsoft SharePoint 2010 Enterprise Edition
- Prerequisites for SharePoint and PowerPivot
- Appliance Administration Console
- Appliance-specific SharePoint home page
- Support for up to 80 concurrent users

The aim of this appliance is to provide a safe and scalable environment for business users to quickly put a PowerPivot environment in place. This is isolated from existing systems and can be implemented by a business department with very limited IT involvement. Its isolated nature is important, as many organisations don't run SharePoint 2010 throughout their enterprise. PowerPivot authors also need Excel 2010, but once reports are built and deployed they can be shared throughout the business through SharePoint using any web browser. Anyone who has already tried to set up an integrated SharePoint and PowerPivot environment will know that the installation is not simple. This appliance takes that pain away with a one-click installation from first start-up that can have you up and running in under an hour.

The key point is that business users love PowerPivot and its ability to quickly use their data to answer any question. They can respond dynamically, collaborate and share insights throughout the organisation. Importantly, PowerPivot allows the creation of reports that look great and are very fast. All this is done from within the familiar Excel interface, requiring little training to get started. Any team of business analysts would likely get a massive boost in productivity from the installation of the BDA in their department. IT can then monitor which reports are heavily used and decide whether they need making more robust through transition to enterprise software such as Analysis Services. PowerPivot use is at its easiest in an environment where there is a clean data warehouse but the business aren't happy with the speed or responsiveness to change of their current front end. In an environment with more disparate and dirty data, the end users need to be more skilled in SQL and data modelling, but the payback can be even greater. I see a great opportunity in this appliance for any team of data analysts to deliver massive value to their business right now.

More info on the BDA

Allocations in PowerPivot Using DAX

Although I didn't mention it, the inspiration for my last post was the excellent MDX Solutions 2nd Edition. Whilst flicking through the book a while back, I thought it would be interesting to see how DAX and PowerPivot could handle the so-called 'Common Calculations and Selections in MDX'. This post continues that theme, focusing on one calculation in particular from the aforementioned MDX book, namely 'Unweighted Allocations down the Hierarchy'.

Essentially this business problem is about allocating data that is entered at a higher level (e.g. Quarter) down to a lower level (e.g. Day). It varies from business to business what the users actually want to see at the day level in this situation. Some expect to see the Quarter amount, some want to see a blank value, and some users want the amount in the quarters allocated down to the day level. If the expectation is that data should be allocated, then one way of doing the allocation is to use a ratio. In the MDX book, this is achieved by the following MDX:

1.0 /
Descendants(
    Ancestor( [Date].[Calendar].CurrentMember, [Date].[Calendar].[Calendar Quarter] ),
    [Date].[Calendar].CurrentMember.Level,
    SELF
).Count

If we put this into a query and run it against the Calendar hierarchy in AdventureWorksDW we get the following results:

Therefore our quarter-entered value can simply be multiplied by this ratio to get the correct value at each level in the Calendar hierarchy.

PowerPivot

So the challenge is how to do this in PowerPivot. I started off by looking for some data in AdventureWorksDW that would fit the bill. A good candidate seems to be the FactSalesQuota table, which I imported into PowerPivot along with its related dimensions:

Once in the pivot table, I wanted to be able to calculate the correct ratio at the daily and monthly levels. For the daily level this is 1/90, as there are 90 days in Q1 of that year, and for the month level it is 1/3, as there are 3 months in the quarter. Given that there's no MDX-style Ancestor() function in DAX, I ended up having a different calculation at different levels. Something that I learnt from this forum post is that you need to use IF() in order to isolate a calculation at a given level. Therefore the DAX that I came up with was:

=IF(
    COUNTROWS( VALUES( 'DimDate'[FullDateAlternateKey] ) ) = 1,
    1 / CALCULATE( COUNTROWS( 'DimDate' ), ALL( 'DimDate'[FullDateAlternateKey], 'DimDate'[EnglishMonthName] ) ),
    1 / CALCULATE( COUNTROWS( DISTINCT( 'DimDate'[EnglishMonthName] ) ), ALL( 'DimDate'[EnglishMonthName] ) )
)

The COUNTROWS(VALUES('DimDate'[FullDateAlternateKey])) = 1 allows me to check that we're at the day level, and if we are at the day level, we count the number of days in the quarter. Otherwise, if we're at the month level, we calculate the number of months in the current quarter. When this DAX is put into a new measure we get the following results:

This produces the same results as the MDX statement. The next step was then to apply this ratio to the Sales Amount Quota for the current quarter. This was achieved using the following DAX:

=IF(
    COUNTROWS( VALUES( 'DimDate'[FullDateAlternateKey] ) ) < 90,
    CALCULATE( SUM( 'FactSalesQuota'[SalesAmountQuota] ),
        ALL( 'DimDate'[FullDateAlternateKey], 'DimDate'[EnglishMonthName] ) ) * 'FactSalesQuota'[Ratio],
    SUM( 'FactSalesQuota'[SalesAmountQuota] )
)

The < 90 at the start of the statement allows me to check if we're at the day level or the month level.
If we're at either of these levels, then we want to pick up the quarter-level SalesAmountQuota and multiply it by the ratio. Otherwise, at the Quarter and Year levels, we just apply the original SalesAmountQuota value. This produces the following results:

Conclusion

The use of the MDX Scope statement would have been nice, to pinpoint exactly where I wanted the calculation to occur, but then again the use of IF() keeps it quite simple, which is better for an Excel power user. Whilst MDX is very powerful, can you imagine a business user getting to grips with MDX statements such as Scope or Freeze? The bottom line is that the DAX calculation produces exactly what I want – the new AllocatedSalesQuota measure is correct at all levels in the Calendar hierarchy. Even with an IF() statement, like everything else in PowerPivot, the calculation performs very quickly.

DAX Closing Balances

One of the often-required calculations for users in reports is a point-in-time closing balance, which is especially relevant when dealing with stock levels or financial data. In the Analysis Services world, at least in Enterprise Edition, we're lucky that we have a variety of aggregation functions that can deal with semi-additive measures such as a closing balance. Or we can always turn to MDX if we need to, which will achieve the same thing. In PowerPivot, we don't have aggregation functions, but there is DAX, which has a variety of time intelligence functions up its sleeve. The idea behind this post is to see how PowerPivot would deal with the requirement of reporting a closing balance measure.

DAX Time Intelligence Functions

One of the DAX functions that we can use for our closing balance is called ClosingBalanceMonth(), which will simply evaluate an expression at the end of a given month. There are also the similar ClosingBalanceQuarter() and ClosingBalanceYear(). Having noticed these functions within PowerPivot, my first port of call was to set up a simple PowerPivot model by taking data from the FactInternetSales table in AdventureWorksDW, plus all its related dimensions of course. What I ended up with was the following simple pivot table in Excel 2010:

Sales Amount obviously isn't a semi-additive measure, but I'm treating it as one in this example. My first attempt was to use ClosingBalanceMonth() to try and get the value for the month to be equal to the value for the last day in the month. I managed to come up with the following formula:

=CLOSINGBALANCEMONTH( SUM( [SalesAmount] ), 'DimDate'[FullDateAlternateKey], ALL( 'DimDate'[FullDateAlternateKey] ) )

This produces the results that are shown below, in the measure called ClosingBalanceFormula:

It does a good job of calculating a closing balance for each month, but it presents that closing monthly balance on each day. Therefore, I started looking for an alternative approach and found the LastDate() function. It returns the last date that's in context for the passed date column. The last date that's in context at the month level will be the last date in the month, but thankfully the last date that's in context at the day level will be the current day. Therefore I came up with the following formula:

='FactInternetSales'[Sum of SalesAmount]( LASTDATE( 'DimDate'[FullDateAlternateKey] ) )

This produced the results that I wanted – the daily closing balance at the day level and the monthly closing balance at the month level:

A bit more digging led me to find that a combination of the Calculate() and LastDate() functions gets to the same result, as an alternative. That formula is:

=CALCULATE( SUM( 'FactInternetSales'[SalesAmount] ), LASTDATE( 'DimDate'[FullDateAlternateKey] ) )

Conclusion

It's a shame that the ClosingBalanceMonth() formula didn't quite work, but perhaps getting that approach to work will come out in the wash with more DAX experience. However, as with MDX or any kind of expression language, there's always going to be a load of different ways to get to the same result. It's much more efficient to work knowing that you've got several ways around a particular problem, rather than being forced down a particular path, and in fact I think the range of calculation functions available shows the richness of the DAX language.
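For completeness, the quarter and year variants mentioned earlier take exactly the same shape. A minimal sketch of the year-end equivalent, assuming the same FactInternetSales/DimDate model as above – and expect the same behaviour as the month version, i.e. the year-end balance presented at every grain within the year:

=CLOSINGBALANCEYEAR( SUM( 'FactInternetSales'[SalesAmount] ), 'DimDate'[FullDateAlternateKey] )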

PowerPivot and Data Warehousing

A consultant that I know has recently been through the process of reviewing his company's management information requirements, and also the process of evaluating what technology platform can deliver those requirements. Naturally, a data warehouse was one of the options. A data warehouse, in my opinion, will provide the enterprise-level platform for any organisation to deliver their management information. In this case though, due diligence flagged up some alternative approaches to a data warehouse, namely the raft of self-service BI tools that are out there, including PowerPivot. Other tools that can take data in from a variety of data sources are Tableau (which is also a good front end for Analysis Services) and Spotfire, for example. I think that all of these tools have their merits. PowerPivot is incredibly fast and easy to use, whereas Tableau and Spotfire, for example, have some handy data visualisation capabilities.

Some vendors actually claim that self-service tools can replace a data warehouse. Can this really be the case? Whereas these tools do a good job of presenting data in a dimensional model, my personal view is that a data warehouse of course also carries this out, but in addition delivers the following benefits, to name a few:

Data Cleansing – the power of SQL in the back end, and features like fuzzy matching in SSIS, mean that you can address the huge problem of data quality. The bonus here is that all data quality issues can be logged, meaning that you can go back to the owners of source systems and let them know that their data is in a mess.

Control – if you do encounter data quality issues, you may decide that they are actually too bad to load into the data warehouse and present to your users.

Transparency – if you want, you can choose to expose the data quality level of your facts to users, via the Kimball-style audit dimension. E.g. we may have noticed that within our retail data, one store has abnormally low sales. Therefore, we can assign this fact a lower data quality score than the other sales by store for the day, which lets the users know that the data quality steward is aware of the issue and is looking into it.

Slowly Changing Dimensions – a lot of source systems don't behave. E.g. if we're tracking a customer's marital status of 'single' or 'married', you often find that source systems only record the latest change to a given field. Without addressing slowly changing dimension issues in a data warehouse, we may never know that Customer X is now single, but once was married.

Conformed Dimensions – when we have multiple data sources for a single dimension, the goal is to ensure that each dimension available to end users represents a single, consistent view of that dimension. Due to the often complex matching and de-duplication that's required, it's difficult to see how this would be possible without a data warehouse.

Late Arriving Dimension Rows – going back to the marital status attribute, we can get the situation where a source system only updates a particular field a long time after the event actually occurred. E.g. if we think that a customer called John Doe is single, but then today we're told he's married, and has actually been married for the last two years. If marital status is an important piece of management information to the business, then we need to update all data associated with John Doe over the last two years, in order to reflect his correct marital status over that period.
Complex Transformations – I recently encountered a source system where one of the columns in a SQL table held data in a varying-length format of '#Product Type#Currency#Value, #Product Type#Currency#Value'. So what you basically have here is potentially, say, 25 records held in one row of a SQL table, whereas another row might hold just 7 records. Rows could also be badly formed! We ended up using a bit of C# within SSIS to split this data out so that the sub-elements could be analysed correctly.

I could go on, but I think the above will do as a few simple examples. Whilst I'm an advocate of data warehousing, I also think that the self-service tools, such as PowerPivot, can be complementary to a data warehouse. You'll often find, for example, that there's some data that's not in the data warehouse yet – after all, some organisations have a lot of data, and it will take time before all the business analysis etc. can be carried out to get this data into the warehouse. Another example is prototyping: analysts can very quickly build a prototype model, which can then be productionised at a later stage. I think this video from Chris Webb is really worth watching – it covers these examples and a few more.

In summary, there's always going to be a need for users to go digging for data and producing their own reports – the data warehouse can't always deliver everything immediately. PowerPivot is going to be great in this situation, but it will work best when it's used alongside cleaned and conformed data, which is exactly what a data warehouse provides.

PowerPivot CTP3 Released

Following last week's release of the November CTP for SQL Server 2008 R2, the latest CTP for PowerPivot is now also released! The links for the downloads are:

- 32-bit PowerPivot download
- 64-bit PowerPivot download
- PowerPivot main download page

Included in the new CTP3 features are several new DAX functions, especially quite a few around time intelligence. Some look quite comparable to MDX, whereas others possibly give even more functionality than MDX. This is something I hope to blog about soon…
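As a taster of the time intelligence additions, here is a minimal sketch of a year-to-date measure using one of the new functions; the AdventureWorksDW-style table and column names are illustrative assumptions:

=TOTALYTD( SUM( 'FactInternetSales'[SalesAmount] ), 'DimDate'[FullDateAlternateKey] )    // running total that resets at each year boundary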

Creating a Custom Gemini/PowerPivot Data Feed – Method 1 – ADO.NET Data Services

There are already a few good Gemini/PowerPivot blogs that provide an introduction to what it is and does, so there is no need for repetition. What I haven't seen are examples of how existing investments can be harnessed for Gemini/PowerPivot-based self-service analytics. This series of posts focuses on various ways of creating custom data feeds that can be used by Gemini/PowerPivot natively – providing a direct feed from otherwise closed systems opens up new channels of analytics to the end user.

Gemini/PowerPivot supports reading data from Atom-based data feeds, and this post looks at a quick way of creating an Atom-based feed that it can consume. By far the simplest way to develop an Atom-based data feed is to employ ADO.NET Data Services in conjunction with the ADO.NET Entity Framework. With very few (in fact one and a bit!) lines of code, a data source can be exposed as a feed that Gemini/PowerPivot can read natively. I am going to use the AdventureWorksDW sample hosted by a SQL Server 2008 R2 instance for this – obviously Gemini/PowerPivot natively reads SQL Server databases, so creating a custom feed over the top may seem a little pointless. However, this technique may be useful for quick wins in several scenarios, including:

- Preventing the need for users to connect directly to the underlying data source.
- Restricting access to various elements of the data source (tables/columns etc.).
- Applying simple business logic to raw data.

ADO.NET Data Services are a form of Windows Communication Foundation (WCF) service, and can therefore be hosted in various environments. Here, I will simply host the ADO.NET Data Service inside an ASP.NET site. To create a native Gemini/PowerPivot feed, you take seven steps:

1. Create ASP.NET Web Application
2. Create Entity Data Model
3. Create the Schema
4. Create the Data Service
5. Load From Data Feed
6. Create Relationships
7. Test

Step 1) Create ASP.NET Web Application

I'm using Visual Studio 2008 here to create an ASP.NET Web Application.

Step 2) Create Entity Data Model

Add an ADO.NET Entity Data Model item to the project. These files have a .edmx extension and allow us to create a schema that maps to the underlying database objects.

Step 3) Create the Schema

We simply require a 1:1 mapping, so will 'Generate from Database'. Incidentally, the 'Empty Model' option allows you to build a conceptual model of the database, resulting in custom classes that can optionally be mapped to the database objects later. Create a Microsoft SQL Server connection to AdventureWorksDW2008, then select the appropriate database objects. I've selected the following tables:

- DimCurrency
- DimCustomer
- DimDate
- DimProduct
- DimPromotion
- DimSalesTerritory
- FactInternetSales

Once the wizard has completed, a new .edmx and associated .cs file are created that respectively contain an Entity Relationship Diagram and a set of auto-generated classes that represent the database objects. Due to the way the Entity Framework handles foreign key constraints, we have to apply a workaround to ensure the foreign keys on the FactInternetSales table are exposed and brought into Gemini/PowerPivot. A previous post, Exposing Foreign Keys as Properties through ADO.NET Entity Framework, walks through the workaround.

Step 4) Create the Data Service

Add an ADO.NET Data Service item to the project. The service class inherits from a generic version of the System.Data.Services.DataService object, so we need to inform the compiler what class to base the generic object on.
We essentially want to base our Data Service on the class representing our newly created Entity Data Model. The class name is derived from the database name, unless changed when the Entity Data Model was created, so in our case the class name is AdventureWorksDW2008Entities. The auto-generated service class contains a 'TODO' comment that asks you to 'put your data source class name here'. The comment needs replacing with AdventureWorksDW2008Entities.

The final step is to expose the resources in the Entity Data Model. For security reasons, a data service does not expose any resources by default; resources need to be explicitly enabled. To allow read-only access to the resources in the Entity Data Model, the InitializeService method needs updating with a single line of code. The code snippet below details the final class implementation – notice the AdventureWorksDW2008Entities reference at line 1 and the explicit resource enablement at line 6.

Code Snippet

public class GeminiDataService : DataService<AdventureWorksDW2008Entities>
{
    // This method is called only once to initialize service-wide policies.
    public static void InitializeService(IDataServiceConfiguration config)
    {
        config.SetEntitySetAccessRule("*", EntitySetRights.AllRead);
    }
}

That's all that's needed. By default, ADO.NET Data Services conform to the Atom standard, so in theory the service is ready to be consumed by Gemini/PowerPivot. Before we try, it's worth giving the service a quick test. Building and running the solution (F5) launches Internet Explorer, navigating to the service hosted by the ASP.NET Development Server. You are first presented with an XML document containing elements that represent database objects; you can drill further into the objects by amending the URL. For example, if you want to see the contents of the DimPromotion table then append DimPromotion to the end of the URL: http://localhost:56867/GeminiDataService.svc/DimPromotion (case sensitive).

Note: You may need to turn off Feed Reader View in Internet Explorer to see the raw XML (Tools -> Internet Options -> Content -> Settings -> Turn On Feed Reader View – make sure this is unchecked).

As a slight aside, the URL can be further enhanced to filter, return the top n rows, extract certain properties, etc. Here are a couple of examples:

- http://localhost:56867/GeminiDataService.svc/DimCustomer?$top=5 – returns the top 5 customers
- http://localhost:56867/GeminiDataService.svc/DimCustomer(11002) – returns the customer with id 11002
- http://localhost:56867/GeminiDataService.svc/DimCustomer(11002)/FirstName – returns the first name of customer 11002
- http://localhost:56867/GeminiDataService.svc/DimProduct(310)?$expand=FactInternetSales – returns product with id 310 and all related Internet Sales records

Confident that the feed is working, we can now deploy the service and start using the feed in Gemini/PowerPivot.

Step 5) Load From Data Feed

Open up Excel 2010 and launch the Gemini/PowerPivot client (by selecting 'Load & Prepare Data'). Select 'From Data Feed' from the 'Get External Data' section of the Gemini/PowerPivot Home ribbon to launch the Table Import Wizard. Specify the URL from the ADO.NET Data Services feed created earlier, in my case http://localhost:56867/GeminiDataService.svc, as the 'Data Feed Url' and click Next.
Incidentally, you can use the majority of the enhanced URLs here too – for example, to select only the DimProduct table should you so wish – however, by specifying the root URL for the service you have access to all objects exposed by the service. From the Table Import Wizard, select the required tables; in my case I'll select them all. (You can optionally rename and filter the feed objects here too.) Following the summary screen, the Gemini/PowerPivot client then gets to work importing the data from the ADO.NET Data Service:

Once completed, Gemini/PowerPivot displays all the data from all of the feed objects as if it came directly from the underlying database.

Step 6) Create Relationships

There is one final step before we can test our model using an Excel pivot table: we need to create the relationships between the tables we have imported. The Gemini/PowerPivot client provides a simple, if a little onerous, way of creating relationships; the 'Create Relationship' action on the Relationships section of the Home ribbon launches the Create Relationship wizard:

Each table needs relating back to the primary fact table, which results in the following relationships:

Step 7) Test

We are now ready to start our analysis. Selecting PivotTable from the View section of the Gemini/PowerPivot client Home ribbon creates a pivot table in the underlying Excel workbook, attached to your custom-fed Gemini/PowerPivot data model.

So, to provide fast access to, for example, potentially sensitive data through Gemini/PowerPivot, you can quickly build a custom data feed that can be consumed natively by the Gemini/PowerPivot client's data feed functionality.