Calvin Ferns' Blog

Power BI and SSAS Tabular conversion

One of the most compelling aspects of the Microsoft Power Pivot tool is the ability to take a workbook that has been developed as a quick prototype in Excel and promote it to a corporate SSAS Tabular model. 

This way a business team is able to ‘self serve’ their requirements with Power Pivot and deploy to SharePoint or their Power BI site.  The monitoring tools provided as part of these sites can identify important and heavily used workbooks and surface them to IT as candidates for bringing into the corporate managed server tool set through SSAS. 

You access this functionality in SSAS by starting a new project in SSDT and selecting ‘Import from PowerPivot’.

[Screenshot: the Import from PowerPivot project type in SSDT]

Power Query

To enhance users’ ability to self serve their BI requirements, Power BI also provides a tool called Power Query.  This tool acts as a data cleansing and ETL tool, allowing business users to gather data from any available source, process it and then integrate it as part of a Power Pivot model.  It is very flexible and easy to use, and is able to get data from nearly any website, Twitter, Facebook and, in the case shown below, Azure Blob Storage.

[Screenshot: Power Query importing data from Azure Blob Storage]

The Load to Data Model checkbox allows us to bring this data directly into Power Pivot and build rapid prototype analysis over it.  This opens up a large number of business use cases integrating on-premises data with cloud data and openly available web data from outside the organisation.  Really interesting analysis, such as comparing spend on sales promotions to social media activity, can then be performed by an Excel business analyst from the desktop with just an Office licence.
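
As an illustration of the kind of cross-source measure this enables – every table and column name here is hypothetical, standing in for whatever Power Query has loaded into the model – promotion spend could be compared to social media activity with a single DAX measure:

Spend per Mention := DIVIDE (
    SUM ( Promotions[PromotionSpend] ),      -- on-premises promotions data
    SUM ( SocialMentions[MentionCount] )     -- web/social data gathered via Power Query
)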

Current Limitations

Unfortunately this Power BI story is currently missing the final pieces to close the loop.  I'm sure these will come, but you need to be careful with how you use the tools at the moment.  The current situation is that if you use Power Query's load to data model option, any upgrade of the model from Power Pivot to Tabular will be problematic.  Depending on the type of Power Query source you are usually able to import the metadata of the Power Pivot model, but you will not be able to refresh any data through the Power Query connection manager defined.

Connection as seen in Tabular:

[Screenshot: the Power Query connection as it appears in the Tabular model]

In my local environment trying to review or change the source of an SSAS table that was converted from a Power Query gives the error:

The 'Microsoft.Mashup.OleDb.1' provider is not registered on the local machine.

[Screenshot: the provider not registered error]

This means that although I can bring in the metadata, I'm unable to swap out the source query in the UI and so will need to delete and re-add the data manually – thus losing the benefit of the import from Power Pivot.

Possible workarounds

Workaround 1

It is possible to load Power Query results to a worksheet rather than directly into Power Pivot (the Data Model).  Using this, you can bring some data in through Power Query and still have an updatable model by performing the following steps:

  1. Build a Power Query and save results to worksheet
  2. Save worksheet out as .csv
  3. Build a new Workbook containing Power Pivot Data model
  4. Import the .csv as a table from a text source
  5. The .csv file source will continue to be refreshable in SSAS Tabular and it will be possible to edit the table source.

Workaround 2

Wait.  I know Microsoft are looking to improve the integration of Power Query, so we must hope that small issues like this will go away with future releases.

I would not really expect SSAS to be the right tool to embed Power Queries into in the corporate BI world.  What is really missing is the functionality to easily bring Power Queries into SSIS so they can be integrated into warehousing architectures.

In the meantime, be very careful about mixing Power Query into the same architecture as SSAS Tabular models, as they are not currently easy bedfellows.

KPIs in Power View and DAX Queries for KPIs

I was recently approached by a client whom we had assisted in implementing a Microsoft Analysis Services Tabular solution and a supporting management pack of SSRS reports.  They were asking about utilising some of the KPIs defined in their Tabular cube in the SSRS reports using DAX queries.  I was at a bit of a loss, and most web help seemed to suggest it couldn't be done:

[Screenshot: blog comment from 2012]

However, with a bit of time to investigate, it appears Power View does now support Tabular KPIs, and as such a DAX query should be able to pull them out.

[Screenshot: the Tabular KPI in Power View]

My KPI base measure is named “Highest Ranking” and I am able to view its value, status or goal in Power View.  A little work with every SQL developer's best friend, Profiler, and I had what I was looking for.  It appears that when you create a KPI, SSAS defines some measures in the background which do some of the work of the MDX KPI functions.

Therefore the following DAX query against my cycling demo cube returns the expected results and could be used in SSRS reporting.

EVALUATE
ADDCOLUMNS(
    VALUES('DimRider'[Name]),
    "Highest_Ranking", 'FactRanking'[Highest Ranking],
    "Highest_Ranking_Goal", 'FactRanking'[_Highest Ranking Goal],
    "Highest_Ranking_Status", 'FactRanking'[_Highest Ranking Status]
)
ORDER BY 'DimRider'[Name]
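
Taking it a step further, the hidden status measure can drive report logic directly.  A minimal sketch, assuming the status measure returns the usual -1/0/1 scale, that returns only the riders currently below target:

EVALUATE
FILTER(
    ADDCOLUMNS(
        VALUES('DimRider'[Name]),
        "Highest_Ranking_Status", 'FactRanking'[_Highest Ranking Status]
    ),
    [Highest_Ranking_Status] < 0
)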


I also tested for the existence of other KPI functions such as Trend and Weight, but these do not appear to be present.  It is also interesting that the use of a KPI over a measure does not change the measure name but just groups it in the field list, and as such there is no need for a value function.

[Screenshot: the KPI grouped in the field list]

For more info on KPIs in Power View there is additional documentation here.  I am currently unsure of the required versions for this functionality, so if it is missing from your environment I would love to hear from you.

Power BI Natural Language (Power Q&A)

Power BI preview has had a few new features added over the past few days.  The most exciting of these is the Q&A functionality.

I have only got to play with the sample workbooks so far (as I don't think you can yet add data), however it is as impressive as the Amir Netz demo promised!

If you have a Power BI site already you will need to add the samples to get the new content; this then appears as a popular questions area above your reports library.

[Screenshot: the popular questions area above the reports library]

Clicking through gives a selection of popular questions with some nice images (how are we going to be able to control these in ‘real life’ scenarios?)

[Screenshot: a selection of popular questions]

Clicking through to ‘Sports with the most medals’ gives us a nice visualisation of the data we have asked for, as well as a number of suggestions of similar questions to ask.

[Screenshot: the ‘Sports with the most medals’ visualisation]

As some of you may know I'm a keen cyclist, so what happens if I free-type ‘count of cycling gold medals by country name’?  Power Q&A shows me a map, as shown below.

[Screenshot: a bubble map of cycling gold medals by host country]

This is really interesting in that it has correctly realised I would like a bubble map; however, you can see from the detailed description that what it is actually displaying is the Olympic host country where the medals were won.  This was not the data I was looking for, so I will change my query to Nationality.  This is a reminder of the complexities of natural language search, but I am impressed with the way Power BI displays what it thinks you mean, allowing for quick corrections.

[Screenshot: medal counts by nationality]

This is much more like it: the most successful Olympic cycling nations of all time.  We can take this further and add more filters or visualisations:

[Screenshots: additional filters and visualisations]

So to wrap up, initially I am very impressed.  I can't wait to point it at some real business data and see what I can do with it in that environment.  I'm really excited to see how you manage the metadata and how much control you have over synonyms.  Get hold of the preview if you can and prepare to be impressed!

Feel free to get in touch @calvinferns

Update:

You can also ‘show field list’ and go back to working the traditional way, or double check what it has chosen for you!

[Screenshot: the ‘show field list’ view]

Power BI Preview – Mobile Integration

I have had the invite through to view the Power BI preview today (Get yours here).  The first thing I wanted to do was get some reports up and start displaying them through my Surface and its Power BI App.

The app comes pre-installed with samples, but to link it up with your own site you need to hit browse from the app menu.  From there you can add a location.  What caught me out was: where do I navigate to get my Power BI preview reports?

Well, it is hidden away in the documentation here: don't try to navigate directly to your Power BI site as I did; instead, give the address of the SharePoint site itself and it will pick up the Power BI app from there. 

[Screenshot: the Power BI mobile app URL address box]

From there you can navigate to the reports you have uploaded to your preview site and favourite the ones you want to appear on the home screen.  The couple I have on there can be seen here:

[Screenshot: favourited reports on the app home screen]

You can control which sheets get displayed by changing the browser view options of the workbook, as described here.

More to come on Power BI shortly…

My highlights from the Microsoft “Performance Tuning of Tabular Models” Paper

Microsoft have now released the highly anticipated and detailed paper on performance tuning tabular models.  It is available here.

At over 180 pages and a high level of technicality it takes a bit of getting through; this blog aims to summarise a few of the points that are new to me and useful.  I won't cover the DAX query plan section in detail here, but it's well worth a read.  There is lots I won't mention, as I had already come across it elsewhere. 

As such I will mostly be pointing at important sections in the document; all credit goes to the authors: John Sirmon, Greg Galloway, Cindy Gross and Karan Gulati.

Query speed
  • Avoiding error handling functions is made easier by the fourth parameter on SEARCH and FIND and by the DIVIDE function (p47; see the first sketch after this list)
  • Prefer ISEMPTY to ISBLANK (p50)
  • Only evaluate many-to-many measures when you need to (p55; the second sketch after this list shows the guarded pattern):

[Screenshot: many-to-many measure guidance from the paper]

  • Train users of Power View and Excel to add measures first to avoid cross products of dimensions (p57)
  • Beware of Excel compatibility mode and pre-2010 Excel; these can send less efficient queries which return unnecessary subtotal rows (p60)
  • Excel slicers connected to a lot of pivots are a known performance issue (I thought it was just my cube!) (p64)
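
On the first point above, here is a quick sketch of the p47 guidance; the measure and column names are illustrative, not taken from the paper:

= IFERROR ( [Sales Amount] / [Order Count], BLANK () )     -- old pattern: error handling function
= DIVIDE ( [Sales Amount], [Order Count] )                 -- returns BLANK() on divide by zero
= SEARCH ( "Mountain", DimProduct[ProductName], 1, -1 )    -- fourth parameter: result when not found

On the many-to-many point, the usual guarded pattern wraps the expensive calculation in a cheap emptiness test so it is only evaluated when the bridge table is actually filtered.  A minimal sketch with hypothetical table names (BridgeCustomerAccount, DimAccount):

Sales Via Account := IF (
    NOT ( ISEMPTY ( BridgeCustomerAccount ) ),             -- cheap test: any bridge rows in context?
    CALCULATE (
        [Sales Amount],
        SUMMARIZE ( BridgeCustomerAccount, DimAccount[AccountKey] )
    )
)
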
Processing
  • I knew about the Process Data followed by Process Recalc pattern; however, the paper also proposes that if you are processing overnight and users are not querying during your process, then doing a Process Clear followed by a Process Full reduces your memory pressure, as you don't need both copies in memory at once.  Great for those smaller environments. (p75)
  • You can control how many connections can be opened against your data source for parallel processing of tables; the default is 10 (p82)
  • Data types and encoding – lots to consider here; generally favour currency and avoid strings where you could use int (p87)
  • Compression settings can be adjusted to fine tune query performance vs processing time (p89)
  • To process tables in parallel you need to manually adjust the XMLA to include Parallel tags (<Parallel></Parallel>) around all processing commands (p95)
  • You can query to check when a process defrag is necessary rather than it just being an overhead task (p116)

Server Settings
  • Get all the latest cumulative updates; lots of fixes and performance improvements are not in SP1 (p118)
  • On large servers with a lot of RAM raise the Low Memory limit as the default allocation is probably too aggressive (p127)

[Screenshot: memory limit settings]

That's all I will list out for now, but there is a lot more in there.

Thanks

@CalvinFerns

SSAS Tabular, PowerPivot and conformed snowflakes

Many of you will be aware of the Kimball concept of conformed dimensions.  This is the principle of joining multiple fact tables, often from multiple source systems, to the same dimension table.  Using this shared dimension, the many fact tables can be analysed through a single view of the dimensional data, delivering powerful cross-dataset analysis. 

In many cases the granularity of the differing source systems is different.  A typical example of this is shown in AdventureWorks, where sales data is gathered against an individual product but survey data is gathered against Product Category and SubCategory.  In order to facilitate easy key generation and lookups, the Product Category and SubCategory are separated from the main product table; this is described as a snowflake design pattern, as shown below. 

[Diagram: the Product snowflake – DimProductCategory, DimProductSubcategory and DimProduct]

The first SSAS Tabular consideration here is that if you need to join in at a less granular level than the primary dimension table, it is essential to present a snowflake to SSAS Tabular; it cannot join in at a higher grain like SSAS Multidimensional can.  This is due to its need for a distinct list on the dimension side of the join key. 
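
As a quick sanity check that the dimension side really does hold a distinct key list, a one-row DAX query along these lines can be run against the workspace database (a sketch using the AdventureWorks names); if the two values differ, the table cannot sit on the one side of the relationship:

EVALUATE
ROW(
    "Rows", COUNTROWS ( DimProductSubcategory ),
    "DistinctKeys", DISTINCTCOUNT ( DimProductSubcategory[ProductSubcategoryKey] )
)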

The relationships in SSAS Tabular or Power Pivot can then be set up as shown below to join other data in at a higher grain, such as the FactSurveyResponse below.

[Screenshot: relationships joining FactSurveyResponse at the SubCategory grain]

The next step is to browse this data and check that we can indeed view all our data by Product SubCategory; using the ‘Analyze in Excel’ button we can browse the development cube and set up the pivot table below.  Allowing the slicing of disparate data by common dimensions in this fashion is one of the biggest business benefits of data warehousing. 

[Screenshot: pivot table slicing both fact tables by Product SubCategory]

The solution thus far works but, in Excel and other ad hoc tools, the three product tables and large number of attributes are confusing to the end user.  As such we should look at hiding unused and unwanted columns, and at using the Tabular cube to denormalise or ‘flatten’ the options presented to the business user.  Unwanted columns are removed using the hide from client tools option, or by removing them from the cube altogether (ideally using SQL views).  To reduce the number of tables presented to the user we will hide DimProductCategory and DimProductSubCategory entirely, while adding derived columns holding this information to DimProduct.  To achieve this we use the RELATED function in DAX to add a few new columns.  To add a ProductSubCategory column to DimProduct we use the following expression:

=RELATED(DimProductSubcategory[EnglishProductSubcategoryName])
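
A matching category column can be added in the same way; RELATED follows the chain of many-to-one relationships from DimProduct through DimProductSubcategory up to DimProductCategory:

=RELATED(DimProductCategory[EnglishProductCategoryName])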

Once the above column is available, refreshing the pivot table used earlier with the new column gives the following result:

[Screenshot: pivot table showing the incorrect result]

This incorrect result occurs because relationships are only propagated automatically towards the fact table in Tabular and Power Pivot.  We need to explicitly specify that the measure we are using relates to the DimProduct table through ProductSubCategory, and that the context of this table needs to be used.  To do this we need to alter our measure calculation to take into account the related table DimProduct; unsurprisingly this is achieved with the RELATEDTABLE function.  We use the CALCULATE function in combination to aggregate our data as appropriate.  The measure DAX formula can then be defined as follows:

=CALCULATE(
    DISTINCTCOUNT([CustomerKey]),
    RELATEDTABLE(DimProduct),
    VALUES(DimDate),
    VALUES(DimCustomer)
)

A VALUES statement is needed to allow the current filter context to be used for all the other dimensions on the fact; this doesn't seem very elegant and I am open to a better way of doing it.  The measure as defined above can then be used in Excel to give the same result as we first achieved, while presenting a much cleaner user interface with a single product table in which users can find all their information.

Any questions or improvements, get in touch here or @CalvinFerns.

Thanks for reading


SQL PASS Summit 2012

Having just returned from SQL PASS I thought I would blog a few highlights about the conference and some of the things I came across and am looking forward to.  I went over to Seattle with two of my colleagues and we started with a little sightseeing to get our body clocks back in sync, which I would thoroughly recommend.  The jetlag from the UK gets to you for the first couple of days, and you get to see things like this:


The Adatis SQLPass team: Myself, Neil Dobner, Tim Kent

This was my first time at PASS and I was not disappointed.  The first couple of evenings in Seattle were spent catching up with some people I had met at SQL Bits.  The SQL community being what it is, I was quickly introduced to more people and made new connections.  The #SQLfamily on Twitter are a welcoming bunch and really add a lot to the atmosphere of a friendly conference.  I would strongly recommend using Twitter both for connecting with people and for asking questions with the #SQLHelp hashtag.  I had come across a number of the speakers at SQL Bits, they were all very good quality, and I heard very little negative comment on the sessions across the three days. 

There were a good number of highlights for me that I would like to pass on in case they are as relevant to you as they were to me.  I've split them down into a few areas, firstly Power View:

  1. Power View coming to SSAS Multidimensional
  2. Power View in Excel
  3. Power View on Multidimensional will come to SharePoint well before it gets into the Office Excel release cycle
  4. Image URLs and geospatial reports are also particularly easy

Some brilliant stuff in the Tabular internals session given by Akshai Mirchandani and Allan Folting, DAX queries by Alberto Ferrari and 2012 SSAS by Chris Webb:

  1. Insights into value encoding and hash encoding compression, which make sense of guidance such as:
  2. Don't bring in any row level identifiers or document numbers etc. unless you really have to
  3. Use a process defrag as maintenance when doing updates and deletes
  4. Lower end hardware can be quicker, ie go for less but faster procs
  5. Profile your DAX queries to look out for formula engine calls using: CallBackDataId
  6. Partitions are not eliminated by the query optimiser; they only help you manage processing.  This is particularly impressive because the engine is so fast it didn't need this!
  7. Direct query really is unusable given the limitations and the awful SQL it generates
  8. Two thirds of the SSAS multidimensional codebase is there to deal with M2M correctly

Finally some database engine stuff:

  1. Windowing functions in 2012 really will be a lifesaver for complex queries, especially LAG and LEAD for those moments where your user thinks you work in Excel
  2. Updatable columnstore indexes!
  3. Licensing can be a nightmare of cores, processors and virtualisation, but the key way to keep it sensible on the new model is to keep your core density at 4 cores per processor.

I met loads of fantastic, interesting people and have picked up some great contacts.  The conference has also had the added side effect of getting me excited about SQL Server again.  It is easy to get caught up in the day job and forget what a brilliant product we work with and what hard work goes on at Microsoft to keep it that way.

Thanks everyone

See you next year?  Or maybe at SQL Bits, a bit closer to home.

@CalvinFerns

HP Business Decision Appliance – PowerPivot in a box!

Further to my blog post about SQL Server appliances, Microsoft and HP also offer the very exciting Business Decision Appliance.  This contains a preconfigured environment with:

  • Microsoft Windows Server 2008 R2 Enterprise Edition
  • Microsoft SQL Server 2008 R2 Enterprise Edition with PowerPivot integration for SharePoint
  • Microsoft SharePoint 2010 Enterprise Edition
  • Prerequisites for SharePoint and PowerPivot
  • Appliance Administration Console
  • Appliance-specific SharePoint Home Page
  • Up to 80 Concurrent Users

The aim of this appliance is to provide a safe and scalable environment for business users to quickly put a PowerPivot environment in place.  This is isolated from existing systems and can be implemented by a business department with very limited IT involvement.  Its isolated nature is important, as many organisations don't run SharePoint 2010 throughout their enterprise.  PowerPivot authors also need Excel 2010, but once reports are built and deployed they can be shared throughout the business through SharePoint using any web browser.  Anyone who has already tried to set up an integrated SharePoint and PowerPivot environment will know that the installation is not simple.  This appliance takes that pain away with a one-click installation from first start-up that can have you up and running in under an hour.

The key point is that business users love PowerPivot and its ability to quickly use their data to answer any question.  They can respond dynamically, collaborate and share insights throughout the organisation.  Importantly, PowerPivot allows the creation of reports that look great and are very fast.  All this is done from within the familiar Excel interface, requiring little training to get started.  Any team of business analysts would likely get a massive boost in productivity from the installation of the BDA in their department.  IT can then monitor which reports are heavily used and decide whether they need making more robust through transition to enterprise software such as Analysis Services.  PowerPivot use is at its easiest in an environment where there is a clean data warehouse but the business aren't happy with the speed or responsiveness to change of their current front end.  In an environment with more disparate and dirty data the end users need stronger SQL and data modelling skills, but the payback can be even greater.

I see a massive opportunity in this appliance for any team of data analysts to deliver real value to their business right now. 

More Info on the BDA 

SQL Server can scale

If you work in large enterprises you often come across the attitude that Microsoft SQL Server can't scale.  People therefore turn to vendors such as Oracle and Teradata for solutions at very high cost.  This attitude often comes from in-house experience of a growing SQL database which hits somewhere between 5 and 10TB and starts slowing down and becoming difficult to work with.  These databases are often run on multi-instance servers with shared SAN access and little thought given to data fragmentation.  It's no wonder that their environment has problems!

Microsoft Strategy

I was recently invited to a Microsoft partner event which aims to challenge these attitudes and give partners better information to discuss large scale SQL implementations.  Microsoft's strategy is to use a partnership with HP to give SQL Server a balanced hardware configuration to run on.  This solves the major hardware bottleneck that causes the performance problems people perceive to be with the software.  Together they have created a range of solutions to allow SQL Server to scale effectively to meet any need.  The first of these solutions was released a couple of years ago and there are more in the pipeline.  There are real world case studies available from Microsoft detailing the success of this strategy for many early adopters.  These solutions are targeted at the workloads they need to run and are detailed below.

Warehousing

< 5TB – HP Business Data Warehouse Appliance

20TB – 80TB – Fast Track Data Warehouse

126TB – 500TB – HP Enterprise Data Warehouse Appliance

Applications / Consolidation / Private Cloud

100 VMs – 10,000 VMs

OLTP

On the way shortly – will scale to the world's largest OLTP implementations

These solutions are all positioned to be significantly cheaper than almost any other vendor while providing equal or better performance.  The appliances also provide excellent improvements in time to market, drastically reducing required configuration time.  I would also recommend the 5TB box to anyone building a data warehouse of any size.  This gives you the confidence that you are buying a competitively priced server which is optimised for a SQL workload, and that you are not going to hit any hardware bottlenecks which you might come across on a shared or custom built server.  The evolution of the Microsoft appliance and Fast Track structure has also produced predictable benchmarks.  You can choose the performance you need by picking the right server, and if you follow the best practices you can be sure that you will achieve the throughput and response time needed.

Oracle's Exadata platform claims to be able to manage any workload at any scale.  What they don't specify in the sales briefs is the amount of configuration required to match the platform to the workload in your environment.  This means you need to invest a lot of your Oracle DBAs' time to tune and configure the hardware correctly.  It is therefore not comparable to SQL Server installed on a spare bit of hardware with very little time invested in it. 

With a balanced hardware configuration and best practice ETL, Microsoft SQL Server can meet any scale requirement asked of it.  So next time you come across the “SQL can't scale” attitude, don't be afraid to let people know that they are out of date and that SQL Server will meet their requirement at a far lower cost than the competitors.

(There is another appliance out for self-service BI but it deserves a post all of its own – BDA)

Slides from Avon Information Management talk

Distributing the slides from my talk last night; it was well received and I think everyone was able to take something away.

Credit goes to Marco Russo and Alberto Ferrari for their data modelling ideas; visit www.sqlbi.com 

http://dl.dropbox.com/u/25925341/Learning%27s%20from%20large%20scale%20dw%20project.pptx

Hopefully not my last talk, as I enjoyed doing it.  Look out for further Bath-based events at:

http://avonim.wordpress.com/

All comments welcome

Calvin