Adatis BI Blogs

PowerBI Optimisation P3 – Extracting and Source Controlling PowerBI Data Models

Source Control – once seen as “something proper developers do” – has been an integral part of the way business intelligence developers work for a long time now. The very idea of building a report, data model or database without applying some kind of source control actually pains me slightly. However, there has been a push for “Self-Serve” reporting tools to strip out anything that looks remotely like a technical barrier for business users – and this includes the ability to properly track changes to code. We find ourselves in a very familiar situation – versions of PowerBI desktop files are controlled by including version numbers in file names. I’ve seen several copies of “Finance Dashboard v1.2.pbix”. This is obviously dangerous – who’s to say that someone didn’t open up the file, edit it and forget to increment the name? Once a file has been shared, there’s no controlling what changes happen to it from that point on.

If this happened to an SSIS package, for example, we would still be able to perform a code comparison. This would highlight the differences between the two packages so we could see exactly what caused the change. This is not currently possible with PBIX files in their entirety. We can, however, compare the data model behind the file. This allows us to check for changes in business logic, amendments to DAX calculations, additions of new fields and so on. If the performance of two PBIX files differs drastically even though they were meant to be the same “version”, then this is a reasonable starting point!

Extracting the Data Model from a PowerBI PBIX File

Firstly, we need to extract the JSON that describes the embedded Tabular model (technically this is TMSL, the Tabular Model Scripting Language, but it’s still JSON…). We can do this by connecting to the model via SSMS. I’ve talked about the steps required to do this here. So, assuming you have found your temporary SSAS port and connected via SSMS, you should see something like this:

As we would with any other Tabular model, you can right-click and script out the database as so:

If we do this to a new query window, you’ll see the various JSON objects that describe your PowerBI model:

This script contains the details for all tables, attributes, DAX measures etc. required for your data model.

Comparing PowerBI Data Models

What if someone has been using a specific version of my PowerBI desktop file, but they’ve modified it and it has stopped working? For a Tabular model, I’d compare the model definition to source control, which would automatically highlight any changes. Now that we can script out our PowerBI model, we can apply the same principles. Say, for example, I make a couple of changes to my sample PowerBI report and want to figure out how it has changed compared to a baseline script I exported previously. The easiest option is to use a tool like TextPad – here you can compare two text documents and it will highlight any differences it finds between the two. For example, I changed the name of a table and removed a column, and the text comparison highlights this change as below:

I can now be confident that if someone sends me a PBIX file, I can check to see if there are any data model changes without having to manually eyeball the two side by side. This alone is a huge leap forward in the manageability of models. The next step would be to add this file to an actual source control provider, such as Visual Studio Team Services. This tool is free for the first 5 users and can be used with Visual Studio 2015 Community Edition – which is also free!
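As an aside, if you only need a subset of the model definition – the measure definitions, say – you don’t have to script out the whole database; the same SSMS connection lets you query the TMSCHEMA DMVs directly. A minimal sketch, assuming a compatibility level 1200 (or later) model, which is what PowerBI Desktop creates – treat the exact column list as an assumption and adjust to whatever your instance exposes:

-- Run in an MDX/DMX query window against the temporary PowerBI SSAS instance
-- Tables defined in the model
SELECT [Name]
FROM $SYSTEM.TMSCHEMA_TABLES;

-- Measure definitions, including the DAX expression behind each one
SELECT [Name], [Expression]
FROM $SYSTEM.TMSCHEMA_MEASURES;

-- Columns and their data types
SELECT [TableID], [ExplicitName], [ExplicitDataType]
FROM $SYSTEM.TMSCHEMA_COLUMNS;

Saving these result sets out as text gives you a smaller, more focused artefact to diff between versions than the full database script.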
Essentially you would add this exported script to your source control directory each time you updated the model. By checking in your new model, you can compare previous versions, much like with the TextPad editor above.

Final Thoughts

In the end, this isn’t real, true source control. If you make a mistake, you can only view what the previous configuration was; you cannot roll back code directly into your PowerBI model. It is, however, a step towards managing PowerBI with a bit more discipline and rigour. I don’t see this as a huge drawback, as rumours on the wind are hinting at larger steps in this direction coming with future releases. Let’s hope we’re not having to work around these problems for much longer!

PowerBI Optimisation P2–What’s using all my memory?

If you're a regular user of PowerBI, you're probably aware of the size limitations around datasets, and it's very likely you've hit them more than once whilst writing reports on top of large datasets. It's difficult to see where size savings can be made directly through PowerBI, but we can use traditional Tabular optimisation techniques to help us! For those not in the know, a single dataset can be up to 1GB in size, with Excel files limited to 250MB. Each user also has a storage limit: free users have a maximum 1GB data capacity, Power BI Pro users have a 10GB maximum capacity, and Pro users can create groups with a maximum 10GB data capacity each. For more information about the limits themselves and how to view your current usage, there's a PowerBI blog post about it here: https://powerbi.microsoft.com/en-us/documentation/powerbi-admin-manage-your-data-storage-in-power-bi/

But what if you're hitting that 1GB data limit? There's very little within PowerBI itself to help you understand which tables are the largest, where you could make some savings, or generally anything about your model itself. The answer is to connect to the model via SSMS and take advantage of the Tabular system views, as described here.

What determines Tabular model size?

It’s worth discussing this briefly before going into the details. Put very simply, the xVelocity engine used by the Tabular model will hold more data if there are more unique values for a column. The key to avoiding large models is, therefore, to avoid columns with huge numbers of distinct values. Text fields will generally be pretty bad for this, although there are common design patterns to avoid the worst offenders. A simple example is a DateTime column – the combination of date and time means that each minute of each day is a unique value. Even if we ignore seconds, we’re adding 1,440 new distinct values for every day within the system. If we split this into two fields, a date and a time, this problem goes away. Each new date adds just a single value, whilst we will never see any new hour and minute combinations, so that’s a controllable field. There are a few techniques to avoid these problems if you find them – I’d advise heading over to Russo & Ferrari for some general tips here and some more detailed techniques here.

Accessing Memory Usage Data

So, following the above instructions, connect to your data model and open a new DMX query. Here you can use SQL syntax to query several DMVs behind the model – not all of them will be relevant in the cut-down Tabular instance that PowerBI uses, but there is one in particular that will help us manage our model size – DISCOVER_OBJECT_MEMORY_USAGE. Admittedly, on its own this is pretty incomprehensible. We can filter down the results slightly into something that makes a little more sense, but you’ll generally get a big list of model entities with numbers against them – OK as a starter but not great as an actual model optimisation tool. Stopping here, we would at least have a hit-list of the worst-offending columns, and we could use this to start tackling our model. But there are much better ways to approach this problem!

Tabular Memory Reports

There are several free tools made available within the SSAS community for people to analyse their current SSAS memory usage. These tools simply query this same data but apply a bit of data modelling to make it much more accessible.
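For reference, the kind of raw DMV queries that both the hit-list approach and these community tools are built on look something like the sketch below. The DMV names are standard for SSAS Tabular; the exact columns exposed by PowerBI’s cut-down instance may differ slightly, so treat the column lists as an assumption:

-- Overall memory usage by object, biggest consumers first
SELECT OBJECT_PARENT_PATH, OBJECT_ID, OBJECT_MEMORY_NONSHRINKABLE
FROM $SYSTEM.DISCOVER_OBJECT_MEMORY_USAGE
ORDER BY OBJECT_MEMORY_NONSHRINKABLE DESC;

-- Dictionary size per column, which usually points at the high-cardinality offenders
SELECT DIMENSION_NAME, ATTRIBUTE_NAME, DICTIONARY_SIZE
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
WHERE COLUMN_TYPE = 'BASIC_DATA'
ORDER BY DICTIONARY_SIZE DESC;

Running these directly is a quick way to sanity-check whatever the tools below report back.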
For straight Tabular, I would tend to use Kasper de Jonge’s old Excel spreadsheet, which pulls in data quite reliably; there is also an updated PowerBI model found here. However, this doesn’t play nicely with the PowerBI flavour of Tabular just yet, so I would advise using the SQLBI.com Vertipaq Analyser. Following their instructions and pointing it at my temporary Tabular instance, we can refresh successfully and use their categorisations to explore the model. I’ve added some conditional formatting to help see where the issues are. I can see, for example, which of the tables in my model are the worst offenders, and what’s causing it:

Interestingly, the Customer dimension is pretty huge in my example. It has a lot less data than my fact table, but the dictionaries required are pretty hefty. Dictionaries are built using string lookups and are heavily affected by high volumes of unique values – so I can presume I’ve got some pretty big text strings in this dimension. Looking at the column breakdown, I can see where the offenders are:

This tells a slightly different story – my main offenders are from one of the hidden date dimension tables (a sign that relying on PowerBI’s inbuilt date functionality can be a memory drain) and the Sales Order Number, a unique identifier for my fact table, which is obviously going to have a large number of distinct values. The other columns we can do more about – email address is the next offender. We can assume that each of the 18,000 customers will have a unique email address. However, it’s very rare that we would want to do analysis on the email address specifically, so this is a good candidate to remove from the model. At the very least, we could consider keeping only the domain, which will yield far fewer unique values.

Hopefully the above will help you move forward in reducing your PowerBI data model size – I’ll be posting about Performance Analysis & Source Control over the next couple of days.

PowerBI Optimisation P1 – Connecting via Management Studio

I recently gave a talk to the London PowerBI UserGroup and I kicked things off with a confession: "I don't do much report building in PowerBI". Perhaps an odd way to qualify myself to speak to that particular audience. I am, however, a cloud solution architect – I spend my time designing large, scalable cloud systems to process vast amounts of data, and PowerBI is a common tool used on top of these systems. Why then do we accept the lack of controls available within PowerBI? Given any other end-user system, I'd want to know about performance bottlenecks, about data model efficiency and, more than anything, I'd want it in source control. First and foremost, the talk is available here.

The key to it all is realising that PowerBI Desktop, when running, starts a SQL Server Analysis Services process in the background. It doesn't just use the same engine as Tabular, it literally runs Tabular in the background without telling you. Open up a PowerBI Desktop file and, after you've seen the "initialising model…" window, you'll see this process in the background – one for each PowerBI Desktop session. So, if the model is using Tabular in the background, we must be able to actually connect to the model!

First – Find your Temporary SSAS Port

There are two straightforward ways we can achieve this:

1. By far the easiest is to open up DAX Studio, if you have it installed. When you open DAX Studio, it gives you a Connect window which lists all of the PowerBI processes you have running in the background, as well as any Tabular services. When you connect to a PBI file here, you'll see the port listed. In this case, my port is 5524 – be aware that this will change every time you open PowerBI Desktop, so you can't hardcode anything looking for your "PowerBI port".

2. Alternatively, you can find the "msmdsrv.port.txt" file related to your specific instance. Take a look in your user appdata folder and you should find a Microsoft/Power BI Desktop/ folder with some Analysis Services details: C:\Users\<YourUser>\AppData\Local\Microsoft\Power BI Desktop\AnalysisServicesWorkspaces\ You'll see a folder for each of your PBI Desktop instances – I've only got one at the moment. Inside this folder, in another folder called "Data", you'll find the file we're looking for. Opening this file gives you the port number – pretty straightforward, and no DAX required. Obviously if you have multiple instances, you'll need to figure out which of these relates to the instance you're after.

Connect via SSMS

Now that we know our port, we can simply open up Management Studio, connect to Analysis Services and enter "localhost:" followed by the port number from earlier. Once connected, you'll see a model connection – each PBIX file will have a GUID for this instance, but you can drill down and see the objects underneath, exactly as you would with a Tabular model. You can now write queries, browse the model and basically treat it as a Tabular instance. The database itself will use a generated GUID, and several internal tables will do the same – you can see above that a hidden date table has been created for every date key included in my model.

We'll discuss the applications of this in my next post – namely how this unlocks performance tuning, monitoring and source control.
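As a quick sanity check once you're connected, a couple of simple DMV queries will confirm you're looking at the right instance – a minimal sketch; these system rowsets are standard across SSAS, though treat the TABLE_TYPE filter as an assumption:

-- The single, GUID-named database behind the open PBIX file
SELECT [CATALOG_NAME]
FROM $SYSTEM.DBSCHEMA_CATALOGS;

-- The tables in the model, including the auto-generated hidden date tables
SELECT [TABLE_NAME]
FROM $SYSTEM.DBSCHEMA_TABLES
WHERE [TABLE_TYPE] = 'TABLE';

If both return sensibly, you're connected to the right temporary workspace.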

Power BI MDS Dashboard

Master Data Services tables such as the staging tables have been well blogged and documented, but there are a host of other system tables and views that can help you with an MDM project. By querying these tables and views, we can gather MDS metrics such as the number of members, the number of validation errors and the number of staging errors, all of which can then be exposed to a data steward via a dashboard. Given all of the recent updates around Power BI, I decided to use Power BI Desktop to build an MDM dashboard for this blog post. The goal in this case is to produce a dashboard that allows a data steward to get an overview of MDS metrics on a model by model basis.

Power BI Desktop

The first step when working with Power BI is to import some data. Once we start Power BI Desktop, the welcome screen gives you a shortcut to get data, or we can do this via the ribbon. In this case we need to take data from SQL Server, so clicking the SQL Server option requires us to enter a server name and, optionally, a database name. Once we do this and pick the MDS database, we get a list of tables to choose from. Clicking on Load will load the tables into Power BI.

Master Data Services Objects

The question is: which tables do we need? If you navigate the MDS database you will see that there is a mix of master data tables (e.g. post-fixed with _EN for entity), application tables and metadata views (prefixed with viw_SYSTEM). It's the system views that we need to start with, as we want to allow a breakdown by model, entity and version. To do this we need to load in the following views:

--Models
SELECT ID AS ModelId, Name AS ModelName
FROM MDM.viw_SYSTEM_SCHEMA_MODEL

--Entities
SELECT E.Id AS EntityId, E.Name AS EntityName, E.Model_ID, E.Model_Name, E.Model_MUID
FROM MDM.viw_SYSTEM_SCHEMA_ENTITY E

--Get a count of versions
--This is used as a total and by version status
SELECT V.ID AS VersionId, V.Name AS VersionName, V.Model_MUID, V.Model_Name, V.Status AS VersionStatus
FROM [mdm].[viw_SYSTEM_SCHEMA_VERSION] V

Next, it would be useful to report on the member count in each entity. This is a little harder, as it's not stored in the metadata tables, but we can combine the MDS metadata views with the SQL Server sys.dm_db_partition_stats Dynamic Management View in order to return the row count per entity:

--This gives the row count per entity for all models
SELECT E.Name AS EntityName, M.Name AS ModelName, M.MUID, M.ID AS ModelId,
       'mdm.' + E.EntityTable AS TableName, P.row_count
FROM MDM.viw_SYSTEM_SCHEMA_ENTITY E
INNER JOIN MDM.viw_SYSTEM_SCHEMA_MODEL M ON M.MUID = E.Model_MUID
INNER JOIN sys.dm_db_partition_stats P ON P.object_id = OBJECT_ID('mdm.' + E.EntityTable)
    AND P.index_id = 1

Finally, a few more metrics that would be useful are the number of staging errors and a breakdown of the members by validation status (Validation Succeeded, Validation Failed, Awaiting Validation).
To do this we can take in the following tables and stored procedures:

--Get the current count of errors that have occurred during staging
SELECT Entity_ID AS EntityId, ErrorMemberCount, Version_ID
FROM mdm.tblStgBatch

--Pass in the appropriate version id
EXEC [mdm].[udpValidationStatusSummaryGet]

--This gets the validation issues by model/entity
SELECT ValidationIssue_ID, Version_ID, Model_ID, ModelName, Entity_ID AS EntityId,
       EntityName, Member_ID, BusinessRuleName
FROM [mdm].[viw_SYSTEM_USER_VALIDATION]

Power BI Relationships and Visualisations

Once we import the data, Power BI Desktop will detect relationships automatically for us, but we can alter them by clicking on Manage Relationships on the ribbon if we wish. The following shows the relationships between the various MDS objects mentioned above:

Once we’re happy with the model and the relationships, we need to start dragging and dropping to build the Power BI report. As an example, we can create a simple card visualisation to show the count of various metrics. This can be used to produce the following metrics: the count of entities, versions, validation issues and staging errors. Equally, we can create a simple column chart using the [mdm].[viw_SYSTEM_USER_VALIDATION] view – the count of issues goes in the Value section of the chart, whereas the Business Rule Name goes on the Axis. Putting it all together, we get the following report, shown for the sample Product model:

Conclusion

This is just a start – there are other metrics that you could put onto an MDM dashboard; as an example, you could include MDM ETL metrics if you’ve logged them. But overall, the combination of the rich MDS metadata and the intuitive Power BI interface means it’s relatively easy to produce simple dashboards.
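As a closing aside, if you want to sanity-check the numbers behind that validation-issues column chart outside of Power BI, the equivalent aggregation can be run directly against the same view – a minimal sketch using only the columns listed above:

--Validation issue count by model and business rule,
--mirroring what the column chart aggregates in Power BI
SELECT ModelName, BusinessRuleName, COUNT(*) AS IssueCount
FROM [mdm].[viw_SYSTEM_USER_VALIDATION]
GROUP BY ModelName, BusinessRuleName
ORDER BY IssueCount DESC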

Power BI Visual Studio Online Content Packs – Analysing BI Team Performance

I spend a fair amount of time championing the use of data and analytics around my clients’ companies, convincing people from all walks of life that making their data more visible and available around the company will be hugely beneficial. I was recently challenged on this – if I’m such a firm believer in sharing data and improvement through analysis, why am I not sharing my own data? It’s a very good point, and not one that I had an answer for readily available. I’m currently engaged on a project which uses Visual Studio Online for both ALM and source control. Fortunately, Microsoft have recently released the PowerBI VSO Content Pack, making the data held within your account available for dashboarding.

The above link describes the steps to connect, but I found the dashboards required a little setting up before they were completely useful. You are first presented with a mixed bag of charts and metrics, many of which will contain no data. This is because the data model has different entities depending on the project template (Agile, Scrum or CMMI) chosen within TFS, as well as the source control binding (TFS or Git). I removed many of the charts from the default dashboard, then went exploring in the exposed report model – and I was very happy with what I found. The VSO object model exposes pretty much every metric I could think of to report on the activity of a BI development team, including report templates for each template/source control version you could be using. I gave myself 15 minutes to see if I could pull out a reasonable dashboard and was pleasantly surprised by what could be done in so little time. So – how do you analyse an analysis team?

How is the Project Going?

Firstly, we’re an agile team. We run iterations loosely based around Scrum principles, we manage our client projects through story backlogs and we report daily on blockers, impediments and so on. This information is key to us operationally, but it also tells a useful story. How many new features were added in the sprint? How many individual user stories, each representing a distinct piece of business value, were delivered? How much effort is remaining in the backlog (and therefore how many additional sprints would be required to deliver all known functionality)? How many bugs have been raised – and how effective are we at dealing with them?

What’s the current Sprint Status?

The day-to-day metrics also tell a valuable story – was the sprint smooth and predictable, or was it a rush to deliver towards the end? How much work is still remaining in the current sprint? Are there any blocked tasks or impediments that may be seen as a risk to delivery?

What actual work has been done?

Stories and tasks only tell one side of the story – a task may represent a change to a single piece of code, or a large update that touches much of the system. Simply counting tasks therefore limits our understanding of how productive we were during a sprint. Fortunately, we can also analyse the source control history, looking at the changesets committed and their contents. This provides some insight into the complexity of those completed tasks – it’s not a perfect measure, but it gets us a little closer. We can now ask questions such as: How many individual changesets were committed? Who commits most regularly? What kind of work was done – what is the most common file amended? Is there someone who hoards changes and causes potential problems by not regularly committing their work?
Is our development behaviour changing over time as we review our practices and learn from them?

Finally, it’s also worth noting that the content pack has been fully set up with synonyms to allow the Q&A natural language query bar to be activated. So if there’s a metric not included in the dashboards, users can simply type their question into the query bar. For example, I want to better understand the type of changes we’re making – are we creating new entities or modifying existing code? For this, I tried the following, with the relevant chart appearing before I’d even finished typing:

There’s a whole lot more content in the packs under the individual report tabs, but this gave me a good point to start that conversation. I can now provide weekly dashboard updates to my project sponsors, showing just how much progress we’re making. This is a huge boost to my ability to champion data, and I’m expecting it to actually improve some of our working habits. Now, if anyone interrupts me mid-flow, I can simply grab my phone, load up the Power BI app and pull out some insights into the team’s current performance, wherever I am.

XLCubed Version 5 Released

Interesting to see yesterday that XLCubed have released version 5 of their popular OLAP reporting tool. As I've mentioned before, I've always found XLCubed to be very powerful and easy to use, and therefore, in my opinion, it's definitely one of the best Excel OLAP reporting tools out there. Version 5 sees the introduction of some new features in the Excel/web editions, namely: enhanced Excel integration with 'MicroCharts'; an improved user interface in the Web Edition; print to PDF and enhanced save to Excel options in the Web Edition; and fine-grained control over publishing Excel reports to the web. Personally, I'm used to using the Excel Edition to slice and dice cubes to get what I want quite easily. Although it's not entirely new to V5, you can carry out the same kind of ad-hoc analysis in the Web Edition, as shown in the following video. The end result is that you are able to build the kind of dashboards shown below, whether in the Web or the Excel Edition. As usual, the evaluation editions for the Excel and Web versions can be downloaded here.

Bullet Charts and Sparklines in Reporting Services 2005

Here at Adatis we're big fans of Reporting Services.  Yep, it has its quirks and frustrations (don't talk to me about Excel export!) but it doesn't deserve all the criticism it gets, in our opinion.  You can create some great looking reports with not a lot of extra effort over the defaults (though how many times have you seen a "slate" style report called "Report1" go into production!).

We're also fans of what some might call the more "trendy" visualisation techniques, such as sparklines and bullet charts, and with Microsoft acquiring Dundas technology for use in SQL 2008, SSRS has some much improved functionality in this area.  I'll be taking a look at some of these new features in a future post.  Whilst SSRS 2005 does have its limitations, with a little effort you can still create some very effective information visualisations.  The dashboard below is built using completely standard functionality.

Adding sparklines and in-table bar charts is very simple (and pretty clever IMO).  This MSDN article covers how to do it in detail, but in essence you simply add a line or bar chart into a cell in your data table.  Whilst the bullet chart (in the Sales Channel Performance region of the dashboard) may be simplistic compared to those available in SSRS 2008 or MicroCharts from XLCubed (from whom I nicked the styling for our dashboard), it still provides a very clear visualisation of the data.  This required a little more trickery in the underlying query, but it is also just a chart in a table cell - this time a 100% stacked bar.  I've explained how this works here.
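To give a rough flavour of that query-side trickery, the idea is to return the qualitative bands of the bullet chart as separate columns so the 100% stacked bar can render each segment. A minimal sketch only – the table name, column names and band thresholds below are entirely hypothetical, and the post linked above covers the actual approach used in the dashboard:

-- Hypothetical source table and thresholds, purely for illustration
SELECT
    Channel,
    ActualSales,
    TargetSales,
    -- each band is expressed as the width of its segment in the stacked bar
    TargetSales * 0.60                        AS BandPoor,          -- 0% to 60% of target
    TargetSales * 0.90 - TargetSales * 0.60   AS BandSatisfactory,  -- 60% to 90% of target
    TargetSales * 1.20 - TargetSales * 0.90   AS BandGood           -- 90% to 120% of target
FROM dbo.SalesChannelPerformance;

In the report, the band columns drive the stacked bar's background, and the actual value can then be overlaid as a separate series or marker.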

PPS Monitoring - Missing Parameters using a Reporting Services report in SharePoint integrated mode

If you're running Monitoring with SP1 applied and working with a Reporting Services report in your dashboard that comes from a report server in SharePoint integrated mode, you may experience an issue where your parameters don't appear in the Report Parameters section. There's a hotfix available for this issue upon request from MS support: http://support.microsoft.com/kb/956553/en-us

XLCubed Version 4 Released

XLCubed Version 4 was recently released, adding support for increased data visualisation via MicroCharts, as well as enhanced web publishing. I've been using XLCubed since version 1 and have always found it both powerful and easy to use. All the standard functionality that you'd expect from an OLAP/Analysis Services Excel add-in exists and is incredibly intuitive, yet there's plenty more if you want to dive deeper. In particular, I've always liked the following features:

A wealth of right-click functionality, such as remove only, keep only and propagate across sheets;
Drag and drop pivoting - alter your report quickly without having to open toolbars or menus;
Converting grids to formulae - allows you to easily create disjoint reports using Excel formulas;
Linking grids - means that an action in one grid (such as a drill down on a dimension) can automatically perform the same operation on multiple linked grids.

As mentioned, version 4 sees the inclusion of the interesting MicroCharts feature in the XLCubed grids. This essentially means that you can choose to display just the numbers, a graphical MicroChart visualisation of the numbers, or both (see below). The MicroCharts do not hinder the core functionality, such as drill down or pivoting, and can be turned on for any of the measures. There's a lot more functionality worth looking at - you can find out more at www.xlcubed.com.

Great Dashboard Design

PerformancePoint Monitoring has made it really easy to build and publish basic dashboards.  However, designing really great dashboards is something that very few people achieve.  Despite my recent post on making your reports pretty, I'm not a fan of bells and whistles - in a pre-sales/sales situation they definitely have their place, but in a production dashboard they're hard to justify. A dashboard needs to deliver the right information in an easy to read and easy to interpret manner.  Sounds simple, right? IMO it's easier to come up with a bad dashboard than a good one, but a good dashboard may be harder to sell than a bad one, particularly when the budget holder isn't going to be actually using it.

One person who has mastered dashboard design is Stephen Few, and I'm unashamedly going to crib from his excellent book "Information Dashboard Design - The Effective Visual Communication of Data" for this post.  This is one of my favourite IT books and I urge anyone who works with any type of report to get a copy.  It's got loads of examples of dashboards and makes it clear why each is good or bad.  I guarantee you will design your dashboards differently after reading; use pie charts in your reports? You won't after reading this book! Anyway, this post isn't supposed to be a free ad for Stephen's book, it's meant to give a few simple tips on making your dashboards a little better, so here goes:

Don't exceed a single screen - that bit of information that is off the bottom of the screen is less important, right? Wrong! You might even give the wrong message by leaving part of a report off the screen.  Most dashboard tools make it easy to provide navigation to detail, so think more about why you need to put the information together.  And consider your layout - the brain naturally gives least attention to the bottom right area.

Don't overuse colour or decorate unnecessarily - my post about adding pretty borders to your reports is a great example of unnecessary use of decoration.  What value does that border round the reports really add to the user experience?  It often distracts from the real information.  Does a traffic-lighted map of the UK really add any value where a bar chart is much more readable? "Dude! - your dashboard looks like a real car dashboard! coooool".  And Sacha will kill me for this but please - NO 3D charts!   Your company's logo here? errr No thanks :)

Too much or too little information - by nature a dashboard is a high-level overview - you really don't need sales figures to the penny.  Consider also the context of the data you are presenting: actual sales for January may not be much use unless you can compare them to budget, forecast or previous information.

Use variety only when necessary - dashboard designers often feel compelled to use variety where variety isn't necessary - I'm definitely guilty of this one. It's not a crime to have your dashboard use only bar charts and grids.  Don't add a radar chart just because it will add something different.

Highlight important data - sounds obvious but not always simple.  You should be drawn straight away to the information that requires your attention the most.  In the key figures area of the dashboard below you can see straight away that Late Arrivals has some issues.  I particularly like the lack of a green icon against the other KPIs.

Of course there's so much more to it than these few points, and in fact when you consider the complexity of human visual perception, perhaps it's a wonder anyone can get it right!
So what does a great dashboard look like? Well, everyone has their own tastes, but compare the dashboard above with the one below and make your own decision: one is simple, clean, readable and informative! In case you're wondering, the lower dashboard was built using XLCubed.