Adatis BI Blogs

Geospatial Data Visualisation: R, Python or BI Tools

Introduction

As a newbie to Adatis, I begin my journey of learning the vast, ever-growing set of languages and platforms. In my first few steps, I am taking a problem I already have experience with in R and testing whether the solution in Python presents similar hurdles, as a comparison between the two languages. Before joining Adatis I had just graduated with a degree in Biology. During my course I discovered Movebank, a website containing free-to-use animal GPS tracking data sets. So I am using a data set I am familiar with, which contains GPS tracking data for 27 Turkey Vultures (Cathartes aura), and using both tools to plot the 350,000 points on a map.

How the data set looks as a CSV file loaded into R's viewing window.

Note: While both platforms are open source, there may be packages out there that would further optimise the process I was trying to achieve. As a beginner with limited knowledge, this was my experience of trying to map geospatial data in R compared with Python and ArcGIS in Power BI.

Python

Packages concerned: basic (numpy, pandas, matplotlib) and geometry (geopandas and shapely).

Python failed to install the geopandas and shapely packages, so I had to use the pip install command in the command window; alternatively, the packages can be downloaded in the Anaconda environment. The data loaded in fine, as expected. First, the longitude and latitude values were zipped together and shapely's Point function was applied to turn each pair into a single co-ordinate value. If I were to plot now, I could retrieve a world map (a template from the geopandas library, changing the fill, the colour of the sea and the country outline colour) with custom size and shape, and a single colour for all the data points - which is something, but we can make it more informative. When plotting, I could not find a function that would allow Python to assign colours by group based on a string value (the name of the vulture).
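A minimal sketch of the approach described above, in plain Python (the sample rows and vulture names are hypothetical stand-ins for the Movebank CSV; with geopandas and shapely installed, each pair would be wrapped in a shapely.geometry.Point and passed as colours to a scatter over the geopandas world-map template):

```python
# Hypothetical sample of the tracking data (the real set has ~350,000 rows).
rows = [
    {"name": "Leo",  "long": -79.9, "lat": 32.8},
    {"name": "Irma", "long": -80.1, "lat": 25.7},
    {"name": "Leo",  "long": -78.5, "lat": 33.0},
]

# Zip longitude and latitude into a single co-ordinate per reading.
coords = [(r["long"], r["lat"]) for r in rows]

# Assign each vulture a group number, so the 27 names can drive point colours.
def name_to_group(names):
    groups = {}
    return [groups.setdefault(n, len(groups) + 1) for n in names]

colours = name_to_group(r["name"] for r in rows)
```

The `colours` list (one number per reading, 1 to 27 on the full data set) is exactly what matplotlib's scatter accepts as its `c` argument.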
Instead, I created a function to convert every name to a number, so each co-ordinate had a group number assigned to it depending on which vulture it came from. This means a list of 350,000 numbers ranging from 1 to 27 can be passed as the colour for the points. After plotting, Python offers a zoom function which allows you to zoom down to a single point, but it is limited by the resolution of the map/plot. I did, however, limit the axes to the Americas just for initial plot clarity. The point conversion and plotting process takes around 15 seconds, according to Python's time.time() function placed at the beginning and end of the code. Ignoring the library imports, the whole process took 14 lines of unique code.

R

Packages concerned: rworldmap.

Now R has some useful packages like ggmap, mapdata and ggplot2, which allow you to source your map satellite images directly from Google Maps, but this does require a free Google API key to source from the cloud. These packages can also plot the map around the data, whereas I am currently trimming the map to fit the data. For a fair test I also used a simplistic pre-built map in R, from the package rworldmap, which allows plotting at a country level with defined borders. Axes can be scaled to act like a zoom function, but without a higher-resolution map or a raster satellite image it is pointless to go past country level.

R works a little differently to Python: initially we plot a blank map and then, in a sequential but separate line, add the points. This differs from Python, where the points and the map are handled within the same plot function. The first step is to get the map from the rworldmap library and plot it, limiting the axes to act as a zoom. However, the axes do not automatically show this way, so next we manually define the intervals of the axes, i.e. longitude and latitude. Finally, plot the points with customisation of size, shape and colour. The colour of each point can be assigned using 'col=factor(dataframe$column)'.
So there is no need to make a group-number index as we did in Python. Using the function system.time(), R reported the plotting takes 7 seconds, but the actual graphic doesn't show up for a further 8 seconds, making the total time 15 seconds - the same as Python. R, however, again ignoring the library imports, executed this in only 6 lines of unique code!

Just for fun, I did check for a correlation between distance travelled and weight, and a regression analysis would suggest a statistically significant relationship. But without a time scale we cannot conclude anything; for example, some birds could have been tagged for longer than others, resulting in a greater distance travelled.

Is geospatial data suitable to present in R and Python?

During the write-up of this blog I read Jason's blog(s) on ArcGIS Maps for Power BI (Jason's blog), so I trialled my own data. Quite simply, the program is drag and drop. An image presenting the same final information as both the R and Python outputs can be created in roughly 2 minutes. One of the features I really got on well with is the variable zoom, down to the precision of a street. If you have read Jason's blog you will know about all the customisable features; if not, read it! Quite simply, ArcGIS Maps gives you the option to fully explore the data visually, but it is limited in analysis and 'map features'. Power BI handles statistical analysis similarly to Excel: possible, but tedious, having to generate each column. The map feature limit poses another problem: ArcGIS (free version) only allows 1,500 'map features' (5,000 in the paid version), which equated to around 30,000 points. With this data set it meant not all the data could be plotted. I thinned out the data in R by removing 9 of every 10 points; considering the data was continuous, this meant the readings were once every half/full day instead of every 1-3 hours, which shouldn't really affect the overall distance travelled.
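The thinning was done in R, but the keep-every-tenth-reading idea can be sketched with plain Python slicing (a hypothetical list of readings stands in for the real tracking rows):

```python
# Hypothetical sequence of GPS readings, ordered by timestamp.
readings = list(range(100))  # stand-in for ~350,000 tracking rows

# Keep every 10th reading (drop 9 of every 10), reducing 1-3 hourly
# fixes to roughly one per half/full day.
thinned = readings[::10]

print(len(thinned))  # 10
```

On the full data set this brings 350,000 points down to roughly 35,000 - just over the ~30,000-point ceiling the free ArcGIS tier allowed here.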
I suspect a limit like this is in place to preserve performance: a similar platform, Tableau, plotted all of the data, but at such a cost to performance that the zoom navigation was unusable.

Conclusion

In summary, I feel the difference between R/Python and Power BI just depends on the user. Power BI makes the data very accessible and gives great control for visually exploring the data, to a business analyst for example. But if the user is more technical, e.g. a data scientist, then R/Python might be more fitting. Between R and Python there was no difference in performance with this data set, but that could be because, in the grand scheme of things, 350,000 points isn't 'big'. There was a notable difference in the amount of code needed, but this could be down to my greater experience with R and my lacking knowledge of a Python package capable of achieving the same task.

Loop Through Webpages with Multiple Parameters in Power Query

Back in February 2016, I posted a step-by-step article (found here) on how to loop through a website and dynamically pull down multiple pages into a single table in Power Query. I was recently approached by a gentleman who had read my blog, wanting to enhance the looping process. The requirements were:

Current Problem:

- The user can currently only loop through Year, having to hard code the Page and Month.

Requirements:

- Dynamically loop through three webpage parameters and output them into a single table - highlighted below.
- Specify the Page, Year and Month to loop through:
  o Years 2010 - 2017
  o Months 1 - 12
  o Pages 1 - 10

Parameters

Whilst GUI Parameters are not a new concept in Power Query, they did not exist when I wrote my initial blog. As a result, I had to code a parameter into the M query and then define it as a function. A similar concept will be adopted here, but the out-of-the-box parameter functionality makes life so much easier now. I previously wrote an article explaining how to set up parameters (here), so I will simply show you the output of them below.

Note - all three parameters are set to Text. This is needed in order to pass the parameters into the website address later on.

Web Connection

The first step is to import the data from the website (shown above) using the web connector. To pass our parameter values into the URL, click the cog in the first Applied Step - typically called 'Source'. This will open a new window where, by design, the hardcoded URL is shown under the Basic radio button. Select Advanced and you will see additional configuration capabilities. URL Parts enable you to build up elements of the webpage, whilst dynamically passing in parameters. The URL Preview will show you how the URL looks with any parameters or URL Parts. All other options can be ignored for this example.
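Outside Power Query, the same parameterised-URL-plus-permutations idea can be sketched in Python with itertools.product (the base address is not shown in the article, so BASE below is a hypothetical stand-in; the query-string fragments are taken from the URL Parts described later):

```python
from itertools import product

# The base address is not shown in the article; BASE is a stand-in.
BASE = "https://example.com/page="

years = range(2010, 2018)   # Years 2010 - 2017
months = range(1, 13)       # Months 1 - 12
pages = range(1, 11)        # Pages 1 - 10

# Cross join of the three parameter sets: one URL per permutation.
urls = [
    f"{BASE}{page}&view=calendargross&yr={year}&month={month}&p=.htm"
    for year, month, page in product(years, months, pages)
]

print(len(urls))  # 8 years x 12 months x 10 pages = 960
```

The cross join of three small tables in Power Query plays exactly the role of `product()` here: it materialises every Year/Month/Page combination before the function is invoked per row.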
The URL Parts had to be broken down quite significantly, and I will show you both the code and how it looks in the Advanced window.

User Interface: the sections highlighted in red are the parameters and sit in between some of the hard-coded URL text.

Code Breakdown:

- Text =
- Parameter = [Page]
- Text = &view=calendargross&yr=
- Parameter = [Year]
- Text = &month=
- Parameter = [Month]
- Text = &p=.htm

URL Preview: {Page}&view=calendargross&yr={Year}&month={Month}&p=.htm

Year/Month/Page Permutations Table

Before creating/invoking the function, a table that contains the different years, months and pages is needed. There are many ways to achieve this, but I used a cross join technique (found here) with three disparate tables. I followed the end user requirements and ended up with a table like the one below. This will be the final table where the function is called.

Function

Let's go back to the original table, where the parameters have been applied to the web URL. Right click the table and select Create Function. Nothing else is required (apart from naming it), as Power Query will automatically package up the table, and you end up with a new Function like the one below. The last step is to invoke the function. Go back to the Permutations table (called 'All Data' in my example), and go to Add Column > Invoke Custom Function. When prompted with the next screen, map each parameter to the equivalent column. The red highlighted values are the columns and the yellow contain the parameters; both are named the same in my example, making mapping even easier. The function will loop through each URL permutation, dynamically changing the parameters on execution. Once it has finished, tidy up any errors (for Years, Pages and/or Months) that do not exist. Apply all transformations within the Query Editor and that is it - the data is ready for reporting! You may need to configure some of the measures (e.g.
Total Gross) to be numbers, but that is easy enough to do within the Query Editor. The example in this blog can be adapted in many ways and illustrates how dynamic Power Query can be - both in Excel and Power BI.

Further Reading

o Cross Join in Power BI -
o Reza Rad Custom Functions -

Contact Me

If you would like a copy of the PBIX workbook or have any general questions, feel free to leave a comment below.

Twitter: @DataVizWhizz

Themes in Power BI – Part 2

In September's Power BI Desktop release, one of Microsoft's announcements was a new offering of Themes. My previous blog back in March 2017 explained what a Theme could (or couldn't!) do, but I am pleased to say Microsoft listened to the community and have improved it greatly. Originally, we were limited to setting a few page or chart colours - mainly tables and matrices. Appreciated, this was a first release from Microsoft, but it's safe to say I was left uninspired. The good news is Themes can now do a lot more, including:

- Set the font size for specific charts, e.g. 14.
- Set visuals to use a specific font type style, e.g. Calibri.
- Turn chart data labels off/on.
- Set default property values for a chart.
- "*" functionality that will apply a specific property across all charts.

This blog will focus not only on how to use the new features, but on how they can be brought together to form a larger, organisational configuration file. If you would like to follow my examples, please ensure you have downloaded the latest version of Power BI Desktop - found here.

New JSON

In order to extend the Adatis Theme, a new section called visualStyles must be added to the JSON. The format is as follows. The highlighted JSON sections give you the ability to use '*' syntax, which means the Theme format will apply to everything underneath it. This is useful if, for example, you want to apply a global font size across the entire report, rather than needing to configure every chart/visual.

Now, let's pick a simple use case. Client A requires the following to always be applied to a Pie Chart in Power BI Desktop:

- Switch on the Pie Chart Legend.
- Set Text Size to 14.
- Set Font Family to Calibri Light.

Default Pie Chart: by design, Legends are turned off. The configuration file is tweaked to contain the new JSON - highlighted in yellow.
{
  "name": "Adatis",
  "dataColors": [ "#002C5C", "#006FBA", "#81C341", "#F4AA00", "#003876", "#00448F", "#0051A9", "#007ED4", "#008DED", "#74B238", "#8EC954", "#DB9800", "#FFB60F" ],
  "background": "#FFFFFF",
  "foreground": "#002C5C",
  "tableAccent": "#81C341",
  "visualStyles": {
    "pieChart": {
      "*": {
        "legend": [
          {
            "show": true,
            "fontSize": 14,
            "fontFamily": "Calibri Light"
          }
        ]
      }
    }
  }
}

The final step is to import the modified JSON file, resulting in a completely different looking Pie Chart.

Applied Theme:

1. The traditional Adatis chart colours have been applied.
2. A Legend is visible - in both the properties pane and the visual itself.
3. Font family and Text Size are updated.

Adatis Theme Extended

I will now add some additional configurations to the Adatis Theme file:

- Turn on all Chart Legends.
- Set the Pie Chart Label Style to Percent of Total.
- Set Card Label and Category Label Font Size to 12, but have all other visuals use 10.

This changes the JSON code significantly, but by following the visualStyles format, it was more straightforward than it seems.
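Hand-editing nested visualStyles JSON is error-prone, and (as noted under Limitations below) Power BI only reports a generic message on a bad file. One way to syntax-check a Theme before importing it is Python's built-in json module - a quick sketch (the file name and broken content are hypothetical):

```python
import json

def check_theme(path):
    """Parse a Power BI Theme file, reporting where any syntax error is."""
    try:
        with open(path, encoding="utf-8") as f:
            theme = json.load(f)
    except json.JSONDecodeError as e:
        return f"Syntax error at line {e.lineno}, column {e.colno}: {e.msg}"
    return f"OK: theme '{theme.get('name', '?')}' parsed"

# Demo with a deliberately broken file (a trailing comma is invalid JSON).
with open("broken-theme.json", "w", encoding="utf-8") as f:
    f.write('{ "name": "Adatis", }')

print(check_theme("broken-theme.json"))
```

Unlike Power BI's generic message, JSONDecodeError pinpoints the line and column of the mistake, which makes tracking down a stray comma in a large visualStyles block much quicker.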
{
  "name": "Adatis",
  "dataColors": [ "#002C5C", "#006FBA", "#81C341", "#F4AA00", "#003876", "#00448F", "#0051A9", "#007ED4", "#008DED", "#74B238", "#8EC954", "#DB9800", "#FFB60F" ],
  "background": "#FFFFFF",
  "foreground": "#002C5C",
  "tableAccent": "#81C341",
  "visualStyles": {
    "*": {
      "*": {
        "*": [
          {
            "fontSize": 10,
            "fontFamily": "Calibri"
          }
        ],
        "legend": [
          {
            "show": true
          }
        ]
      }
    },
    "card": {
      "*": {
        "labels": [
          {
            "fontSize": 12
          }
        ],
        "categoryLabels": [
          {
            "fontSize": 12
          }
        ]
      }
    },
    "pieChart": {
      "*": {
        "labels": [
          {
            "labelStyle": "Percent of total"
          }
        ]
      }
    }
  }
}

The report is significantly different. I will pinpoint a specific area of the report to illustrate where the configurations occurred.

Before / After:

1. Global font size is set to 10.
2. The Percent of Total setting is applied to Pie Charts.
3. Category Labels and Labels are set to font size 12 for all Cards.

What else can Themes do?

This article only scratches the surface of what Power BI Themes can now do. Click here to find the extensive list of properties that can be configured. It is worth mentioning that if you have limited JSON experience, some of the configurables can be a little tricky to set up. However, there are quite a few blogs out there that can get anyone up to speed, which I will provide below.

Limitations

There are a couple of limitations, as well as some things I have experienced:

- Debugging JSON (natively within Power BI) is not supported; when a syntax error occurs, Power BI will give you a generic error message like the one below.
- Data configurations are not supported.
  ·  This is understandable, though. Every report tab contains different slices or filtering of data, but colours, fonts and chart properties tend to remain the same.
- Setting Card font and background colours does not appear to be possible yet. Or at least, I have been unable to get it to work or to find any supporting documentation from Microsoft to suggest it is.
- Overriding a Theme.
  ·  Individually changing a chart property will override the Theme JSON file. I was caught out by this and couldn't understand why the configuration wouldn't change on a Pie Chart.
  ·  To get around this, use Revert to default within a chart property and it will re-apply the Theme. This option exists in each top-level chart property - see the example below.

Further Reading

o Jason Lengstorf's JSON Blog -
o Power BI Desktop September Update -
o Power BI Report Themes BI Blog -
o Themes Gallery (containing example JSON files) -

Contact Me

If you would like a copy of the PBIX workbook/JSON Theme file or have any general questions, feel free to leave a comment below. I am particularly interested to see what Theme files other people have created, and whether there are any other known limitations.

Twitter: @DataVizWhizz

Direct Query vs Live Connection in Power BI – Part 2

This instalment of the Power BI blog series focuses on the two other differences between Direct Query and Live Connection in Power BI. If you haven't done so already, be sure to check out Part 1, which concentrated on Quick Measures and Relationships.

Feature Differences

Find below a Power BI Direct Query screen, focusing on the final two differences (highlighted in green):

o New Hierarchy
o Change to Import Mode

New Hierarchy

Hierarchies are very useful in Power BI, especially when wanting to drill up and down within levels of data, e.g. Day > Month > Year. Let's try to create a New Hierarchy in Live Connection mode: New Hierarchy does not appear in the list of options. This is because hierarchies are typically created within the OLAP Cube; therefore, it makes sense why the option is not available. However, I would argue that Direct Query allows you to create a hierarchy, so why can't Live Connection? Direct Query stores the hierarchies within a smaller Power BI model (Tabular Cube) running on a report developer's local machine. Ironically, Live Connection does exactly the same thing when you create an ad hoc measure in Power BI. It is surely a matter of time before hierarchies are also supported.

Change to Import Mode

Direct Query mode supports the ability to easily switch to Import Mode. This is a useful option, especially for a self-serve analyst wanting to make transformations and shape the data. By right clicking the highlighted option below, a simple wizard appears. When the import completes, the data (from tables in the database) will be stored in a Tabular cube on the local machine where Power BI Desktop is running. If we navigate to the same area within Live Connection, notice there is no option to change to Import Mode ("click to change"). Once again, I am not really sure why this feature isn't supported in Live Connection mode.
I can understand why it would be more difficult to convert a Multidimensional Cube (MDX) into a local Tabular Cube (DAX), but if Live Connection already points to Tabular, it's an exact copy. To vote for Import Mode functionality within Power BI Live Connection, click here.

Coming Soon

Part 3 is the final instalment of the blog series, specifically focusing on the underlying Power BI Data Models in Direct Query and Live Connection.

Further Reading

Other than the Power BI Blog, there are some other great pages out there too:

o Power BI Blog -
o Import Mode vs. Direct Query -
o Direct Query in Power BI -
o Live Connection in Power BI -

Contact Me

If you have any questions or thoughts, please leave a comment below. My Twitter details are also provided.

Twitter: @DataVizWhizz

Direct Query vs. Live Connection in Power BI – Part 1

There are lots of articles out there that compare Import Mode vs. Direct Query, but people rarely talk about whether there are any differences between Direct Query and Live Connection. "Wait. Aren't they the same thing?" Well, not quite. The first big difference between Direct Query and Live Connection is the type of data source used for the connection. The former uses a database connection (typically SQL Server), whilst the latter requires an Analysis Services OLAP Cube. This blog series won't explain what Direct Query and Live Connection can do (found here and here), but will instead highlight the other, subtler differences between the two connections.

Feature Differences

There aren't any features I can find that are available in Live Connection and not in Direct Query. However, there are a few the other way around. I will first show you a Power BI Direct Query screen, focusing on 2 of the 4 differences (highlighted in green):

o Quick Measures
o Relationships

Quick Measures

This feature was first released in April 2017 and is available in Import Mode and Direct Query. It enables a non-technical Business Analyst to create relatively complex DAX aggregations with the use of a nice wizard. To access Quick Measures, right click on a measure or attribute and select Quick Measures. Let's try the same thing in Live Connection mode. You'll notice that Quick Measures is missing from the list of options. I find it bizarre that Live Connection doesn't support Quick Measures, especially when using a Tabular Cube as the connection. The Power BI DAX language and engine are the same as Tabular, so you would think the two are compatible! Please vote for this feature to be added into Live Connection -

Relationships

There are two tabs displayed on the left-hand pane in Direct Query mode. If you click the highlighted tab, it opens a Relationships page - where you can begin to join datasets (from the database) together.
I created a manual relationship that joined DimEmployee and DimDate together - as shown below. No relationships are created in the underlying SQL Server database; they are instead stored within the Power BI model. In Live Connection, the left-hand pane looks bare: there is no option to create any form of relationship against the Live Connection Tabular Cube. This kind of makes sense, because a BI Developer would be the person responsible for creating relationships within an OLAP Cube. I would argue that if you want the ability to mash up data or create your own relationships, you shouldn't be connecting to a Cube in the first place.

Coming Soon

Check out Part 2 of my blog series - available here. The focus of that article is the Add Hierarchy and Change to Import Mode features. Part 3 will conclude the trilogy, where I go off-piste slightly and focus on the Data Models in Direct Query and Live Connection.

Further Reading

Other than the Power BI Blog, there are some other great pages out there too:

o Power BI Blog -
o Import Mode vs. Direct Query -
o Direct Query in Power BI -
o Live Connection in Power BI -

Contact Me

If you have any questions or thoughts, please leave a comment below. My Twitter details are also provided.

Twitter: @DataVizWhizz

Coming Soon in Power BI - June 2017 & Beyond

Whilst I wasn't fortunate enough to attend the Microsoft Data Summit on June 12-13 2017, I managed to view the keynote through a live webinar. The majority of this session contained details of what is coming soon in Power BI, more than likely over the next few months. We all know the 'roadmap' can be as long as a piece of string, but I am hopeful the features mentioned in this blog will all be available by the end of the calendar year. Without further ado, I will move on to the features.

Out Now

The below features were actually released in the June Power BI Desktop download.

✓ Data bars for the new table & matrix (preview).
  o This is the concept of having spark lines within a table or matrix, providing trend analysis in a visual way.
✓ Fonts in visuals.
  o You can now alter the font in a visual, the same as the functionality within a text box property. These small configurations actually make a big difference.

Coming Soon

✓ New SKUs for Power BI Premium specifically targeting embedded workloads: EM1 and EM2 SKUs, starting at $625 per month. The different pricing tiers are:
✓ Embed using the new SKUs into SharePoint and MS Teams easily.
  o I wonder if the code will be completely free?
✓ New Visio diagram control - auto mapping to data model entities. Copy and paste a Visio diagram into Power BI and it hooks up the data elements.
  o This looked really slick in the Microsoft demo, but previous experience with this functionality in SharePoint was a little clunky and fiddly.
✓ Embed a PowerApp into Power BI - with write-back functionality.
  o This could replace MDS for certain scenarios, depending on how complex your reference/master data is.
✓ New Quick Measure gallery - write a new DAX measure, upload it to the gallery and have your name referenced in the product.
  o Chris Webb recently had this privilege, with a link to his blog series here.
✓ Better Custom Visual support.
  o A new button to install custom visuals from the store, without having to download and import them first.
✓ Ability to remove stop words in the Word Cloud custom visual.
✓ Drill through to other report tabs using Page Settings.
  o This is huge for Power BI. Something seemingly so simple is not currently available, but I for one am very excited for when this feature is released.
✓ Bookmark pane to facilitate a 'save state' (all current filter settings). In addition, it will support analysts who want to tell a story or provide a walkthrough.
  o Show/hide visuals on a page when building a story.
✓ Create buttons.
  o Turn a button into a report page link.
  o Use buttons to show/hide visuals.
  o Again, a simple but MASSIVE feature for Power BI. This really helps with the user's reporting journey.
✓ Quick Insights type functionality built into visuals and Desktop to answer "why" questions, using AI.
✓ Waterfall chart improvements.
✓ Cortana improvements.
  o More integration.
  o Using colleague names in Cortana to find PBI workbooks.
  o Conversational Q&A - refine an answer with further questions.
✓ What-If analysis.
  o Like Excel.
✓ Annotation support for presenting back to customers.
  o In effect, you can doodle or draw on top of a Power BI report.

Further Reading

I would advise subscribing to the Power BI Blog, in which you will hear of any new announcements of features, etc. In addition, there are some other great pages out there too:

o Power BI Blog -
o Power BI Premium Capacity White Paper -
o Chris Webb Blog Page -

Contact Me

If you have any questions or thoughts, please leave a comment below. My Twitter details are also provided.

Twitter: @DataVizWhizz

Handling Web Page Errors in Power BI Query Editor

I was recently asked if it was possible to handle error rows in the Query Editor and, more specifically, for web pages that do not exist at source. The user broached me with this question after following a previous blog on looping through multiple web pages in the Query Editor - found here. In my blog post there were no errors from the loop output, but in the user's workbook they were losing data directly below the expanded error row. My first suggestion was to use the Remove Errors UI option, which would delete the problematic row. However, they wanted a way of treating the errors as an Unknown Member and did not want to lose the data entirely. It is assumed consumers of this blog already have knowledge of the Power BI Query Editor and the UI functions used in the examples.

Scenario

I created a new workbook that connects to local Council Facebook pages. Each page has a unique Object ID, which will be used as the parameter in the loop. The Council 'Camberley' deliberately contains an invalid Object ID. I then proceeded to create a Parameter and Function, replicating the exact steps from my previous blog. When I invoke the function (through the use of a Custom Column), the following is produced. As you can see, Camberley Council produces an error in the fnInvokeCouncils column. If we expand the contents (highlighted in yellow), the Facebook page data appears. Upon further inspection, the Farnham and Rushmoor council data are available, but Camberley (incorrect Object ID) and Guildford are not. The error message is a little misleading, but let's save the debugging debate for another day. The key observation is that the Guildford data is not available, simply because it comes after Camberley in the list. Whilst we want to see errors in a Query, we do not want them causing data loss.

Resolution

As I mentioned at the beginning of this article, using the Remove Errors function would prevent the loss of the Guildford data.
However, the user needs to handle errors as Unknown Members and conform to a typical Kimball Data Warehouse. I am sure there are many ways to fulfil the requirement, but here is how I approached it:

1. Duplicate the existing 'Councils' query, naming it 'Councils Error Rows'.
2. Switch back to the 'Councils' query and Remove Errors, leaving only three records.
3. Expand the fnInvokeCouncils column, opening up the underlying fields and data.
4. In the 'Councils Error Rows' query, apply the Replace Errors UI function, inserting the string "Validation Failed".
5. Add a Custom Column, writing the following M:

   if [fnInvokeCouncils] = "Validation Failed" then 1 else 0

   This is a simple IF statement that flags the error rows with 1.
6. Now filter the selection to only display error rows with the value of 1, using the Filter Rows UI function. The 'Councils Error Rows' query now looks like the following:
7. The columns must match the 'Councils' query, meaning 4 new Custom Columns are needed. We can hardcode the values and remove any unwanted columns.
8. Right click on the previously modified 'Councils' query and select Reference. Name the new query 'All Councils'. This makes it easier to track the transformations and persists any future changes made to the raw data.
9. Within the 'All Councils' query, select the Append Query transformation. Choose 'Councils Error Rows' as the table to append and click OK.
10. We are now left with a union of both datasets, containing the Unknown Member and the data from all other Councils.
11. The Unknown Member record is visible within the final Query.

Summary

I have shown you how to get around two different problems with error rows in the Power BI Query Editor. The first is how to retain all correct data, whilst the second is keeping error records and inserting them back into the dataset as an Unknown Member.
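The same split-replace-append pattern can be sketched outside Power Query - here in plain Python with hypothetical council values, mirroring the steps above (None stands in for an errored Facebook call):

```python
# Hypothetical loop output: None marks a page whose Object ID failed.
results = [
    ("Farnham", 120),
    ("Rushmoor", 95),
    ("Camberley", None),   # invalid Object ID - the 'error row'
    ("Guildford", 87),
]

# Remove Errors: keep only the rows that resolved correctly.
councils = [(name, value) for name, value in results if value is not None]

# Replace Errors: turn each failed row into an Unknown Member placeholder.
error_rows = [(name, "Unknown") for name, value in results if value is None]

# Append Query: union the Unknown Members back onto the clean data.
all_councils = councils + error_rows

print(all_councils)
```

As in the Query Editor version, Guildford survives because the error row is removed from the main set rather than truncating everything below it, and Camberley comes back as an Unknown Member.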
Both methods are dynamic, meaning that if we added a new Council called 'Basingstoke', we would see the underlying data regardless of whether the underlying Facebook Object ID exists or not. Whilst none of the transformations in this blog are overly technical, there are quite a few hoops to jump through to retain all the data from a web page/Facebook. Having said that, I am sure there are a few other ways people could approach this problem; I would be really interested to speak to anyone who has an alternative solution.

Further Reading

Query Editor Basics (Power BI blog) -
Unknown Members in Data Warehousing -
Loop through Multiple Web Pages using Power Query -

Contact Me

If you would like a copy of the workbook containing the examples or want to know more about the Query Editor within Power BI, please leave a comment below. My Twitter details are also provided.

Twitter: @DataVizWhizz

Slicer Properties in Power BI: Header or Title?

I recently shared a Power BI Report with a customer and they reported that the “Clear Selections” option (the Eraser icon) was not available when they used the Slicer. It took me a while to work out why. This blog will illustrate how you can lose the “Clear Selections” functionality, depending on what Format settings are applied to a Slicer. I will also show how to work around the formatting constraints, which will help prevent you from ever experiencing the issue. In order to follow the examples, you will need access to Adventure Works 2014 SSAS Tabular and, of course, Power BI Desktop. Follow the appropriate links to download what you need.

Use Case
I have created a very simple report (available on request), using a Product Category Slicer and a Map to display Internet Total Units. The Slicer’s Format properties are configured as follows:
o   Select All is off, ensuring multi-select is allowed.
o   Header is off.
o   Title is on, used instead of Header and configured to look like the below:

After multi-selecting the attributes in the Slicer, I tried to “Clear Selections” – which is normally available as an option like below: However, it does not appear in the report I created: The reason I cannot see the “Clear Selections” option is because I am not using a Header. When I use this instead of Title, the Slicer contains the required feature.

Header vs Title
This made me wonder what other differences there are between Title and Header but in fact, there aren’t many. The subtle differences to be aware of are displayed and described below:
o   Header can have an Outline, which includes the ability to underline text.
o   Header is constrained to displaying the name of the attribute (“Product Category Name”), whereas a Title can be customised (“Select Category”). You can rename your source data attribute to get around this, however.
o   Title enables you to align the text, but this is not possible with a Header.
o   Header contains the “Clear Selections” option.
Workarounds
There are a couple of ways to work around the missing “Clear Selections” issue, which I will demonstrate below.

Option 1
I could simply switch from a Title to a Header, but then we would lose the ability to centre align the description. Instead, we can set both options to ‘On’. After some formatting, the Slicer is pictured below.
o   The Header text is set to White and is therefore not visible.
o   You can still see the “Clear Selections” option.
o   However, there is white space in between the Title and Header. This is not only wasted space, but also looks a bit strange from a visual perspective.
o   The actual sections themselves (e.g. Bikes) are a bit squashed and disproportionate to the Slicer border.

Option 2
The other workaround involves a little more work, but gives the impression that only a Title is being used, with the added functionality of the “Clear Selections” option. Furthermore, there is no longer the white, empty space.
o   Two objects were used to produce the result: a Text Box and a Slicer.
o   The Text Box has been formatted and labelled with ‘Select Category’, using the Title property. This allows you to change the font colour.
o   The Slicer is using the Header option, ensuring “Clear Selections” is available. Title is turned off to reduce the empty space.

Summary
So why are there two types of properties for Slicers and other objects like Text Boxes? I am not entirely sure myself, but occasionally, you may want a Slicer to contain a Title with an Attribute Name as the Header underneath. This gives you a Title > Sub-Title concept (illustrated below). I do understand why the “Clear Selections” feature is specific to the Header setting, as it directly relates to the Slicer attribute. There are certainly ways of solving this issue – here are just a couple of suggestions:
1.       Providing continuity across both the Header and Title format settings.
Quick fix, but not necessarily solving the ambiguity around both options.
2.       The Header setting is contained within the Title, meaning both sets of functionality are merged into one. It would make the usability of a Slicer (especially from a development perspective) a lot better.

Whatever Microsoft decide to do in the future, I really hope they tidy up and fully define the settings within Visuals, Slicers and Text Boxes. Some things are confusing, especially to self-serve Business Analysts who rely on intuitive reporting tools.

Further Reading
o   Power BI Documentation –
o   Power BI Community Blog –

Contact Me
If you would like a copy of the workbook containing the examples or want to know more about any of the Power BI settings/properties not mentioned, please leave a comment below. My Twitter details are also provided below.
Twitter: @DataVizWhizz

Themes in Power BI

Microsoft recently released Themes to a Preview version of Power BI. The concept is pretty simple – imagine having standardised colours that can be applied to charts and matrix visuals. This is where Themes come in. The theory is this will ensure Power BI developers adhere to company policies, and can do so quickly and efficiently. The reality is Themes are still lacking a lot of key configurables, but I will get onto that later. Whilst this blog is not going to show you how to import or apply a Theme, it will demonstrate how to create one and what each configuration means. For a great high level walkthrough of the feature, click here. The aim is to not only show how this makes a developer’s life easier, but where it can actually be improved too.

JSON Configuration File
I have pasted the following code into Notepad++, which incorporates the Adatis branding and colour scheme. Many developers will recognise the coding language – JSON (JavaScript Object Notation). If you would like to know more about JSON, check out Jason Lengstorf’s blog. Use the below code as a template for your Theme:

{
  "name": "Adatis",
  "dataColors": [ "#002C5C", "#006FBA", "#81C341", "#F4AA00", "#003876", "#00448F", "#0051A9", "#007ED4", "#008DED", "#74B238", "#8EC954", "#DB9800", "#FFB60F" ],
  "background": "#FFFFFF",
  "foreground": "#002C5C",
  "tableAccent": "#81C341"
}

Notice there are a number of ‘#’ values within the code. These are Hex numbers, which Power BI natively uses to determine a shade of colour. This is common across many reporting/image editing tools.

Configurations Explained
I will now explain what each line of code means and how it will affect a simple un-themed report.
1.       The name of the Theme.
2.       The colours that appear in a pre-set chart colour palette and the order in which a visual applies them, e.g. Dark Blue first, Light Blue second, Green third, etc.
3.       Font colour for the matrix or table visual.
4.       Primary background colour for a matrix or table visual. Even though the setting says ‘foreground’, it is more of a background colour!
5.       The table accent applies to a matrix or table visual, displaying as the grid outline or contrasting background colour, depending on what type of matrix/table style is applied.

Applying a Theme
I have created a quick Power BI report, which is using the default colour schemes. The only formatting applied was to the Matrix visual, simply changing the style of the grid. Here is how the report currently looks: Now let’s import and apply the ‘Adatis’ theme. You will see some of the visuals have changed, whereas other elements of the report haven’t. I will explain in greater detail.
1.       The Report Title, Slicers and Titles are not affected by the Adatis theme. This is because you cannot currently configure fonts in the JSON file.
2.       The matrix visual has updated, using the three configurations from the JSON file:
a.       background (White)
b.      foreground (Navy Blue)
c.       tableAccent (Green)
3.       The Pie and Bar chart have picked up the colours in the order they are specified in the JSON file, using the dataColors property. However, note the colour of both data points on the Bar Chart – rather than use different colours, it uses the same primary blue colour. Not ideal if you want to plot one colour for ‘Male’ and another for ‘Female’.
4.       The Waterfall chart hasn’t changed at all. We would expect the Adatis colours to have been applied, but this visual seems to ignore the configuration.

We can change the Waterfall Chart manually, using the imported Adatis Colour Palette. Here is how the chart now looks:

What’s Next?
Whilst the concept of Themes is great, the current preview functionality is extremely limited and very much ‘Version 1’. I am a little disappointed with how inconsistent the application of a theme is, as highlighted in my points above.
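Because a Theme is plain JSON, it can be sanity-checked before importing it into Power BI. The sketch below is my own validation idea, not a Power BI feature: it parses a (shortened) version of the Adatis theme and confirms every colour value is a six-digit hex code.

```python
import json
import re

# A shortened version of the Adatis theme from the post.
theme = json.loads("""
{
  "name": "Adatis",
  "dataColors": ["#002C5C", "#006FBA", "#81C341", "#F4AA00"],
  "background": "#FFFFFF",
  "foreground": "#002C5C",
  "tableAccent": "#81C341"
}
""")

HEX = re.compile(r"^#[0-9A-Fa-f]{6}$")  # e.g. "#81C341"

def invalid_colours(theme):
    """Return any colour values that are not 6-digit hex codes."""
    colours = theme["dataColors"] + [
        theme["background"], theme["foreground"], theme["tableAccent"]
    ]
    return [c for c in colours if not HEX.match(c)]

print(invalid_colours(theme))  # an empty list means every colour is well-formed
```

A check like this catches the typical copy-paste mistakes (a missing ‘#’, a 5-digit code) before the theme is imported.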
With the majority of new Power BI features, Microsoft continue to improve the usability and functionality over time, so I am very hopeful more Theme properties will be opened up in the underlying JSON configuration file. Options such as font colours/styles, consistent theme application (for all visuals) and company logos are all necessary for this to become really powerful. There were talks of a CSS type of configuration in Power BI, but this has yet to be announced or released. Imagine how powerful and cool that would be? As a BI Consultant, I am not currently comfortable with demoing Themes to a client, simply because of the clear gaps. Once the feature is more mature and in GA, I think enterprise companies will really benefit from standardising reports across their business. For now, Themes will remain a glorified colour palette.

Further Reading
o   Jason Lengstorf’s JSON Blog –
o   Power BI Report Themes Blog –

Contact Me
If you would like a copy of the workbook or have any questions about this blog, please leave a comment below.
Twitter: @DataVizWhizz

DAX Calculated Tables in Power BI

Calculated Tables have been around for a while in Power BI, but today I found a real life scenario for using them. I connected to Manchester United’s Facebook page and plugged the data into Microsoft’s Cognitive Services. In essence, I want to measure Sentiment Analysis and find out how many times a supporter has mentioned one of Manchester United’s rival football teams. You are probably wondering what this has to do with a DAX Calculated Table, so let me explain. I have pulled down the Facebook data (from the API), but when trying to undertake a GROUP BY for Likes, Loves, etc. in the Query Editor, Power BI hangs and the query never resolves. Whilst I cannot pinpoint exactly why this happens, I would guess that the number of API calls to Facebook is exceeded and some form of timeout occurs. This blog will walk you through how to create a DAX Calculated Table and apply a Group By to get the count of reaction types. There are a number of articles already out there showing examples of a Calculated Table and I have provided the links at the bottom of the post.

Existing Query
Currently, my query looks like the below: The only remaining task is to apply a COUNT of all records, GROUPED BY Reactions.Type and id. If I try to use the Query Editor functionality within the UI, the transformation step never completes. I am left with the following message in the bottom right hand side of the Query Editor: After waiting two hours for the GROUP BY query to resolve, I gave up. The alternative is to use a DAX Calculated Table and I will show you how I achieved this:

Calculated Table
In order to create a Calculated Table, come out of the Query Editor, navigate to the Modeling tab and select New Table. Now we can write some DAX. Pasting the below syntax into the new Table will achieve the Group By on the ‘ReactionsManUnited’ query.
ReactionTotalsManUnited =
GROUPBY (
    ReactionsManUnited,
    ReactionsManUnited[id],
    ReactionsManUnited[reactions.type],
    "TotalReactions", COUNTX ( CURRENTGROUP(), ReactionsManUnited[reactions.type] )
)

Let me break down the code:
o   The Calculated Table is named ‘ReactionTotalsManUnited’.
o   The GROUPBY function groups all reaction Ids (‘id’) and types (‘reactions.type’).
o   The COUNTX function is applied over reaction type, using the CURRENTGROUP() function to ensure the count is made by Id and Type within the ‘ReactionsManUnited’ table.

Finally, to test the new DAX table works, I have created a basic KPI Card. It is aggregating exactly as expected.

Conclusion
Let’s recap. I have shown you how to use three DAX expressions, albeit nested together in one statement. This demonstrates how powerful and flexible the language is.
o   GROUPBY
o   COUNTX
o   CURRENTGROUP

I made use of the Calculated Table functionality due to poor performing queries made to the Facebook API. There are many other reasons for using them, with some good examples provided in Chris Webb’s blog. Where possible, you should always use the Query Editor (and M language) for ad hoc transformations, although a DAX expression can sometimes get around slow performing queries. DAX measures are evaluated at run time and in memory, whereas the Query Editor needs to pull down and refresh data after every applied step. I would strongly recommend that all budding Power BI developers learn DAX, in order to get the most out of your Power BI reports. The Calculated Table function is just one of over 200 different expressions within Power BI.

Further Reading
o   Microsoft MSDN –
o   Power BI Blog –
o   Reza Rad’s Blog –
o   Chris Webb’s Blog –
o   List of DAX Expressions (Paul Turley’s blog) –

Contact Me
If you would like a copy of the workbook or have any questions about this blog, please leave a comment below or contact me on Twitter (@DataVizWhizz).
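For readers who think in general-purpose languages, the GROUPBY/COUNTX/CURRENTGROUP combination is just a grouped row count. A rough Python equivalent, using invented reaction rows rather than the real Facebook data, would be:

```python
from collections import Counter

# Invented rows standing in for the ReactionsManUnited table.
reactions = [
    {"id": "post1", "reactions.type": "LIKE"},
    {"id": "post1", "reactions.type": "LIKE"},
    {"id": "post1", "reactions.type": "LOVE"},
    {"id": "post2", "reactions.type": "LIKE"},
]

# Group by (id, reactions.type) and count the rows in each group,
# mirroring GROUPBY + COUNTX(CURRENTGROUP(), ...).
totals = Counter((r["id"], r["reactions.type"]) for r in reactions)

print(dict(totals))
# {('post1', 'LIKE'): 2, ('post1', 'LOVE'): 1, ('post2', 'LIKE'): 1}
```

Each (id, type) pair becomes one row of the calculated table, with the count as the "TotalReactions" column.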

UK Power BI Summit 2017: What’s next for Power BI?

During the keynote at the UK Power BI Summit, Microsoft announced some new features coming to Power BI. Chris Webb also provided a session on Azure Analysis Services, hinting at what may be next and, more importantly, how it may impact Power BI. Without further ado, I will dive straight in.

Power BI Visuals
Some exciting visuals were demoed by Will Thompson (Program Manager for Power BI). They were shown in an exclusive preview of Power BI, not available to the general public:
-          Themes
o   Finally! Companies can now easily ‘skin’ a suite of reports to use standardised logos, fonts, etc.
o   The themes will be driven by a JSON file.
o   All themes can be uploaded using the ‘Themes Gallery’.
-          Slicer Improvements
o   A numeric, sliding scale slicer will be available as a new configuration option.
o   Currently only designed for attributes.
o   Measure slicers are not an immediate focus, but Microsoft are giving it thought.
-          Matrix/Table Visuals
o   Whilst they currently exist, a new visual will be added. This includes:
§  Drill down capabilities.
§  Include/Exclude – like what you see in charts currently.
§  The ability to highlight rows and columns, which also cross filters and highlights the selections on accompanying charts.

Analysis Services
All of the features discussed are for Azure Analysis Services, but I am sure they will also be made available on-prem soon after the GA release:
-          Live Connect to Analysis Services
o   Add ad hoc measures within the Power BI Model.
o   It does not update your actual Analysis Services model; therefore, adding measures in Power BI needs to be done with caution.
o   Currently Tabular only, but should eventually come to Multidimensional too.
o   Available in both Azure (cloud) and on-prem.
-          Power Query/M Integration
o   This is the missing jigsaw piece to the Power BI/Analysis Services puzzle.
-          Table Level Security o   Ability to implement this and row-level security together. -          Migrating existing Power BI Models o   This will be awesome! Imagine being able to move your complex Power BI model (which is now performing badly) into an Azure Analysis Services model? o   At the moment, you have to create the tabular model again. Conclusion Whilst no official dates have been given, I hope that March’s release will include the visual features, such as themes. The ones specific to Azure Analysis Services are ideas that have come from many blogs (professional experts) and the Microsoft roadmap.  If there is a feature that your customer needs or just something you believe should be in Power BI, the Ideas forum is a great place to submit your suggestion. Further Reading For up to date Microsoft announcements, I would definitely advise following the below blogs and twitter users: -          Blogs o   Power BI Ideas - o   Power BI Blog - o   Power BI Blog Announcements - o   Chris Webb’s blog - -          Twitter o   Will Thompson (Power BI Program Manager) - @Will_MI77 o   Riccardo Muti (SSRS Program Manager) @RiccardoMuti Contact Me If you have any questions, please comment below or contact me directly on Twitter (@DataVizWhizz ) .

Dual KPI Custom Visual in Power BI

On February 8th, Power BI released a new custom visual called Dual KPI. The purpose of this chart is to visualise two measures over time and show their trend based on a joint timeline. The absolute values may use different scales, e.g. Sales and Profit. This blog will not only show you how to set up the new visual, but also demonstrate how changing some of the settings can enhance a report. Adam Saxton posted a YouTube video that also walks through the Dual KPI.

Prerequisites
In order to follow my example, you will need a copy of the AdventureWorksDW2014 database – found here. You will also need to download the following custom visuals:
o   Hierarchy Slicer –
o   Dual KPI –
NOTE: This article assumes previous knowledge of downloading and importing Custom Visuals into Power BI Desktop. If this concept is new to you, Scott Murray’s blog gives great step by step instructions.

Prepare Data
Open Power BI Desktop and Get Data. Point to the new AdventureWorksDW2014 database and drop down Advanced Options. Paste in the following T-SQL:

SELECT
       DPC.EnglishProductCategoryName
      ,DPS.EnglishProductSubCategoryName
      ,DP.EnglishProductName
      ,SUM([TotalProductCost]) AS [TotalProductCost]
      ,SUM([SalesAmount]) AS [SalesAmount]
      ,SUM([SalesAmount]) - SUM([TotalProductCost]) AS ProfitAmount
      ,[ShipDate]
FROM [AdventureWorksDW2014].[dbo].[FactInternetSales] FI
INNER JOIN [dbo].[DimProduct] DP
       ON DP.ProductKey = FI.ProductKey
INNER JOIN [dbo].[DimProductSubcategory] DPS
       ON DPS.ProductSubcategoryKey = DP.ProductSubcategoryKey
INNER JOIN [dbo].[DimProductCategory] DPC
       ON DPS.ProductCategoryKey = DPC.ProductCategoryKey
WHERE ShipDate BETWEEN '2013-01-01' AND '2013-06-30'
GROUP BY
       DPC.EnglishProductCategoryName
      ,DPS.EnglishProductSubCategoryName
      ,DP.EnglishProductName
      ,[ShipDate]

When happy, click ‘OK’ to continue. The preview of the data will open.
Click Load, as we do not need to edit any data in the Query Editor. Apply any changes and rename the query to ‘Internet Sales’ – final output below: Some measures and attributes need to be formatted within the ‘Modeling’ tab:
o   ‘ShipDate’ = dd MMMM yyyy
o   ‘ProfitAmount’ = Currency
o   ‘SalesAmount’ = Currency

The final formatting step is to create a Product hierarchy, based on the three product attributes. Navigate to the Data tab, right click on the ‘EnglishProductCategoryName’ attribute and select ‘New Hierarchy’. Drag the attributes into the hierarchy and name it ‘Products’. It should look like the following:

Create Report Visual
We need to use both the Slicer and Dual KPI custom visuals. To achieve this, follow the steps below:
1.       Select the Hierarchy Slicer in the Visualizations menu and drag the ‘Products’ hierarchy on to the Fields box. The slicer will now appear in the report.
2.       Select the Dual KPI visual in the Visualizations menu and drag the following measures to the appropriate chart properties box:
a.       ‘ShipDate’ > Axis
b.      ‘SalesAmount’ > Top values
c.       ‘ProfitAmount’ > Bottom values

The chart is now configured and each metric is explained in more detail below. Only the top KPI (Sales Amount) is shown, because both use the same calculations.
1.       A fixed growth percentage, comparing the last (06/30/2013) vs. first (01/01/2013) data point on the graph. The metric acts as a static KPI.
2.       The Sales Amount value for the last data point on the graph. Also a static KPI.
3.       The data point currently being hovered over. This dynamically changes when you move along the axes.
4.       The Sales Amount value for the current data point being hovered over. Also dynamic.
5.       The ‘% since’ metric, which takes the Sales Amount for the last data point on the graph and works out the growth based on the current data point being hovered over.
To use the example in the screenshot:
-          Sales Amount for 06/30/2013 = 51,596
-          Sales Amount for 05/17/2013 = 18,442
-          % since: ((51,596 - 18,442) / 18,442) * 100 = 179.7%

Enhancing the Report
As with all custom visuals in Power BI, there are lots of settings that you may never use. I have picked out some that enrich the capabilities of the Dual KPI Chart:
o   Fields
§  Warning State – set alerts around data freshness and view warning messages.
§  Top/Bottom % change start date – for the fixed +/- % change on the chart, you can add an override start date. The dates could vary by product category and dynamically impact the % in the visual.
o   Format
§  Dual KPI Properties – show abbreviated values, define multiple tooltips and show stale data warnings.
§  Dual KPI Chart Type – a choice of either Area or Line charts.

I have applied the Top/Bottom % change start date functionality and also formatted the chart properties. The report now looks a little more professional:

Further Reading
o   Adam Saxton YouTube Video –
o   Power BI Blog –

Contact Me
If you would like a copy of the workbook or have any questions about this blog, please leave a comment below.
Twitter: @DataVizWhizz
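The ‘% since’ arithmetic is easy to verify independently. Here is a small Python sketch using the same figures (the function name is mine, not part of the visual):

```python
def pct_since(last_value, hovered_value):
    """Growth of the last data point relative to the hovered data point."""
    return (last_value - hovered_value) / hovered_value * 100

# Figures from the example: 06/30/2013 vs. 05/17/2013.
growth = pct_since(51596, 18442)
print(round(growth, 1))  # ~179.8; the visual shows this as 179.7%
```

The exact value is 179.77…%, so whether you see 179.7% or 179.8% depends on whether the display truncates or rounds.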

On-Premise Power BI: Part 2 – What’s next?

Part 2 of this blog series focuses on the future of Power BI On-Premise and what direction Microsoft are heading in. This is a little tricky, as there haven’t been many formal announcements of how SSRS and Power BI will work together from an architectural and pricing perspective. I will be using this article as a forum to discuss both On-Premise and Cloud variations of Power BI, potential licensing and whether the new offering is actually just ‘plugging Power BI reports into SSRS’. The last statement is a little crude but, from a business perspective, decision makers need to know which version of Power BI suits their needs. We won’t get all of the answers right now, but it would be nice to shed some light on what appears to be a very dark room.

Cloud vs On-Premise
What does this mean for Power BI as a brand and, more importantly, as a reporting tool? The original purpose of Power BI was to offer self-service, ad hoc end user reporting. However, as the product has matured, prospective clients have wanted it to do so much more. “How can we incorporate Active Directory security?” “How do we share Dashboards within a specific workspace and limit permissions to it?” “But our business uses ultra-sensitive data and it MUST stay on a server in the UK…”. These are the types of comments/questions that get bandied around a lot at the moment. This is why Microsoft are offering both Cloud and On-Premise. We are not yet in a position where definitive pros and cons can be laid out for each option, but this will soon be possible when Microsoft reveal their ultimate strategy. There are still a lot of unanswered questions, especially around licensing. A sensible assumption is that Power BI On-Premise will be covered under the typical Enterprise Edition version of SQL Server. However, will there be an add-on fee for Power BI or will the general license costs go up? What impact will this have on the Cloud costs?
There are some complicated pricing models for Office 365 users and the simpler £9.99 a month for a standalone Power BI Pro license. Will the same pricing strategy exist or will On-Premise force Microsoft into a rethink? I tend to stay away from speculation in my blogs and stick to facts, but I genuinely am interested to see how Microsoft market the variations of Power BI.

Coming Soon
After looking through various blogs and forums (links provided below), the following features/functionality will soon be available before Power BI On-Premise goes to GA.

Short-Term
o   Custom visuals
o   Additional data connectors (besides Analysis Services), cached data, and scheduled data refresh
o   Power BI mobile apps (viewing Power BI reports stored in SSRS)

Longer-Term
o   R Visuals
o   Support for integrating previous versions of SQL Server Databases (2008+) and Analysis Services (2012 SP1+) with SSRS 2016
o   Support for all data connectors currently enabled for Power BI Cloud

Microsoft have also listed some Power BI cloud features that are not planned for the On-Premise version:
o   Dashboards – the concept of pinning a report and sharing it on an ad hoc basis
o   Q&A (natural query language)
o   Quick Insights

Another pertinent question is “When will Power BI On-Premise actually be available in the real world?”. Microsoft are targeting a production ready release for mid-2017, although nothing is official yet. One thing is certain – it won’t be coming in a Service Pack or Cumulative Update. Another big thing to consider is migrating from SSRS 2016 to SSRS with Power BI Reports, which Microsoft are promising will be easy.

Conclusion
At this stage, it is difficult to give any concrete information on where Power BI On-Premise is heading. All we know is that Power BI and SSRS will be working together a lot more closely and the majority of functionality will be available in both.
The concept of having Power BI reports shared and deployed to physical, on-premise servers will accommodate companies worried about moving their data to the cloud. As Power BI continues to increase in popularity, the overall security and infrastructure model will be scrutinised. Cloud storage is often falsely labelled as a security risk, which is where the common corporate misconceptions are born. It will be hard to change this train of thought, which is where the On-Premise Power BI offering will come in handy. Even more appealing is the natural integration with SharePoint and SSRS, enabling companies to use hybrid approaches, as well as not needing to migrate old SSRS reports into Power BI. Everything is managed in one location, thus reducing security risks and costs. Now we all sit tight and wait for Microsoft’s next big announcement. If anyone has more information around Power BI On-Premise, please comment below.

Further Reading
o   Power BI Reports in SSRS Release Notes –
o   October 2016 Technical Preview Blog –
o   December 2016 Feedback Review Blog –
o   January 2017 Technical Preview Blog –

On-Premise Power BI: Part 1 - Technical Preview

Back in October 2016 at SQL PASS, Seattle, Microsoft announced that Power BI was going on premise. Along with the big fanfare, there was also a Technical Preview released – in which a user could download a pre-built VM from the Azure Marketplace and get a first look at how SSRS and Power BI could work together. Roll on a few months (January 17th 2017, to be precise) and the new Technical Preview is available. This time, you can actually download a standalone copy of Power BI Desktop (for SSRS) and a slimmed down SQL Server Reporting Services 2016 configuration tool. If you follow the official Microsoft blog, getting set up is really easy. Just a heads up – there are still lots of limitations with this preview, which can be found at the bottom of the January blog. The blog will be broken into two parts, with Part 1 focusing on my experiences with the January Technical Preview of SSRS and Power BI.

Technical Preview Feedback
Setting up a local SSRS Server (on my own machine) and deploying a Power BI Report was really easy, which is good feedback in its own right. Annoyingly, only Analysis Services connections are currently supported within the preview Power BI Desktop application and all of my workbooks use online services or SQL Server databases! This meant I had to download a tabular model for Adventure Works and create a quick test report using the dataset. Not a major problem, but an inconvenience nonetheless. All of my other observations are based around the differences between the regular SSRS service and this new offering. In essence, not much has changed – which can only help a user’s experience. Here are the things to be aware of:

SSRS Configuration Tool
There are two new options – ‘Scale-out Deployment’ and ‘Power BI Integration’. The latter is the one we care about. In effect, you need to tie the SSRS version of Power BI Desktop (which is available through the Technical Preview install) to the SSRS Configuration.
Power BI Desktop
The application looks pretty much the same, with the only differences highlighted below. The most important bit is saving the report to the SSRS Server. I thought it would have been under the ‘Publish’ option, but Microsoft opted to put it in ‘Save As’. The application itself is called ‘Power BI Desktop (SQL Server Reporting Services)’ and I imagine it will stay this way, in order to differentiate between the On-Premise and Cloud versions. The final step is to define the SSRS Server. A grid appears and you can either choose an old URL or define a new one. As long as you are connected to an Analysis Services model and type in the correct server name, the report will successfully deploy.

SSRS Report Server
I deployed a test Power BI Report to the local SSRS Server and there are a few specific options now. By clicking the ellipses, you can open the report as normal, edit directly in Power BI Desktop or even download the pbix file. Notice the ‘Power BI’ icon on the tile of the report, which also helps distinguish the types of reports in the SSRS portal.

Conclusion
My first impressions of Power BI On-Premise are good. Whilst there are still some clear gaps and limitations, the fact that Microsoft are looking at bringing SSRS and Power BI together must be a good thing. They recognise the need to bring Mobile, standard On-Premise and ‘director pleasing’ reports together in a one-stop shop. I am certainly excited about the next Technical Preview for Power BI On-Premise. Not only should it contain more data connectors and features, Microsoft should also reveal more about the long term vision. Look out for my next blog, where I will discuss licensing implications, architecture and some of the common Power BI questions I get from prospective clients.

Further Reading
o   Power BI Reports in SSRS Release Notes –
o   October 2016 Technical Preview Blog –
o   December 2016 Feedback Review Blog –
o   January 2017 Technical Preview Blog –

Buffer() M Function in Query Editor (Power BI)

Whilst I have been aware of the Buffer() M function in the Query Editor of Power BI for a while, I had never really utilised its capabilities until now. There are two main types of buffer functionality – Table.Buffer and List.Buffer. To define them simply, Table.Buffer puts an entire table into memory and prevents change during evaluation, whereas List.Buffer provides a stable list, meaning it has some form of count or order. This blog will focus on the theory behind the general Buffer() M functionality, picking a specific scenario where it can outperform the standard Query Editor behaviour. I will also demonstrate that this is not always the most effective technique within the same scenario. The article will not give you a hard and fast rule of when to use the Buffer() function, because it can depend on a number of factors. These are described further below.

Note:    It is assumed you have existing knowledge of Query Folding and if not, one of my previous blogs should help greatly.

Scenario
I found inspiration from Chris Webb’s example, using the Adventure Works DW 2014 database – available here. The requirements are:
1.       Obtain the first 10,000 rows from FactInternetSales.
2.       Remove the majority of columns, retaining ONLY:
a.       SalesOrderLineNumber
b.      CustomerKey
c.       SalesAmount
3.       Rank the current row based on Sales Amount.

List.Buffer()
Assuming your database exists on a local server and is named AdventureWorksDW2014, copy the following code into the Advanced Editor in the Query Editor screen.
let
    //Connect to SQL Server
    Source = Sql.Database("localhost", "AdventureWorksDW2014"),
    //Get first 10,000 rows from FactInternetSales
    dbo_FactInternetSales = Table.FirstN(
          Source{[Schema="dbo",Item="FactInternetSales"]}[Data],
          10000),
    //Remove unwanted columns
    RemoveColumns = Table.SelectColumns(
          dbo_FactInternetSales,
          {"SalesOrderLineNumber", "CustomerKey", "SalesAmount"}),
    //Get sorted list of values from SalesAmount column
    RankValues = List.Sort(RemoveColumns[SalesAmount], Order.Descending),
    //Calculate ranks
    AddRankColumn = Table.AddColumn(RemoveColumns, "Rank",
          each List.PositionOf(RankValues,[SalesAmount])+1)
in
    AddRankColumn

You can visibly see the rows loading – one by one.  In total, it takes nearly 1 minute to load all of the results.

Now let’s use the List.Buffer() function in the RankValues step.

Replace:

= List.Sort(RemoveColumns[SalesAmount], Order.Descending)

With:

= List.Buffer(List.Sort(RemoveColumns[SalesAmount], Order.Descending))

The entire transformation (from start to finish) completes in just under 2 seconds!  This is because the List.Buffer function stores the sorted values in memory and therefore, the rank calculation is only evaluated once.  Previously, the sort (and its preceding steps) was being evaluated multiple times.  The M language is both functional and, at times, lazy.  In order to prevent the constant re-evaluation, buffer the list into memory.  The final query output is shown below:

Query Folding

We will implement the same requirements, but this time using Query Folding.  The third step in our current transformation is called ‘Removed Columns’.  This is what prevents Query Folding, as this function cannot be interpreted/translated into the native SQL Server T-SQL language.  All steps below it are inadvertently not supported either.
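To see why buffering the sorted list matters, the same pattern can be sketched outside of M.  The following is purely an illustration in Python (the sales values are made up): ranking each value by its position in a descending sort, mirroring the List.PositionOf logic above, and comparing a "re-evaluate every time" approach with a "sort once and reuse" approach.

```python
# Hypothetical analogue of the M query above - not Power Query itself.
sales = [120.0, 45.5, 99.9, 45.5, 300.0]

# Unbuffered analogue: the sort is re-evaluated for every single row.
ranks_slow = [sorted(sales, reverse=True).index(v) + 1 for v in sales]

# Buffered analogue (like List.Buffer): sort once, keep the result
# in memory, and reuse it for each row's rank lookup.
buffered = sorted(sales, reverse=True)
ranks_fast = [buffered.index(v) + 1 for v in sales]

print(ranks_slow)   # [2, 4, 3, 4, 1]
print(ranks_fast)   # identical ranks, but only one sort was performed
```

Note that ties (the two 45.5 values) receive the same rank, just as List.PositionOf returns the first matching position.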
The way around this is to write a SQL Server view (in SSMS) to import just the fields required from the underlying FactInternetSales table.  The below query will give you the same result up to the ‘Remove Columns’ step:

CREATE VIEW dbo.VwFactInternetSalesAmount
AS
       SELECT SalesOrderNumber
             ,[CustomerKey]
             ,[SalesOrderLineNumber]
             ,[SalesAmount]
             ,RANK() OVER (ORDER BY [SalesAmount] DESC) AS [Rank]
       FROM   [AdventureWorksDW2014].[dbo].[FactInternetSales]

The final steps are to filter on the top 10,000 rows and group the rows together – inserting the following M syntax:

let
    //Connect to SQL Server
    Source = Sql.Database(".", "AdventureWorksDW2014"),
    //Reference the view
    dbo_FactInternetSales = Source{[Schema="dbo",Item="VwFactInternetSalesAmount"]}[Data],
    #"Sorted Rows" = Table.Sort(dbo_FactInternetSales,{{"SalesOrderNumber", Order.Ascending}}),
    #"Kept First Rows" = Table.FirstN(#"Sorted Rows",10000),
    #"Grouped Rows" = Table.Group(#"Kept First Rows", {"CustomerKey", "SalesOrderLineNumber", "Rank"}, {{"TotalSalesAmount", each List.Sum([SalesAmount]), type number}})
in
    #"Grouped Rows"

The query now returns instantly (under 1 second).  Right click on the last applied step and select the View Native Query option, to show the underlying SQL:

select top 10000
    [rows].[CustomerKey] as [CustomerKey],
    [rows].

Row Level Security (RLS) using the Power BI Service

Following on from my last blog on RLS, I will now demonstrate how to initiate the roles created in Power BI Desktop.  Whilst the desktop version is great for testing security, you have to configure the report in the Power BI Service to enforce permissions.

Before doing anything, ensure you have published a report with RLS onto the Power BI Service.  If you would like to use my workbook as the example to follow, leave a comment below.

Configure Power BI Service

1.       Navigate to the Datasets tab and click the ellipsis button (‘...’) on the ‘RLS Part 2’ dataset (or whatever you have called your published report).  Then select the Security option.
2.       Let’s assign two email accounts to the users ‘Stephen’ and ‘Amy’.  These must use a work domain – Hotmail, Outlook, etc. accounts are not permitted with Power BI.  Click Add when you are happy.
3.       Ensure the RLS roles are saved.  You will end up with the following:
4.       The roles are now configured.  Both email accounts will only see data for the person they have been assigned to.

NOTE:   You can add numerous email accounts to a specific role.  For this example, the roles are per person, but you may want to add a whole department containing 10 people to a given role.

Testing Roles using Power BI Service

This is actually really simple.  Go back to the Security settings (shown above).  Click the ellipsis button and select ‘Test as Role’.  You will now only see sales data for the Sales Representatives that report to Stephen.

Conclusion

RLS is now out of preview mode and released into General Availability.  Whilst the capabilities are still pretty new, the additional DAX layer over the GUI enables more complex security configurations.  As with all features in Power BI, regular updates are to be expected – so if RLS currently isn’t fit for your scenario, it could be very soon.

Recommended Reading

o   Reza Rad’s Blog -
o   Power BI Community Blog -
o   Devin Knight’s Blog -

Row Level Security in Power BI Desktop

In the June 2016 monthly Power BI release, Row Level Security (RLS) was introduced into Power BI Desktop.  This is great news for people using the application, especially as the configuration is stored within the Power BI model.  Previously, you had to create the security in the web environment, which could easily be overwritten when publishing multiple times from a desktop workbook.

In this blog, I will show you how to set up RLS in Power BI Desktop and how to test that it works.  My example uses the AdventureWorksDW2014 database (download here), specifically applying permissions for a manager.  Each manager will only be able to see data for the Sales Representatives that report to them.

NOTE:   This article assumes you have prior experience with creating reports and visuals in Power BI.  Please leave a comment if you would like a copy of the workbook.

Report Setup

From the AdventureWorksDW2014 database, import the FactResellerSales table.  Then add another SQL dataset, pasting in the below query:

SELECT EMP.EmployeeKey
      ,EMP.FirstName + ' ' + EMP.LastName AS SalesRepName
      ,EMP.Title AS SalesRepTitle
      ,MGR.FirstName + ' ' + MGR.LastName AS ManagerName
      ,MGR.Title AS ManagerTitle
FROM   [dbo].[DimEmployee] EMP
INNER JOIN
       (
              SELECT EmployeeKey
                    ,FirstName
                    ,LastName
                    ,Title
              FROM   [dbo].[DimEmployee]
       ) MGR
ON     MGR.EmployeeKey = EMP.ParentEmployeeKey
WHERE  EMP.Title = 'Sales Representative'

Create a basic bar chart and rep slicers – like shown below.  You should end up with 3 managers – Amy Alberts, Stephen Jiang and Syed Abbas.

Create Roles

To create a role, navigate to the Modeling tab (at the top of the page) in Power BI Desktop.  You will see a section named Security.  Click on Manage Roles.
Next, we need to create a role for each of the three managers that were previously identified.  Follow the below annotation and necessary steps to create a specific role for Amy Alberts:

1.       Create a Role.
2.       Name the role ‘Sales Manager Amy’.
3.       Click the ‘...’ on the QueryEmployee table.
4.       Add a filter (a DAX expression).
5.       Choose [ManagerName], as we want to filter on the manager’s full name.  This defines what data they can see.
6.       Change the “Value” filter to “Amy Alberts”, i.e. [ManagerName] = "Amy Alberts".  Only data containing Sales Representatives working for Amy will be shown in the report.
7.       Repeat steps 1-6 for Stephen Jiang and Syed Abbas, simply replacing the “Amy Alberts” filter with the applicable person’s name.

Click Save to close down the RLS form.  You will end up with the following:

Role Testing

Navigate to the same Modeling tab and select View As Roles.  Select ‘Sales Manager Amy’ and click OK.  You will notice only Amy’s Sales Representatives show in the report.  You can follow the same steps to impersonate a different manager or to remove the Roles filter completely.

Conclusion

The Row Level Security feature in Power BI Desktop makes it really easy to apply security around the data and control what users can and cannot see.  The ability to adopt DAX filter expressions through the UI enables the user to deploy an effective security model in a simple, easy manner.

Currently, the Power BI Desktop security in my example is not being used in a real-life situation and the three managers can see all data in the online report.  Look out for the second part of my blog on RLS (coming soon), where I will bring this to life using the Power BI Service.  Tasks include tying emails to security roles/groups and actually impersonating a user to ensure the roles are configured correctly.

Recommended Reading

o   Reza Rad’s Blog -
o   Power BI Community Blog -
o   Devin Knight’s Blog -

October 28th Technical Preview for SSRS – Including Power BI in SSRS

Today at the PASS Summit, Microsoft announced some cool new features that are coming into SSRS 2016.  I attended the ‘SQL Server Reporting Services: What’s New and What’s Next’ session (presented by Chris Finlan and Riccardo Muti) and there was certainly some very exciting news for anyone in the BI sector.

There will be a technical preview available from 28th October, in which you can download a pre-configured Virtual Machine from the Azure Marketplace.  This includes sample reports and data, enabling you to try the features whilst Microsoft work on a complete preview for on-premise R&D.  This blog lists the new features at a high level.  I will share more information as and when it is available.

Features

1.       Power BI reports in SQL Server Reporting Services.
a.       I won’t steal Microsoft’s thunder and instead, redirect you to their blog written on October 25th:
b.      In essence, you can create a Power BI Desktop report and hook it straight into your on-premise SSRS reporting solution.  Amazing, right?!
2.       Report Manager.
a.       List View is back!  You can again view reports the old-fashioned way.
b.      Show Hidden Items is now unchecked by default.  Sounds insignificant, but it is the little things that sometimes matter.
3.       Report Comments Section.
a.       This is really cool.  A user can write a comment against a given report and even ‘snapshot’ the report as an image and upload it.  This is a good way to start an internal company discussion, as well as trace how a visualisation has changed over time.
b.      All comments and images are stored in the Report Server database, which helps for auditing or plugging the data into an external tool/text file.
4.       Other – more information to be announced soon.
a.       Direct URL link from a KPI.  If you click a KPI, it will take you to a specified link, without the need to select it from an option.
b.      Mobile Reporting auditing.
c.       Better support for generated MDX.
The current date workaround will no longer be needed, which will make everyone’s life easier.
d.      General performance.

Current Preview Limitations

As this is an interim SSRS preview release, there are some current limitations.  These are mostly specific to the Power BI reports in SSRS:

-          Power BI reports must connect “live” to Analysis Services models – both Tabular and Multidimensional (cubes).  No other data sources are currently available.
-          Custom Visuals are not supported.

Coming Soon

As soon as I get back from the PASS Summit (and over my jet lag), I will be downloading the preview and trying out the cool features.  Please come back to my blog page to check out my findings and, more importantly, feel free to comment on any quirks/issues/limitations that you have come across yourself.

It is certainly exciting times for the on-premise enthusiasts out there.  I had lost all hope for SSRS, but with 2016, Microsoft have rekindled my love for enterprise data visualisation.

Using Azure Machine Learning and Power BI to Predict Sporting Behaviour

Can we predict people’s sporting performance by knowing some details about them? If so, what is better at making these predictions: machines or humans? These questions seemed so interesting that we decided to answer them, by creating a working IT solution to see how it would perform. This blog will provide an overview of the project, with a simple analysis of the results and details of the technologies that we used to make it happen.

As it would be hard to check all available sport disciplines, we decided to focus on the one we love the most – cycling. Our aim was to predict the maximum distance a rider would ride within one minute from a standing start. Although “one minute” sounds insignificant, this is a really tough exercise, as we were simulating quite a tough track. We used the following equipment to perform the experiment:

-          a bike with all necessary sensors to enable recording of speed, cadence, crank turns, wheel turns and distance
-          a velodrome bike simulator
-          a heart rate monitor in the form of a wrist band

Using this equipment allowed us to capture data about the ride in real time and display it, using streaming analytics and Power BI, in a live interactive dashboard as shown below (Picture 1):

Picture 1: Real-time Power BI dashboards showing: average heart rate (top row left); current speed in km/h (top row middle); average speed (top row right); current crank turns, wheel turns and cadence (bottom row left); average crank turns, wheel turns and cadence (bottom row right)

Sensors were used to capture information about how fast our rider was cycling, how many crank turns they made, what their heart rate was during the ride and, most importantly, how far they went within the time limit. Each rider had a chance to predict their maximum distance before their ride. We also made a prediction based upon previous cyclists’ results using Machine Learning algorithms.
In order for the Machine Learning algorithms to make estimates about each of the riders, we had to capture some representative properties about each rider before the ride. All riders needed to categorise themselves for each of the properties listed below:

-          age
-          height
-          weight
-          gender
-          smoking volume
-          drinking volume
-          cycling frequency

So, taking weight as an example, people were asked to allocate themselves to one of the available buckets, e.g. Bucket 1 – 50-59kg, Bucket 2 – 60-69kg, Bucket 3 – 70-79kg … Bucket N – above 100kg.

Bucketing the properties helped us reduce the number of distinct values, which increased the probability that for a given rider we would find someone with similar characteristics who had already had a ride. Obviously, to make the prediction work we had to have an initial sample. That’s why we asked “Adatis people” to have a go on a Friday afternoon. In true competitive spirit, some of them even tried a few times a day! By the beginning of the SQLBits conference we had managed to save details of around 40 different rides.

In a nutshell, let me describe the process that we repeated for each rider. The first step was to capture the details of the volunteer using an ASP.NET web app, including the maximum distance they thought they would be able to reach (the human prediction). Next, behind the scenes, we provided their details to the machine learning algorithm, exposed as a web service, to get a predicted distance. We then turned on all the sensors and let the cyclist ride the bike. During the ride we captured all the data from the sensors and transferred it to the database through the Azure IoT stack. After the ride finished, we updated the record with the distance reached by the rider. The more cyclists participated, the bigger the sample size we had to predict the result for the next rider. Overall we captured 150 rides for 138 different riders. The initial sample size we used to make predictions was 40 riders and it grew as more riders got involved.
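The bucketing step described above can be sketched as a small function. This is a hypothetical illustration in Python, not the project's actual code; the exact bucket boundaries (and the treatment of the extremes) are assumptions based on the weight example given in the text.

```python
# Collapse an exact weight into a coarse bucket, so that a new rider is
# more likely to match an existing rider in the sample.
def weight_bucket(weight_kg: float) -> str:
    if weight_kg < 50:
        return "Under 50kg"          # assumed lower catch-all bucket
    if weight_kg >= 100:
        return "Above 100kg"         # "Bucket N" from the text
    lower = int(weight_kg // 10) * 10    # e.g. 63 -> 60
    return f"{lower}-{lower + 9}kg"      # e.g. "60-69kg"

print(weight_bucket(63))    # "60-69kg"
print(weight_bucket(118))   # "Above 100kg"
```

The same idea applies to the other properties (age bands, drinking volume and so on): fewer distinct values means a higher chance of finding a near-neighbour in a sample of only ~40 rides.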
The table below (Table 1) contains basic statistics of the differences between the machine learning predictions and the human predictions.

Prediction Type           | Avg. Difference | Std. Dev. of Difference | Max Difference | Min Difference
Azure Machine Learning    | 119m            | 87m                     | 360m           | 2m
Humans                    | 114m            | 89m                     | 381m           | 0m

Table 1: Absolute difference between predicted and reached distance for a 1 minute ride (average distance reached: 725m)

From these numbers we can easily see that neither humans nor Machine Learning came close to the real results reached by the riders. Over an average 725m ride, the average difference was 114m for humans with a standard deviation of 89 meters, and 119m for Machine Learning with a standard deviation of 87 meters. That means both of them were equally inaccurate, although it is worth mentioning that we had single cases when the prediction was very close or even equal to the distance reached.

What was the reason behind the miscalculations in the ML prediction? I would say that the main reason is that the sample size was not sufficient to make accurate predictions. Besides the small sample, there might be other reasons why the predictions were so inaccurate, such as:

-          incorrect bucket sizes for rider properties
-          too many properties to make a match
-          lack of a strong enough correlation between the properties and the distance reached

It is also worth mentioning that some properties showed a high correlation with distance, like the height of the rider, while others showed a low correlation, like drinking volume. The best examples of high correlation can be seen in the charts attached below (Chart 1):

Chart 1: Correlation between distance reached in meters and height category of the rider

And even more significantly regarding fitness level (Chart 2):

Chart 2: Correlation between distance reached in meters and fitness category of the rider

On the other hand, some rider properties did not show the correlation that we would expect, e.g.
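The statistics in Table 1 can be reproduced from paired predicted/actual distances with a few lines of code. This is a minimal sketch, assuming Python and made-up sample values purely for illustration – the real project captured the distances through the Azure IoT stack.

```python
# Absolute prediction error statistics, as in Table 1.
from statistics import mean, pstdev

predicted = [640, 800, 700, 590]   # metres - hypothetical predictions
actual    = [725, 690, 712, 745]   # metres - hypothetical reached distances

abs_diff = [abs(p - a) for p, a in zip(predicted, actual)]

print("avg:", mean(abs_diff))      # avg: 90.5
print("std:", pstdev(abs_diff))    # population standard deviation
print("max:", max(abs_diff))       # max: 155
print("min:", min(abs_diff))       # min: 12
```

Whether to use the population or the sample standard deviation (`pstdev` vs `stdev`) is a judgement call; for a full census of all 150 rides the population form seems the natural choice.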
age (Chart 3):

Chart 3: Correlation between distance reached in meters and age of the rider

Although there is no straightforward correlation, as previously stated, we can observe a general trend that we tend to perform better the closer we get to our round birthdays. We can observe peaks at the ages of 18, 29, 39 and 49. Is it perhaps because of the fear of getting to the next decade? I will leave this up to your interpretation…

If you are interested in a more technical explanation of how we designed and built our project, I would like to invite you to the second part of this blog, which will cover the top-level architecture of the project and also some deep insights into some of the core technologies used, including: Azure Stream Analytics, Azure Event Bus, Power BI web, ASP.NET MVC4 and SignalR.

View Native Query Function in Power Query

Following on from my last blog (found here), I wanted to share a new Query Folding feature in Power Query.  The View Native Query functionality was actually released in the June 2016 Power BI update, but it was hardly advertised/promoted by Microsoft.

What is View Native Query?

In a nutshell, it enables you to check whether each of your step-by-step transformations in Power Query is being folded back to the source.  I have previously shown techniques using external monitoring tools, as well as an in-built M function.  However, the View Native Query feature makes our lives so much easier.  I will demonstrate how easy it is to use.

Example

I loaded up an old Query Folding Power BI workbook in Power Query, which was connecting to the AdventureWorksDW2014 SQL Server database.  If you want a copy of it, feel free to comment below and I’ll get back to you.  However, you can do this with any workbook – just follow the simple instructions below:

1.    Right click on the last step of your transformation and a new option called View Native Query should be available.  Left click to open the SQL query.
2.    Whilst the above SQL written at source is not the most efficient, query folding still means the data is brought back into Power Query far quicker than it would be if done within the Power BI application.
3.    Now copy and paste the code into SSMS.  This is exactly the same code that SQL Server Profiler would produce.  Execute the query.
4.    In effect, this is replicating the transformation steps in Power Query.  The below screenshot is taken from Power Query – note the top 10 results from this and the above output.

The coolest thing about the View Native Query feature is the fact you can check every step in the Power Query transformation.  I only demonstrated the final step, but you can go back to each one and find the underlying query folding logic that has been applied.

Why is View Native Query not available?
If you get the below screenshot, the answer is simple – your transformation step doesn’t support Query Folding!  NOTE: View Native Query is ‘greyed’ out.

If possible, move any steps that aren’t supported to the end.  Power Query will attempt to query fold until it reaches a step it cannot fold.  All steps after this (even if query folding would otherwise be possible for them) will not be folded and could drastically slow down the remaining transformations.

Conclusion

This feature is something that should have been made available a long time ago.  Whilst the power of query folding is undeniable, the methods of testing it were long and arduous.  With View Native Query, we can easily check where and when folding is happening and, more importantly, when it is NOT.  Debugging slow-performing transformations is so much easier now and you can, in fact, swap steps around and folding will still persist where possible.

I would advise every Power Query developer to make use of this simple feature.  If there are any other Query Folding techniques, tips or tricks out there, please contact me.  Look out for future blogs, where I will be looking at the performance of query folding on very large data volumes (100 million+ records) and whether the automated SQL Server code eventually becomes too unwieldy and poorly performing.