Tristan Robinson's Blog

Power BI Dataflows – Bringing self-service ETL to the Business User

Announced at the recent PASS Summit, Power BI Dataflows has now gone into public preview. Previously, any ETL that users applied via Power Query within Power BI Desktop was applied only to their own dataset. With Dataflows, ETL is treated as a first-class citizen: this update provides centralised, self-service data prep as part of the Power BI product, so that business users can expose cleaned/schematised entities across the business in a similar way to a data warehouse. No longer will the same lightweight ETL processes be repeated by multiple users against the same dataset, often in different ways. Traditionally, self-service BI has been limited to analytic models and reports/dashboards, but with Dataflows self-service now extends a level further down the stack. In this blog, I’ll take a look at the new functionality and give my thoughts.

Before we begin, it’s worth noting that this is an entirely different piece of functionality to the similarly named Data Flows in Data Factory, which is in private preview. You can tell the difference by the capitalisation of the word “Flow” for Data Factory. I’m not sure what the marketing department would say about this, but it is slightly confusing that both were announced at the same time.

 

What is a Dataflow?

A dataflow is a collection of entities (which are similar to tables) that are created and managed from within the Power BI Service (powerbi.com). It is stored under the Common Data Model (CDM) in an Azure Data Lake Storage (Gen2) resource as files and folders – no longer as the Tabular Model we are used to with datasets. Dataflows are used to ingest, cleanse, transform, enrich and schematise/model the data, in a similar way to the ETL used to create a Data Warehouse. Once a dataflow is created, you can use Power BI Desktop to create datasets based on its entities in the usual manner. While dataflows can be created and edited by both Pro and Premium users, a larger set of functionality is only available via Premium. This includes creating computed entities (entities such as aggregations calculated from other entities in the dataflow), linking to existing entities in other dataflows, and incremental refresh.

 

Creating a Dataflow

To create a dataflow, navigate to an App Workspace on the Power BI Service (note that this cannot be My Workspace, for obvious reasons, and also that it cannot be done in Power BI Desktop). You’ll notice that alongside Dashboards, Reports, Workbooks, and Datasets there is a new artefact labelled Dataflows (Preview). If you have Premium capacity, this will need to be enabled via the Admin Portal Capacity Settings. Click + Create and select Dataflow. Each dataflow has only one owner, and only the owner can edit it. To build the dataflow, you can either define new entities – which involves connecting to a data source and then mapping to Common Data Model entities or defining custom entities – or link to entities in other existing dataflows (see below).

image

 

If you are creating new entities, you are provided with a set of familiar data source connectors (see below). This set is not as complete as the one in Power BI Desktop, but I expect more to appear in time. I suspect there is also more work to do here, because some of the connectors shown below – Excel, for instance – ask for a file path or URL without giving any functionality to browse for it, something we have come to expect as part of Power BI Desktop.

image

 

Once you have loaded your data, you are presented with a fairly familiar query editor pane (see below). The first thing you will notice is that the options to manipulate and transform your data are limited compared to Power Query within Power BI Desktop. Again, I suspect this will mature when the functionality hits GA, but for the time being it gives you reasonable functionality – in the form of column/row transformations – to do what you need to create entities. You are also provided with an option to “map to standard”, which allows you to map into the Common Data Model (CDM): essentially a set of generic templates for entities such as Account/Product/Customer/etc. While I can see how this could be useful for an average user, I think it is still better to create your own entities based around your business’s own definition of an Account/Product/Customer/etc. These are then stored as custom entities within the CDM. Once the entities are transformed/schematised how you want them, you save them to the Power BI Service with a name/description. Under the hood, this is then written to a Data Lake as a series of files along with a JSON schema.

image

 

Dataflows are then available in Power BI Desktop via Get Data (see below). I believe you will need the November 2018 release or later to see this option. The resulting dataset is then treated in a similar way to everything else you will have seen previously within Power BI Desktop. When selected, you should be able to see all of the workspaces within your Power BI tenant that you have access to and which contain dataflows. Within each dataflow you can see each entity.

That’s really it – they’ve kept it simple on purpose and abstracted away what’s going on under the hood.

image

 

Wait, but what does this mean for the classic Enterprise Data Warehouse?

From a business perspective, I certainly think it will blur the lines slightly between using traditional ETL processes to create a DW and going down this route, more so at smaller scales. It would be hard pressed to replace an enterprise-level solution, but there is definitely a use case somewhere around departmental level, instead of creating a data mart. It will allow users to leverage their data faster than waiting on IT to build out a solution.

It really depends on the complexity and maturity of the platform being developed and its purpose. The fact that it sits on ADLS Gen2 is a positive and allows other applications to access the data at a raw level rather than relying on a specific connector. It also allows users of products such as PowerApps to connect into a cleaned CDM – another plus.

On the other side, it’s worth bearing in mind that you will need Premium to facilitate incremental refresh, which the majority of data warehouse solutions require. The lowest cost of Power BI Premium is that of a single P1 node, which at the time of writing costs £3,766/month, or roughly £45k per annum – not small by any standards. This is worth taking into consideration.

The other fact is that traditional DWHs deal with concepts such as SCD Type II and maintaining history, which Dataflows will not be able to facilitate. You may also have noticed there was no mention of DAX or measures – these still need to be done at the usual level within Power BI Desktop, which is not ideal for sharing KPIs across users. Lastly, there is currently no data lineage, although this has been rumoured to be part of the roadmap.

 

Conclusion

While this feature will be a game changer for a number of end users and projects, I think it will join the set of questions such as “Do I need SSAS or can I use Power BI for my models?” – I expect to see it appear at next year’s conferences! The answer will always be: it depends!

As we have seen with some aspects of Power BI, this new functionality also opens businesses up to governance issues, with users potentially creating their own version of the truth rather than something which has been thought out and modelled by someone with the right skillset, or taken from an authoritative source. This could cause contention between departments.

As noted above, I expect the Power BI team to continue to mature the functionality of the product – but for the time being, it certainly opens up another avenue to draw people into the toolset and provides another level of functionality for businesses and users alike.

Iconography in Design

In this blog I will discuss the importance of design, in particular looking at how icons can help add the finishing touches to a piece of front-end development.

Whether you’re building a Power BI report, an SSRS report, a PowerApp, or doing any other kind of front-end visualisation and design, you’ll know that 50% of the battle is making whatever you’re building jump out of the page. You could have built the most useful report a user could wish for, but unless it looks good, you’re not going to get any plaudits. Essentially, good design goes a long way to making a piece of development a success. This is even more important if you’re also building the underlying architecture/ETL, which, for all our good practice and thoughtfulness in this area as developers, will rarely impress or make an impact on an end user.

It’s important to get inspiration for design from somewhere, so before designing or even building something, take a look on Google, do some research and try to find things similar to what you’re doing, and see what you think looks good, or not so good. You should build up an idea of how you want to design your landing page, report or report headers, etc. within the tool. If you’re in Power BI, take a look at the partner showcase for example. There are some really good examples to give you ideas – like this one!

Microsoft apps such as Power BI or PowerApps go a long way in helping us build the best when it comes to data visualisation, but unfortunately they sometimes rely on developers to go outside the box to finish things off.

 

Icon Finder

image

Recently I’ve started to use a site called iconfinder.com, which has a large pool of useful icons you can use for building out certain corners of apps that the standard MS tooling will not support. The icons are mostly $2 each, but you can get a subscription for £20/month, or $10 with a discount code (which you should cancel as soon as you register). Don’t jump straight in if you think you need to use this as a resource – alternatively, save the icon using Chrome and use it as a placeholder until you are happy to push to production. For non-subscribed users, the icon will always come on a background of faint grey lines. This isn’t too bad, as the lines don’t completely ruin the look and feel of the icon during development and are fine as a placeholder for demos, etc. To get started, just type in a keyword for the type of icon you’re looking for, and then it’s just a case of wading through the results to find the icon that fits the look and feel you’re inspired to build against. Sometimes you’ll get another bit of inspiration off the back of this, which you can use as another keyword to find even more icons.

The site also comes with a very handy icon editor tool – essentially Paint Shop Pro on the web. There are lots of these tools out there, but it’s useful having it all integrated in one place at no extra cost. If you’re subscribed, it will load the SVG icon automatically, which then allows you to edit colours, shapes, etc. In my instance, I found a nice % complete icon set which would look good on a white background. Unfortunately, I wanted it on a blue header bar, so I needed to change it up slightly to fit the look and feel. No problem – it took less than a minute to modify and download.

image

It’s also worth mentioning that the site does a good job of helping you find a pool of icons which fit together nicely with the same look and feel, by showing you icons from the same icon set. In one instance, I replaced an icon with one from the same set even though it wasn’t exactly what I was looking for, because overall it just felt like it fitted together more nicely with the other icons on the screen.

 

Design in Practice

As mentioned above, doing some research before you build can really help you create a much better finish. For inspiration for a recent PowerApps design I did a quick search for landing pages on Google, and found a few I liked the look of (below). As long as the general elements you are working to are similar, it really doesn’t matter where the inspiration comes from. In these cases, they were in the form of mobile apps.

imageimage

From these images, I was able to identify the key components which made me bookmark them:

  • I wanted some kind of non-offensive background, possibly semi-transparent, or with overlay.
  • I wanted a title that stands out from the page, so white on grey or similar.
  • I wanted a small section for a blurb for the PowerApp.
  • I wanted 2 buttons, and the buttons to stand out.
  • I wanted logos in the top corners.
  • I wanted a nice look and feel for the colour palette.

From this, I then produced the following landing page.

image

I found the background on picjumbo.com which turned out to be quite a nice resource for some generic business style artwork, and then added a blurring filter across the top. This still interferes slightly with the buttons / title so I’m not completely happy but satisfied enough that it achieves the look and feel I was looking for. For the title, the range of fonts supplied with PowerApps is rather limited so I could go externally for this too but was happy enough for the time being. The layout also leaves room to shrink the title and add a small blurb if need be. The buttons are made up of a number of icons and fit with the theme for the app.

image

As mentioned above, I also added % complete icons to each page so users were able to understand how far they were along the scoring pages within the app. PowerApps provides sufficient icons for the back/refresh buttons that fit in with the white on blue theme, so I didn’t have to go externally for these. These were placed on the page header next to the logo.

 

 

Power BI

While this most recent bit of design was focused on PowerApps, I also add small touches to Power BI during report design. For instance, rather than just having a generic button that pushes you to a “details” page with a table for the row-by-row breakdown of some aggregated data, I looked for an icon, edited the colour palette slightly and added this to the report. With recent Power BI functionality, I can make the image act as a button and redirect the user to another page.

image

I’ve also used icons in dashboard design where a single visual didn’t really represent the content of the report to drill into. This can also be a good way to add a bit of flavour to a dashboard and mix things up. In my case, it also made the drill-down to report level less ambiguous, by posing a question in the title about whether that’s what the user wants to do.

image

Design is always subjective of course, but it’s great to use the other resources at your disposal when building out apps. Depending upon the current estate in which you develop, it also helps them stand out a bit more and adds a unique context to the reports/apps within the project. Hopefully this blog has given you a few ideas for your next project!

Considerations for Creating a Power BI Enterprise Report Deck

Creating or re-creating an enterprise report deck in Power BI should be reasonably straightforward given a specification, but there are a number of considerations to be made when building something of this nature. In the following blog post, I will detail some of these, and the route I would suggest taking. The contents of this blog revolve more around tabular reports than chart visuals, but the same themes can apply.

 

Fonts

I think it goes without saying that you should keep the font consistent across both a single report and the report deck. The default for Power BI is Segoe UI, which for the most part is pleasant; just be careful not to flick between this and Segoe UI Light, as this can cause discrepancies. It is, however, the font size that will cause you more of an issue.

  1. The first consideration is to set a minimum size. As report developers we want the report to look as tidy as possible, and usually this means fitting everything on 1 page. The easiest way to do this is to set the font smaller if you are having space issues – but this does not always translate so well to the end user. Depending on the device, they may consider the minimum Power BI lets you set (size 8) as too small for consumption on something such as an iPad – so this is worth checking first.
  2. The second consideration is to set the font size for different elements of the report, i.e. row level data at something like a 10, and header level elements at a 12. Anything else that exists such as filter elements should be set the same as the header levels. I would usually set titles a number of points above this, at something like an 18. In general, having varying levels of font size on a single report between elements will look inconsistent so the aim here is consistency!
  3. The third consideration, if possible, is to keep the font size the same across all the reports within the report deck for the same type of element. Again, this adds a consistent feel to the deck. If one report has more rows than another, in my opinion it’s still better to use the same font size across both, rather than filling the page on both using varying sizes.
  4. The last consideration is to be careful when mixing text from a textbox and a card together in the same area of the report. Unfortunately, Power BI does not currently support expressions like SSRS does, so a textbox is for static text only; dynamic text can only be created through a measure and assigned to a card. However, having both elements side by side does not give the expected result: the font size in a text box is not the same as in a card – a card at size 10 is roughly equivalent to a text box at size 13 (although you can only set 12–14) – leaving you with inconsistent fonts between elements. My suggestion is to create measures referring to elements on the report, and use them for both static and dynamic text, so that every textbox is a card visual with that consistent look and feel. If the report only needs static text, stick to text boxes.

 

Objects

The next consideration is around the number of objects on a report – keep it simple. Avoid building a giant monolithic report: the more objects you use, the slower the report will perform on the Power BI service, on iPads, and even during development. This is especially true for tables/matrices, which will each need to fire off separate DAX queries to return their data elements. Too many objects also has knock-on effects for exporting to PowerPoint, as objects will overlap with one another more – this may not be as much of an issue within the Power BI service but will affect other apps. You can use the selection pane (in the View tab) to move objects above/below one another, which will bring forward/push back the elements.

 

Alignment

Another scenario I have come across is that it is sometimes necessary to include a column header between the header for a measure and the actual data – for instance to specify the currency or unit. There are two options available. The first is to set the headers of the table as white text and insert text boxes over their position. While this achieves the goal, the final look and feel is not ideal, as a large proportion of time can be spent aligning the columns with the text in the text boxes, and even then it can still be pixels out of alignment. Adding/removing measures then means you have to go through the same process again as everything shifts left/right. Fortunately, in the November release of Power BI, the ability to align data within tables has improved. A better approach for this scenario is to rename the measures within the table visual to whichever unit you want to show for that column. The downside of this is that, as a developer, you will then need to hover over the measures to see where the original measure came from – a small annoyance which is compensated for by the amount of time saved trying to do alignment within a table. Also, this means fewer objects in the report, and fewer objects will generally make for a quicker, more manageable report.

For anyone particularly new to Power BI, you can use the arrow keys to move a single element around pixel by pixel, to help with alignment. There are also options on the Format tab. I’m still waiting for the ability to pixel-nudge multiple elements when they are selected together!

clip_image002

 

Colour

Hopefully you will be guided in terms of colour by a corporate colour scheme. This often comprises a set of four or so main RGB values, complemented by a further set of colours. Pick one (or even two) from the main set of colours and use that for the report framework – either the border/frame or the report header/footer – and then use another colour for the table headers, or two if the report requires double table headers. Again, consistency is the key across reports within the report deck. If using double headers for the columns, consider using the colours as a background behind the header text rather than colouring the text itself in the corporate colour. Consider white text on the darker backgrounds.

 

Parameter Selection

Most reports will contain some kind of slicer visual, to allow the user to change the context of the data – usually by period. As part of the report build, you’ll need to assess where the best position for this is on the report and to keep it consistent between reports within the deck. If your reports will require the real estate going across the page (i.e. wide tables), then consider the top of the report, else if they need the real estate going down the page (i.e. long tables), consider the right hand side. I think by default I would build it at the top, alongside any logos / titles. If you require multiple slicers, maybe move these to the side and build a panel to group them together.

Another little hidden option is switching the slicer visual between List/Dropdown (circled in red below). For some reason, List is the default, but most users will agree that the dropdown is neater and saves space. I’m not sure why this option appears here rather than in the standard visual configuration tab – maybe they will move it soon? The dropdown slicer visual still has some issues which I hope will be fixed soon, such as not collapsing the dropdown after a selection has been made – another click is required outside the dropdown to hide the options. This is not the best experience for end users, and there seems to be no viable alternative.

image

 

Header Logic Swapping

Unfortunately, as I mentioned previously, Power BI does not support expressions within tables, and therefore switching context based on a parameter is not easy to achieve. It is possible, but it needs to be done entirely within DAX. To keep the DAX measures for this simple, consider specifying the position on the report as the name of the measure. Then, within the definition of the measure, reference other created measures and keep the statement simple, allowing anyone debugging the report to trace the switching logic easily. Also use a DAX formatter such as this to make the DAX as readable as possible. It would be nice for this to be included within Power BI – hopefully it will be soon! I’ve included an example DAX statement below to provide this functionality.

 

New Measure =
IF (
    HASONEVALUE ( 'Dim'[Value] ),
    SWITCH (
        VALUES ( 'Dim'[Value] ),
        -- each pair maps a header/slicer value to the measure to display for this position
        "Comments describing the logic definition", "",
        "Value", [Measure],
        [AlternativeMeasure]  -- fallback when no value matches
    ),
    BLANK ()
)

 

Template

What does all of this lead to? The answer is a template for the report deck. As well as having guidelines for the above items which make up a report, it’s also good to build a physical .pbix template for your suite of reports. This way, you are not starting from scratch for each report, and you will get a more consistent feel, down to the pixel level of where the objects are. Don’t overcomplicate the template, but leave enough elements on it to save you re-creating them each time you build a new report. I would generally avoid copying from an existing report each time you build another, as this will sometimes include elements like PBI-defined measures which you do not want to carry between reports. Instead, define a template which you start from each time.

 

Conclusion

Once you have decided on a number of these points, it is worth gaining consensus from the product owner on whether this is acceptable to use moving forward. Do not wait until the end of the deck to demonstrate your decisions across the report set – this will leave you with far too much re-work. Instead, aim to deliver perhaps one of the more complex reports incorporating a number of the items above, and then apply those decisions to the rest of the report deck.

Generating Usage Statistics from an SSAS Tabular Cube

Once you have users accessing your cube it’s almost inevitable at some point that someone will ask you to generate usage statistics from it, and there are a number of methods to achieve this. In this quick blog post, I’ll detail them and my experiences with each, and then use this data to create a PBI report.

 

Native Tabular Properties

The first method is native to the tabular instance, through its query log properties. This also has the added bonus (read: impact) that it will optimise future aggregations based on usage – in fact, that’s its main purpose.

This can be done by setting CreateQueryLogTable to true, setting the QueryLogConnectionString (to point to the database where the usage table should be hosted), setting the QueryLogSampling rate (10 means every 10th query will be logged), and finally setting the name of the QueryLog table.

SQL Server Analysis Services Query Log Related Properties

The advantages of this method are that it’s very easy to set up with limited knowledge required, and it could potentially improve performance if you have an environment where users submit repetitive queries. Unfortunately, there are also a number of disadvantages which led me to find other methods. Firstly, it creates a degree of overhead on the cube if it’s sampling too often; we actually had visible performance-related complaints once we turned it on – either from the sampling or from the change to the “optimised” aggregations. Depending on the sampling rate, you could also find that users who rarely use the cube are not picked up as part of the stats. As well as this, any changes to the cube structure will cause the logging table to be reset. The table is also limited in terms of what it actually logs (as you can see below) – useful if you want just the user and timestamp info but not much else, and there is no real ability to configure it.

Query the OlapQueryLog table for Cube activity
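
To give an idea of the kind of information available, a query along the following lines summarises the sampled activity by user. This is only a rough sketch, assuming the default OlapQueryLog table and its standard columns (MSOLAP_User, StartTime, Duration) – adjust the names to match what your instance actually created, and bear in mind it only sees the sampled queries, not every query.

-- Sampled query activity by user from the Analysis Services query log
-- (assumes the default dbo.OlapQueryLog table created by the properties above)
SELECT
    MSOLAP_User,
    COUNT(*) AS SampledQueries,
    MIN(StartTime) AS FirstSeen,
    MAX(StartTime) AS LastSeen,
    AVG(Duration) AS AvgDurationMs
FROM dbo.OlapQueryLog
GROUP BY MSOLAP_User
ORDER BY SampledQueries DESC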

 

AS Trace

To that end, I looked for other tools to do the same task better, and I found AS Trace.

Originally built for SQL Server 2012, it works fine on 2014, and provides the ability to run a trace against cube activity (and log it to a table) exactly like SQL Profiler, but without the overhead of the GUI, which consumes unnecessary memory/processor power. It also runs as a Windows service, allowing it to restart automatically when the server reboots. When this happens, the tool logs the existing data to a history table and truncates the logging table. Exactly what I was after.

The tool collects information based on a preconfigured Analysis Services Profiler template, which can be optimised depending on which events you are interested in. I initially ran it with most events selected, and with a limited user set it was generating in the region of 25,000 rows a day. This was clearly not maintainable over a long period of time. I then used the following blog post to understand what each profiler event was giving me, and created a lightweight trace definition file with just what I wanted. I limited it to Query Begin, Query End (for DAX/MDX statements) and Audit Login/Logout (for session data).

The setup is very straightforward: just run install.bat under an account with elevated privileges and check it installs the service correctly. Next, add your SSAS service account to the Log On tab of the service, and make sure the account has the “Log on as a service” right, membership of the database you are writing to with DDL and DML access (i.e. able to create tables and write to tables), and lastly admin rights to the instance of SSAS you intend to trace. A sketch of the database permissions is below.
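
As a rough guide, granting that DDL/DML access could look something like this – purely a sketch, where the login name (DOMAIN\SSASService) and database name (UsageLogging) are illustrative and should be swapped for your own.

-- Illustrative names only: DOMAIN\SSASService runs the ASTrace service,
-- UsageLogging is the database the trace tables will be written to.
USE UsageLogging;
CREATE USER [DOMAIN\SSASService] FOR LOGIN [DOMAIN\SSASService];
ALTER ROLE db_ddladmin ADD MEMBER [DOMAIN\SSASService];   -- DDL: create the logging/history tables
ALTER ROLE db_datawriter ADD MEMBER [DOMAIN\SSASService]; -- DML: insert the trace rows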

Next, configure the ASTrace.exe.config file with the parameters you want the tool to use. This includes the location of the cube (it can handle multiple cubes), the location of the trace definition file, the location of the database instance and table you want to log to, and lastly whether you want to preserve history on restart. The only thing I couldn’t do here is set the schema of the table it logs to, which defaults to dbo.

All that’s left is to start the service, and check the log file to see if it has created any errors on start-up. If not, the table should be created correctly and awaiting input.

While researching, I also saw another method using Extended Events (XEvents), but did not implement it once AS Trace had provided me with the information I needed.

 

View / Power BI Report

I initially used the data to run a limited set of queries to extract total users, and total queries for a given time period. This was useful to a degree but from the data collected I realised I could be doing so much more.

This led me to do some analysis of the types of metrics being logged, which allowed me to create a view on top of the tables containing what I thought might be useful for a report. I removed all the redundant columns it was tracking, and created some friendly names for the EventSubclass and other columns. I used the PATINDEX function to check the query statement for the existence of some important values – while not an exact science, it gives a good picture of the split between certain user groups and the KPIs being run. I’ve included the view definition below.

I ended up limiting the data to EventClass 10 (Query End), as this seemed to capture all the necessary data. The only downside I have seen so far is that users querying through the Power BI web service are anonymised under the service account name. I’m currently looking into options to resolve this, which I’ve seen as configuration options in Power BI – to pass through the username as long as it can be matched at the other end.

SELECT
    RowNumber AS ID,
    SPID AS SessionID,
    CurrentTime AS DateQueried,
    NTUserName AS Username,
    -- friendly names for the query type based on the trace EventSubclass
    CASE EventSubClass
        WHEN 0 THEN 'MDX Query (Excel)'
        WHEN 3 THEN 'DAX Query (Power BI)'
        WHEN 1 THEN 'METADATA Query'
    END AS QueryType,
    CASE Success WHEN 1 THEN 'Successful Query' ELSE 'Query Error' END AS SuccessfulQuery,
    CONVERT(DECIMAL(10,2), CONVERT(DECIMAL(18,3), CPUTime) / 1000) AS CPUTimeSec,
    CONVERT(DECIMAL(10,2), CONVERT(DECIMAL(18,3), Duration) / 1000) AS DurationSec,
    TextData AS Query,
    -- rough flags for user groups / KPIs based on the query text
    CASE PATINDEX('%Mexico%', TextData) WHEN 0 THEN 0 ELSE 1 END AS MexicoMarket,
    CASE PATINDEX('%Colombia%', TextData) WHEN 0 THEN 0 ELSE 1 END AS ColombiaMarket,
    CASE PATINDEX('%CS4%', TextData) WHEN 0 THEN 0 ELSE 1 END AS CS4,
    ServerName
FROM [dbo].[ASTraceTable]
WHERE EventClass = 10 -- Query End events only
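
Before building the report, it’s worth sanity-checking the view output with a quick aggregation. The following is only a sketch – the view name vwCubeUsage is hypothetical, so substitute whatever name you gave the view defined above.

-- Queries per user, per query type, per day from the usage view
SELECT
    Username,
    QueryType,
    CAST(DateQueried AS date) AS QueryDate,
    COUNT(*) AS Queries,
    AVG(DurationSec) AS AvgDurationSec
FROM dbo.vwCubeUsage
GROUP BY Username, QueryType, CAST(DateQueried AS date)
ORDER BY QueryDate, Username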

Once I had the view, creating the report was relatively straight forward, and can be seen below.

I included metrics for the number of queries by user (blurred out), which also doubled as a filter, the % split of queries between the likes of Excel and Power BI, a measure of queries by timeframe, a logarithmically scaled display of queries by query duration, and lastly a split of queries by KPI. I intend to tweak these once I receive more data from the trace, but was relatively happy with the information they were providing.

Untitled (002)

Please let me know if you have any comments.