Callum

Callum Green's Blog

Handling Web Page Errors in Power BI Query Editor

I was recently asked if it was possible to handle Error Rows in the Query Editor and more specifically, to web pages that do not exist from source. The user broached me with this question after following a previous blog on looping through multiple web pages in Query Editor – found here.

In my blog post, there were no errors from the loop output but in the user’s workbook, they were losing data directly below the expanded error row. My first suggestion was to use the Remove Errors UI option, which would delete the problematic row. However, they wanted a way of treating the errors as an Unknown Member and did not want to lose the data entirely

It is assumed consumers of this blog already have knowledge of the Power BI Query Editor and the UI functions used in the examples.  

Scenario

I created a new workbook that connects to local Council Facebook pages. Each page has a unique Object ID, which will be used as the parameter in the loop. The Council “Camberley” deliberately contains an invalid Object ID. I then proceeded to create a Parameter and Function, replicating the exact steps from my previous blog.

When I invoke the function (through the use of a Custom Column), the following is produced:

clip_image001[4]

As you can see, Camberley Council produces an error in the fnInvokeCouncils column. If we expand the contents (highlighted in yellow), the Facebook page data appears. Upon further inspection, the Farnham and Rushmoor council data are available, but Camberley (incorrect Object ID) and Guildford are not.

clip_image003[4]

The error message is a little misleading but let’s save the debugging debate for another day. The key observation is “Guildford” data is not available, simply because it comes after “Camberley” in the list. Whilst we want to see errors in a Query, we do not want them causing data loss.

Resolution

As I mentioned at the beginning of this article, using the Remove Errors function would prevent the loss of Guildford data. However, the user needs to handle errors as Unknown Members and conform to a typical Kimball Data Warehouse. 

I am sure there are many ways to fulfil the requirement, but here is how I approached it:

1.       Duplicate the existing ‘Councils’ query, naming it ‘Councils Error Rows’.

2.       Switch back to the ‘Councils’ query and Remove Errors, leaving only three records:

 

clip_image005[4]

3.       Expand the fnInvokeCouncils column, opening up the underlying fields and data:


clip_image007[4]

4.       In the ‘Council Error Rows’ query, apply the Replace Errors UI function - inserting the string “Validation Failed”.

5.       Add a Custom Column, writing the following M:

if [fnInvokeCouncils] = "Validation Failed" then 1 else 0

This is a simple IF statement that sets the error rows to 1.

6.       Now filter the selection to only display ErrorRows with the value of 1. This is achieved by using the Filter Rows UI function. The ‘Council Error Rows’ query now looks like the following:


clip_image008[4]

7.       The columns must match the ‘Councils’ query, meaning 4 new Custom Columns are needed. We can hardcode the values and remove any unwanted columns.

 

clip_image010[4]

 

8.       Right click on the previously modified ‘Councils’ query select Reference. Name the query ‘All Councils’. This makes it easier to track the transformations and persists any future changes made to the raw data.  

9.       Within the ‘All Council’ query, select Append Query transformation. Choose ‘Council Error Rows’ as the table to append and click OK.
 

10.   We are now left with a Union of both datasets, containing the Unknown Member and data from all other Councils.

clip_image012[4]

11.   The Unknown Member record is visible within the final Query.

 

clip_image014[4]

 

Summary

I have shown you how to get around two different problems with Error Rows in the Power BI Query Editor. The first is how to retain all correct data, whilst the second is keeping error records and inserting them back into the dataset as an Unknown Member. Both methods are dynamic, meaning that if we added a new Council called ‘Basingstoke’, we would see the underlying data regardless of whether the underlying Facebook Object ID exists or not.

Whilst none of the transformations in this blog are overly technical, there are quite a few hoops to jump through to retain all data from a web page/Facebook. Having said that, I am sure there are a few other ways people could approach this problem. I would be really interested to speak to anyone who does have an alternative solution.

Further Reading

Query Editor Basics (Power BI blog) – http://bit.ly/2pwBdo1
Unknown Members in Data Warehousing -
http://bit.ly/2qTefwe
Loop through Multiple Web Pages using Power Query - http://bit.ly/2q3a8Nc

Contact Me

If you would like a copy of the workbook containing the examples or want to know more about the Query Editor within Power BI, please leave a comment below. My Twitter details are also provided.

Twitter:  @CallumGAdatis

Parameterise Multiple Queries in Power Query

The purpose of this article is to demonstrate the simplistic power (excuse the pun) of Power Query and how you can configure numerous queries to run from one single parameter.  The concepts covered are similar to what was discussed in my last blog but instead of configuring iterated outputs, we will use one query to derive the data in a number of other queries.

Please Note:       Substantial ETL was required to prepare the data before any parameterisation could take place.  I will not be showing this in today’s blog but am happy to provide the full workbook upon request.  

Scenario

Continuing on with the football theme, I want to find out the Premier Leagues top goal scorer for any season, ranging from 1992/93 all the way up to the present day.  The aim is to pass a value into a workbook query, which will determine the data shown in our formatted, transformed query.

“This sounds exactly the same as you last blog post”, I hear you say.  Well, it is similar – but I will show you how to pass a given query value in as a parameter into another query.  In other words, we can use basic configuration to drive our output. 

Import Data

Firstly, to follow this example, you need to import the 15/16 premier league leading scorer web data into Power Query.  This can be found at:- http://www.transfermarkt.co.uk/premier-league/torschuetzenliste/wettbewerb/GB1/plus/?saison_id=2015

You will see that there are over 30 tables that could be imported.  Upon further inspection, each player is put into their own table and we will need to bring most of the tables in.  I decided to pick the top 20 players and any associated data we may need, which is Table 3 – Table 23.

Once the tables are opened as queries in Power Query, rename Table 3 to ‘Stats’, Table 4 to Rank1, Table 5 to Rank2, etc. – all the way up to Rank20.  The queries will now look like the below:

clip_image001

Config Table

If you haven’t already noticed in the above web address, ‘saison_id’ is the parameter we can pass different years into.  We will now set up a one value config query (using the Enter Data button) that will be used to derive ‘saison_id’.  Save the query and give your column a meaningful name – see below.

clip_image002

Referencing Config Value

The next step is to actually reference the config value and nest it into the web address connection of all our other queries but for this example, we will use the Rank1 query.  Use the Advanced Editor to edit the M code.  Here you will see the web connection but before we add our table value in as a parameter, we need to declare it – just like you would do in any other programming language.  The code looks as follows:

clip_image004

-          ‘YearParam’ calls the Config query, which I named ObtainYear.

-          ‘YearParamValue’ find the ‘{0}’ positioned record, which in Power Query, means the first row.  ‘[Year]’ is the name of the column and can be seen in a screenshot above.

-          Finally, ‘YearParamValue’ is called in the web connection string, replacing the hardcoded value of ‘2015’.

We can now close the Advanced Editor and test that the code works.  To do so, change the config query value to ‘2014’, by using the Replace Values option in the top menu ribbon. The value in Rank1 should now read as ‘Sergio Aguero’.

clip_image005

Apply the same code to all of the other queries so that all values will change when a different year is entered into the ‘Year’ column of ObtainYear.

Automating Config Query

To advance our workbook even further, we can turn the ObtainYear query into a function.   This will automate the result in all our other queries and enable the export of different datasets.  I have shown you how to create a function in a previous blog, but have still provided the syntax required at the beginning of the M code in Advanced Editor.

clip_image006

Finally, we can Invoke the function by entering a year.  In my example (not previously shown), I have invoked the year ‘1999’ in the ObtainYear function.  The below output is the result of a transformed query that brings all the data from the other queries together.

clip_image008

There it is, a fully dynamic import process that will connect to a web site and import data determined by a query parameter!  This is all achieved using Power Query’s intuitive UI, with the extra help of the more configurable and flexible M native language.

Other Uses

This article may use web scraping as the example, but you can adopt the same approach to looping through text files, database objects or blob storage.  I have now demonstrated how you can use both function and query values as parameters but with the potential to combine the two, creating ETL processes in Power Query could be very powerful.

I am just scratching the surface here and aim to explore more ETL automation very soon.  Rest assured I will share all my findings - so look out for future blogs.