Callum

Callum Green's Blog

Loop Through Webpages with Multiple Parameters in Power Query

Back in February 2016, I posted a step by step article (found here) on how to loop through a website and dynamically pull down multiple pages into a single table in Power Query. I was recently approached by a gentleman who had read my blog, wanting to enhance the looping process. The requirements were:

Current Problem:

-          The user can currently only loop through Year, having to hard code the Page and Month.

Requirements:

-          Dynamically loop through three webpage parameters and output them into a single table – highlighted below:

http://www.boxofficemojo.com/monthly/?page=1&view=calendargross&yr=2017&month=1&p=.htm

-          Specify the Page, Year and Month to loop through

o   Years 2010 - 2017

o   Months 1 - 12

o   Pages 1 - 10

Parameters

Whilst GUI Parameters are not a new concept in Power Query, they did not exist when I wrote my initial blog.  As a result, I had to code a parameter into the M query and then define it as a function.  A similar concept will be adopted here, but the out of the box parameter functionality makes life so much easier now.

I previously wrote an article explaining how to set up parameters (here), so I will simply show you the output of them below:

clip_image002[4]

Note – all three parameters are set to Text.  This is needed in order to pass the parameters into the website address later on.

Web Connection

The first step is to import the data from the website (shown above) using the web connector.  To pass our parameter values into the URL, click the cog in the first Applied Step – typically called ‘Source’.

clip_image004[4]

This will open a new window, where by design, the hardcoded URL is shown under the Basic radio button.  Select Advanced and you will see additional configuration capabilities.

clip_image006[4]

URL Parts enable you to build up elements of the webpage, whilst dynamically passing in parameters.  The URL Preview will show you how the URL looks with any parameters or URL Parts.  All other options can be ignored for this example.

The URL Parts had to be broken down quite significantly and I will show you both the code and how it looks in the Advanced window.

User Interface:

clip_image007[4]

The sections highlighted in red are the parameters and sit in between some of the hard-coded URL text

Code Breakdown:

-          Text = http://www.boxofficemojo.com/monthly/?page=

-          Parameter = [Page]

-          Text = &view=calendargross&yr=

-          Parameter = [Year]

-          Text = &month=

-          Parameter = [Month]

-          Text = &p=.htm

URL Preview:

http://www.boxofficemojo.com/monthly/?page={Page}&view=calendargross&yr={Year}&month={Month}&p=.htm

Year/Month/Page Permutations Table

Before creating/invoking the function, a table that contains the different years, months and pages is needed.  There are many ways to achieve this, but I used a cross join technique (found here) with three disparate tables.  I followed the end user requirements and end up with a table like below:

clip_image009[4]

This will be the final table where the function is called.

Function

Let’s go back to the original table, where the parameters have been applied to the Web URL.  Right click the table and select Create Function.  Nothing else is required (apart from naming it), as Power Query will automatically package up the table and you end up with a new Function like below:

clip_image010[4]

The last step is to invoke the function.  Go back to the Permutations table (called ‘All Data’ in my example), and go to Add Column > Invoke Custom Function.

clip_image011[4]

When prompted with the next screen, map each parameter to the equivalent column.  The red highlighted values are the columns and yellow contain the parameters.  Both are named the same in my example and make mapping even easier.

clip_image013[4]

The function will loop through each URL permutation, dynamically changing the parameters on execution.  Once it has finished, tidy up any errors (for Years, Pages and/or Months) that do not exist.  Apply all transformations within the Query Editor and that is it – the data is ready for Reporting!  You may need to configure some of the measures (e.g. Total Gross) to be numbers, but that is easy enough to do within the Query Editor.

The example in this blog can be adapted in many ways and illustrates how dynamic Power Query can be – both in Excel and Power BI.

Further Reading

o   Cross Join in Power BI –  http://bit.ly/2ij8zJ1

o   Reza Rad Custom Functions – http://bit.ly/2zz6X2f

Contact Me

If you would like a copy of the PBIX workbook or have any general questions, feel free to leave a comment below.

Twitter:                @DataVizWhizz

Parameterise Multiple Queries in Power Query

The purpose of this article is to demonstrate the simplistic power (excuse the pun) of Power Query and how you can configure numerous queries to run from one single parameter.  The concepts covered are similar to what was discussed in my last blog but instead of configuring iterated outputs, we will use one query to derive the data in a number of other queries.

Please Note:       Substantial ETL was required to prepare the data before any parameterisation could take place.  I will not be showing this in today’s blog but am happy to provide the full workbook upon request.  

Scenario

Continuing on with the football theme, I want to find out the Premier Leagues top goal scorer for any season, ranging from 1992/93 all the way up to the present day.  The aim is to pass a value into a workbook query, which will determine the data shown in our formatted, transformed query.

“This sounds exactly the same as you last blog post”, I hear you say.  Well, it is similar – but I will show you how to pass a given query value in as a parameter into another query.  In other words, we can use basic configuration to drive our output. 

Import Data

Firstly, to follow this example, you need to import the 15/16 premier league leading scorer web data into Power Query.  This can be found at:- http://www.transfermarkt.co.uk/premier-league/torschuetzenliste/wettbewerb/GB1/plus/?saison_id=2015

You will see that there are over 30 tables that could be imported.  Upon further inspection, each player is put into their own table and we will need to bring most of the tables in.  I decided to pick the top 20 players and any associated data we may need, which is Table 3 – Table 23.

Once the tables are opened as queries in Power Query, rename Table 3 to ‘Stats’, Table 4 to Rank1, Table 5 to Rank2, etc. – all the way up to Rank20.  The queries will now look like the below:

clip_image001

Config Table

If you haven’t already noticed in the above web address, ‘saison_id’ is the parameter we can pass different years into.  We will now set up a one value config query (using the Enter Data button) that will be used to derive ‘saison_id’.  Save the query and give your column a meaningful name – see below.

clip_image002

Referencing Config Value

The next step is to actually reference the config value and nest it into the web address connection of all our other queries but for this example, we will use the Rank1 query.  Use the Advanced Editor to edit the M code.  Here you will see the web connection but before we add our table value in as a parameter, we need to declare it – just like you would do in any other programming language.  The code looks as follows:

clip_image004

-          ‘YearParam’ calls the Config query, which I named ObtainYear.

-          ‘YearParamValue’ find the ‘{0}’ positioned record, which in Power Query, means the first row.  ‘[Year]’ is the name of the column and can be seen in a screenshot above.

-          Finally, ‘YearParamValue’ is called in the web connection string, replacing the hardcoded value of ‘2015’.

We can now close the Advanced Editor and test that the code works.  To do so, change the config query value to ‘2014’, by using the Replace Values option in the top menu ribbon. The value in Rank1 should now read as ‘Sergio Aguero’.

clip_image005

Apply the same code to all of the other queries so that all values will change when a different year is entered into the ‘Year’ column of ObtainYear.

Automating Config Query

To advance our workbook even further, we can turn the ObtainYear query into a function.   This will automate the result in all our other queries and enable the export of different datasets.  I have shown you how to create a function in a previous blog, but have still provided the syntax required at the beginning of the M code in Advanced Editor.

clip_image006

Finally, we can Invoke the function by entering a year.  In my example (not previously shown), I have invoked the year ‘1999’ in the ObtainYear function.  The below output is the result of a transformed query that brings all the data from the other queries together.

clip_image008

There it is, a fully dynamic import process that will connect to a web site and import data determined by a query parameter!  This is all achieved using Power Query’s intuitive UI, with the extra help of the more configurable and flexible M native language.

Other Uses

This article may use web scraping as the example, but you can adopt the same approach to looping through text files, database objects or blob storage.  I have now demonstrated how you can use both function and query values as parameters but with the potential to combine the two, creating ETL processes in Power Query could be very powerful.

I am just scratching the surface here and aim to explore more ETL automation very soon.  Rest assured I will share all my findings - so look out for future blogs.