Adatis

Adatis BI Blogs

Loop Through Webpages with Multiple Parameters in Power Query

Back in February 2016, I posted a step by step article (found here) on how to loop through a website and dynamically pull down multiple pages into a single table in Power Query. I was recently approached by a gentleman who had read my blog, wanting to enhance the looping process. The requirements were: Current Problem: -          The user can currently only loop through Year, having to hard code the Page and Month. Requirements: -          Dynamically loop through three webpage parameters and output them into a single table – highlighted below: http://www.boxofficemojo.com/monthly/?page=1&view=calendargross&yr=2017&month=1&p=.htm -          Specify the Page, Year and Month to loop through o   Years 2010 - 2017 o   Months 1 - 12 o   Pages 1 - 10 Parameters Whilst GUI Parameters are not a new concept in Power Query, they did not exist when I wrote my initial blog.  As a result, I had to code a parameter into the M query and then define it as a function.  A similar concept will be adopted here, but the out of the box parameter functionality makes life so much easier now. I previously wrote an article explaining how to set up parameters (here), so I will simply show you the output of them below: Note – all three parameters are set to Text.  This is needed in order to pass the parameters into the website address later on. Web Connection The first step is to import the data from the website (shown above) using the web connector.  To pass our parameter values into the URL, click the cog in the first Applied Step – typically called ‘Source’. This will open a new window, where by design, the hardcoded URL is shown under the Basic radio button.  Select Advanced and you will see additional configuration capabilities. URL Parts enable you to build up elements of the webpage, whilst dynamically passing in parameters.  The URL Preview will show you how the URL looks with any parameters or URL Parts.  All other options can be ignored for this example. The URL Parts had to be broken down quite significantly and I will show you both the code and how it looks in the Advanced window. User Interface: The sections highlighted in red are the parameters and sit in between some of the hard-coded URL text Code Breakdown: -          Text = http://www.boxofficemojo.com/monthly/?page= -          Parameter = [Page] -          Text = &view=calendargross&yr= -          Parameter = [Year] -          Text = &month= -          Parameter = [Month] -          Text = &p=.htm URL Preview: http://www.boxofficemojo.com/monthly/?page={Page}&view=calendargross&yr={Year}&month={Month}&p=.htm Year/Month/Page Permutations Table Before creating/invoking the function, a table that contains the different years, months and pages is needed.  There are many ways to achieve this, but I used a cross join technique (found here) with three disparate tables.  I followed the end user requirements and end up with a table like below: This will be the final table where the function is called. Function Let’s go back to the original table, where the parameters have been applied to the Web URL.  Right click the table and select Create Function.  Nothing else is required (apart from naming it), as Power Query will automatically package up the table and you end up with a new Function like below: The last step is to invoke the function.  Go back to the Permutations table (called ‘All Data’ in my example), and go to Add Column > Invoke Custom Function. When prompted with the next screen, map each parameter to the equivalent column.  The red highlighted values are the columns and yellow contain the parameters.  Both are named the same in my example and make mapping even easier. The function will loop through each URL permutation, dynamically changing the parameters on execution.  Once it has finished, tidy up any errors (for Years, Pages and/or Months) that do not exist.  Apply all transformations within the Query Editor and that is it – the data is ready for Reporting!  You may need to configure some of the measures (e.g. Total Gross) to be numbers, but that is easy enough to do within the Query Editor. The example in this blog can be adapted in many ways and illustrates how dynamic Power Query can be – both in Excel and Power BI. Further Reading o   Cross Join in Power BI –  http://bit.ly/2ij8zJ1 o   Reza Rad Custom Functions – http://bit.ly/2zz6X2f Contact Me If you would like a copy of the PBIX workbook or have any general questions, feel free to leave a comment below. Twitter:                @DataVizWhizz

Buffer() M Function in Query Editor (Power BI)

Whilst I have been aware of the Buffer() M function in the Query Editor of Power BI for a while, I had never really utilised its capabilities until now.  There are two main types of buffer functionality – Table.Buffer and List.Buffer.  To define them simply, Table.Buffer puts an entire table into memory and prevents change during evaluation, whereas List.Buffer provides a stable list, meaning it has some form of count or order. This blog will focus on the theory behind the general Buffer() M functionality, picking a specific scenario of when it can outperform the standard Query Editor behavior.  I will also demonstrate that this is not always the most effective technique within the same scenario.  The article will not give you a hard and fast rule of when to use the Buffer() function, because it can depend on a number of factors.  These are described further below. Note:    It is assumed you have existing knowledge of Query Folding and if not, one of my previous blogs should help greatly. Scenario I found inspiration from Chris Webb’s example, using the Adventure Works DW 2014 database – available here. The requirements are: 1.       Obtain the first 10,000 rows from FactInternetSales 2.       Remove the majority of columns, retaining ONLY: a.       SalesOrderLineNumber b.      CustomerKey c.       SalesAmount 3.       Rank the current row based on Sales Amount. List.Buffer() Assuming your database exists on a local server and is named AdventureWorksDW2014, copy the following code into the Advanced Editor in the Query Editor screen. let     //Connect to SQL Server     Source = Sql.Database("localhost", "AdventureWorksDW2014"),     //Get first 2000 rows from FactInternetSales     dbo_FactInternetSales = Table.FirstN(           Source{[Schema="dbo",Item="FactInternetSales"]}[Data],           10000),     //Remove unwanted columns     RemoveColumns = Table.SelectColumns(           dbo_FactInternetSales,           {"SalesOrderLineNumber", "CustomerKey","SalesAmount"}),     //Get sorted list of values from SalesAmount column    RankValues = List.Sort(RemoveColumns[SalesAmount], Order.Descending),     //Calculate ranks     AddRankColumn = Table.AddColumn(RemoveColumns , "Rank",           each List.PositionOf(RankValues,[SalesAmount])+1) in     AddRankColumn You can visibly see the rows loading – one by one.  In total, it takes nearly 1 minute to load all off the results. Now let’s use the List.Buffer() function in the RankValues step. Replace: = List.Sort(RemoveColumns[SalesAmount], Order.Descending) With: = List.Buffer(List.Sort(RemoveColumns[SalesAmount], Order.Descending)) The entire transformation (from start to finish) completes in just under 2 seconds!  This is because the List.Buffer function stores the sorted values in memory and therefore, the rank calculation is only evaluated once.  The last query (and previous steps) were being evaluated multiple times.  The M language is both functional and at times, lazy.  In order to prevent the constant re-evaluation, buffer the list into memory.  The final query output is shown below: Query Folding We will implement the same requirements, but this time using Query Folding.  The third step in our current transformation is called ‘Removed Columns’. This is what prevents Query Folding, as this function cannot be interpreted/translated to the native SQL Server T-SQL language.  All steps below are inadvertently not supported either.  The way around this is to write SQL Server View (in SSMS) to import just the fields required from the underlying FactInternetSales Table.  The below query will give you the same result up to the ‘Remove Columns’ step. CREATE VIEW dbo.VwFactInternetSalesAmount AS        SELECT SalesOrderNumber                         ,[CustomerKey]                      ,[SalesOrderLineNumber]                      ,[SalesAmount]                      ,RANK() over( order by [SalesAmount] desc) AS [Rank]        FROM   [AdventureWorksDW2014].[dbo].[FactInternetSales] The final steps are to filter on the top 10,000 rows and Group the Rows together – inserting the following M syntax into the last Applied Step: let     //Connect to SQL Server     Source = Sql.Database(".", "AdventureWorksDW2014"),     // Connect to SQL Server     dbo_FactInternetSales = Source{[Schema="dbo",Item="VwFactInternetSalesAmount"]}[Data],     #"Sorted Rows" = Table.Sort(dbo_FactInternetSales,{{"SalesOrderNumber", Order.Ascending}}),     #"Kept First Rows" = Table.FirstN(#"Sorted Rows",10000),     #"Grouped Rows" = Table.Group(#"Kept First Rows", {"CustomerKey", "SalesOrderLineNumber", "Rank"}, {{"TotalSalesAmount", each List.Sum([SalesAmount]), type number}}) in     #"Grouped Rows"   The query now returns instantly (under 1 second).   Right click on the last applied step and select the View Native Query option, to show the underlying SQL. select top 10000     [rows].[CustomerKey] as [CustomerKey],     [rows].

ETL in Power Query

If you have read any of my other Power Query blogs, you will have seen the results of transformed an [More]