Adatis BI Blogs

Buffer() M Function in Query Editor (Power BI)

Whilst I have been aware of the Buffer() M function in the Query Editor of Power BI for a while, I had never really utilised its capabilities until now.  There are two main types of buffer functionality – Table.Buffer and List.Buffer.  To define them simply, Table.Buffer puts an entire table into memory and prevents change during evaluation, whereas List.Buffer provides a stable list, meaning it has some form of count or order. This blog will focus on the theory behind the general Buffer() M functionality, picking a specific scenario of when it can outperform the standard Query Editor behavior.  I will also demonstrate that this is not always the most effective technique within the same scenario.  The article will not give you a hard and fast rule of when to use the Buffer() function, because it can depend on a number of factors.  These are described further below. Note:    It is assumed you have existing knowledge of Query Folding and if not, one of my previous blogs should help greatly. Scenario I found inspiration from Chris Webb’s example, using the Adventure Works DW 2014 database – available here. The requirements are: 1.       Obtain the first 10,000 rows from FactInternetSales 2.       Remove the majority of columns, retaining ONLY: a.       SalesOrderLineNumber b.      CustomerKey c.       SalesAmount 3.       Rank the current row based on Sales Amount. List.Buffer() Assuming your database exists on a local server and is named AdventureWorksDW2014, copy the following code into the Advanced Editor in the Query Editor screen. let     //Connect to SQL Server     Source = Sql.Database("localhost", "AdventureWorksDW2014"),     //Get first 2000 rows from FactInternetSales     dbo_FactInternetSales = Table.FirstN(           Source{[Schema="dbo",Item="FactInternetSales"]}[Data],           10000),     //Remove unwanted columns     RemoveColumns = Table.SelectColumns(           dbo_FactInternetSales,           {"SalesOrderLineNumber", "CustomerKey","SalesAmount"}),     //Get sorted list of values from SalesAmount column    RankValues = List.Sort(RemoveColumns[SalesAmount], Order.Descending),     //Calculate ranks     AddRankColumn = Table.AddColumn(RemoveColumns , "Rank",           each List.PositionOf(RankValues,[SalesAmount])+1) in     AddRankColumn You can visibly see the rows loading – one by one.  In total, it takes nearly 1 minute to load all off the results. Now let’s use the List.Buffer() function in the RankValues step. Replace: = List.Sort(RemoveColumns[SalesAmount], Order.Descending) With: = List.Buffer(List.Sort(RemoveColumns[SalesAmount], Order.Descending)) The entire transformation (from start to finish) completes in just under 2 seconds!  This is because the List.Buffer function stores the sorted values in memory and therefore, the rank calculation is only evaluated once.  The last query (and previous steps) were being evaluated multiple times.  The M language is both functional and at times, lazy.  In order to prevent the constant re-evaluation, buffer the list into memory.  The final query output is shown below: Query Folding We will implement the same requirements, but this time using Query Folding.  The third step in our current transformation is called ‘Removed Columns’. This is what prevents Query Folding, as this function cannot be interpreted/translated to the native SQL Server T-SQL language.  All steps below are inadvertently not supported either.  The way around this is to write SQL Server View (in SSMS) to import just the fields required from the underlying FactInternetSales Table.  The below query will give you the same result up to the ‘Remove Columns’ step. CREATE VIEW dbo.VwFactInternetSalesAmount AS        SELECT SalesOrderNumber                         ,[CustomerKey]                      ,[SalesOrderLineNumber]                      ,[SalesAmount]                      ,RANK() over( order by [SalesAmount] desc) AS [Rank]        FROM   [AdventureWorksDW2014].[dbo].[FactInternetSales] The final steps are to filter on the top 10,000 rows and Group the Rows together – inserting the following M syntax into the last Applied Step: let     //Connect to SQL Server     Source = Sql.Database(".", "AdventureWorksDW2014"),     // Connect to SQL Server     dbo_FactInternetSales = Source{[Schema="dbo",Item="VwFactInternetSalesAmount"]}[Data],     #"Sorted Rows" = Table.Sort(dbo_FactInternetSales,{{"SalesOrderNumber", Order.Ascending}}),     #"Kept First Rows" = Table.FirstN(#"Sorted Rows",10000),     #"Grouped Rows" = Table.Group(#"Kept First Rows", {"CustomerKey", "SalesOrderLineNumber", "Rank"}, {{"TotalSalesAmount", each List.Sum([SalesAmount]), type number}}) in     #"Grouped Rows"   The query now returns instantly (under 1 second).   Right click on the last applied step and select the View Native Query option, to show the underlying SQL. select top 10000     [rows].[CustomerKey] as [CustomerKey],     [rows].

View Native Query Function in Power Query

Following on from my last blog (found here), I wanted to share a new Query Folding feature in Power Query.  The View Native Query functionality was actually released in the June 2016 Power BI update but it was hardly advertised/promoted by Microsoft. What is View Native Query? In a nutshell, it enables you to check if each of your step by step transformations in Power Query are being folded back to the source.  I have shown techniques of using external monitoring tools, as well as an in-built M function.  However, the View Native Query feature makes our lives so much easier.  I will demonstrate how easy it is to use. Example I loaded up an old Query Folding Power BI workbook in Power Query, which was connecting to the AdventureWorksDW2014 SQL Server database.  If you want a copy of it, feel free to comment below and I’ll get back to you.  However, you can do this with any workbook and need to follow the simple instructions below: 1.    Right click on the last step of your transformation and a new option called View Native Query should be available.  Left click to open the SQL query.   2.    Whilst the above SQL written at source is not the most efficient, query folding still means the data is brought back into Power Query far quicker than it would if done within the Power BI application.   3.   Now copy and paste the code into SSMS, This is exactly the same code that SQL Server Profiler would produce.  Execute the query.     4.    In effect, this is replicating the transformation steps in Power Query.  The below screenshot is taken from Power Query and note the top 10 results from this and the above output. The coolest thing about the View Native Query feature is the fact you can check every step in the Power Query transformation.  I only demonstrated the final step, but you can go back to each one and find the underlying query folding logic that has been applied. Why is View Native Query not available? If you get the below screenshot, the answer is simple – your transformation step doesn’t support Query Folding!  NOTE: View Native Query is ‘greyed’ out. If possible, move any steps that aren’t supported to the end.  Power Query will attempt to query fold, until there is a step it cannot.  All steps after this (even if query folding is possible) will not be supported and could drastically slow down the remaining transformations. Conclusion This feature is something that should have been made available a long time ago.  Whilst the power of query folding is undeniable, the methods of testing it were long and arduous.  With View Native Query, we can easily check where, when and more importantly, when folding is NOT happening.  Debugging slow performing transformations is so much easier now and you can in fact, swap steps around and folding will still persist where possible. I would advise every Power Query developer to make use of this simple feature.  If there are any other Query Folding techniques, tips or tricks out there, please contact me.  Look out for future blogs, where I will be looking at the performance of query folding on very large data volumes (100 million + records) and whether the automated SQL Server code eventually becomes too unwieldy and poorly performing.   

SQL PASS Summit 2014 – Kick Off

Day 1 has kicked off in Seattle, a remarkable city. Having arrived a week early, I’ve had plenty of time to check out the sights and the food and have enjoyed it immensely - a brilliant venue for PASS! There were a few announcements at this morning’s Keynote, mostly regarding Azure. Azure SQL Databases are gaining larger index handling, parallel queries, extended events and in-memory columnstore for data marts. Joseph Sirosh gave a talk about Machine Learning & Information management showing a cool example of Azure Stream Analytics using Microsoft Kinect sensor information of customer interactions in a shop being uploaded straight into Power Map! I am looking forward to hearing more on Machine Learning. There are also a handful of great improvements for Power BI. I am most looking forward to the new Live Ops Dashboards and drill-through actions! Combo Reports also look promising… Moving onto the first session, I chose to attend ‘What’s new in Microsoft Power Query for Excel’. As it turns out there’s not a massive amount of new stuff – some new data sources and a tick box when you do a merge to remove name prefixes. However one of these new sources is the long-awaited Analysis Services data source. The new ODBC Data Source is a great addition also. There was a mention regarding the possibility of a decoupled M-query SSIS component! We probably won’t hear of anything until later in 2015, unfortunately. I would say this was not a level 300 session, more like 100/200. The second session was ‘SQL Server 2014 Always On (High Availability and Disaster Recovery)’: a nice overview of what was brought in in 2012 and the newer 2014 features – these including an increased maximum number of secondary replicas, increased availability of readable secondary’s and the Add Azure Replica Wizard. Despite not being a DBA and it being a level 300 session, I found it easy to follow and absorbable. I feel many of the DBAs in the room may not have taken away any information they would have not already have known, however. Niko Neugebauer gave a fantastic, in depth session on ‘ETL Patterns with Clustered Columnstore Indexes’. It was a fast moving talk, despite the time spent waiting for some executions. It demanded your full attention! Definitely worthy of its 400 level. It left me a little tired for Marco Russo’s ‘DAX Patterns’ session which showed some examples of workarounds for common tabular weaknesses like distinct counts for Type 2 slowly changing dimensions and cumulative quantities. Overall it was a strong day. I am looking forward to tomorrow. More to follow…

SQL PASS Summit – Day 1

Having spent several days enjoying Seattle and getting to know and enjoy the city the conference is now underway. Yesterday there were some introductory meetings where we got to meet some of the other attendees and get a feel for the environment, today everything was in full swing. The morning started with the opening keynote presented to a huge audience. There were demonstrations of some really exciting new features – in one we observed Kinect being used with Power Map in order to track customer movements to observe customer interest in different parts of the store. We saw some great looking Power BI dashboarding functionality with the ability to drillthrough into detailed reports. As well as this we saw some further enhancements SQL Server Azure and on-premise integration including a new stretch functionality which will allow users to seamlessly ‘stretch’ their data into the cloud, keeping the most frequently queried records on premise and the other ones in the cloud. We also saw a Columnstore index being created on an in memory table! Miguel Lopes gave a talk on the new features in Power Query where we saw the ability to query using ODBC and support for Analysis services connections, on the whole though whilst I think the ODBC will be particularly useful for some situations, much of this talk was giving an overview of Power query as a whole rather than demonstrating new functionality. The integration of SSIS and power query in future was mentioned, however no dates have been set for this and we are told that this may (or may not) be available at some point in 2015. Jen Stirrup gave an interesting overview of some of the features available in R, the session was so popular that many people had to sit round the front on the floor! Niko Neugebauer’s contagious enthusiasm when presenting his session on ETL Patterns with Clustered Columnstore indexes was great to see and I picked up a few tricks here that were quite handy when working in 2014 environments. I also very much enjoyed John Welch’s session in Continuous Delivery for Data Warehouses and Marts, this is something I myself have been involved with a lot recently and it was very interesting to see his methods of achieving this and also to discover that in many cases we were both doing things in the same way :) Overall the day has been very interesting, we have seen some exciting new features announced today and some significant enhancements to the Power BI product range, it seemed to me for some time that the lack of dashboarding functionality in Power View was holding it back and I think many people will be very pleased with this new functionality and the further enhancements to the Azure service.

Dynamic Top N rows in Power Query

I had a situation recently where I was importing a CSV file into Power Query. This file had some summary data at the beginning of the file that I wanted to skip. This is perfectly easy enough using the Remove Top Rows step. However a problem I soon encountered was that if the files summary data varied in the number of rows then the Remove Top Rows step produced errors, especially if you then want to promote a row to be your header in a later step. To get around this you can search for a particular string that is expected to appear, perhaps one that will become one of your headers or signify the start of a section of the file. In this example I am using an Azure Usage file. I want to cut out the summary data at the top and start with the Daily Usage data. Below is an abstract of a large query, starting at one of the file import steps: 1. Source = Csv.Document(File.Contents("NameRemoved.csv"),null,",",null,1252) 2. #"FilteredRows" = Table.SelectRows(Source, each Text.Contains([Column1], "Daily Usage")), 3. #"Position of Daily Usage" = Table.PositionOf(Source, FilteredRows {0}), 4. #"TopRemoved" = Table.Skip(Source, (#"Position of Daily Usage" + 1)), 5. #"First Row as Header" = Table.PromoteHeaders(#"TopRemoved"), Pay attention to where steps 2, 3 and 4 all reference the step 1. These steps can be added using the advanced query editor. Breaking it down by steps starting with Step 2 (#”FilteredRows”); this filtered a particular column, in this case column 1, by the string you are looking for, e.g. “Daily Usage”. The result of this is inserted into a table using Table.SelectRows. Step 3 (#”Position of Daily Usage”) then finds the position of the 1st row of the table from Step 2, within the imported data in Step 1. Table.PositionOf requires a table as the input. The {0} denotes the first row in the table. The row number is then used in Step 4 (#”TopRemoved”), in the Table.Skip function. In this example I wanted to remove the row with the Column 1 String “Daily Usage” so I added a + 1.

Importing multiple files into Power Query

You can automatically append data from multiple files by choosing to place them in a folder and query the folder. You can choose a folder and it will also search subfolders. You can drop more files into the folder (or remove some) at a later stage, refresh the query and the new files will appear automatically. Once you select From Folder you will be asked to specify the folder location with the option to browse to it. Once selected, Power Query will find suitable files and list them in the query editor window. There will be a column named Content. Like the name suggests this is the contents of the file. There are other columns with details of the file like the name, extension and file path. There is also an attributes column that can be expanded to expose more properties. At the top of the Content column there is a button. If you click this it will import your files and combine them. However when you use this method, you lose all the information about the files themselves and are left with just the contents. What if you want to use the information about the file? To get around that you can create a custom column like so: This example is using csv files. Creating this adds a table object into each row of the custom column. From there you can expand the contents alongside the file information. The same can be done for files stored on SharePoint. You will be prompted to provide the URL for the SharePoint site. This will be the root URL. When you do this, a navigator window will open up but your files won’t appear. If you click on the root and click edit you can see why in the query editor. You’ll notice that the Source query step is in the form of this: = SharePoint.Tables("https://SharePoint/Site/") Change this to SharePoint.Files. This will list all the files within the site! Now you can filter this list by file extension or by Folder Path (using the ‘Begins with’ Text Filter), for example.