
Dan Evans' Blog

SQL PASS Summit 2014 – Kick Off

Day 1 has kicked off in Seattle, a remarkable city. Having arrived a week early, I’ve had plenty of time to check out the sights and the food and have enjoyed it immensely – a brilliant venue for PASS!

There were a few announcements at this morning’s Keynote, mostly regarding Azure. Azure SQL Databases are gaining larger index handling, parallel queries, extended events and in-memory columnstore for data marts. Joseph Sirosh gave a talk on Machine Learning and information management, with a cool demo of Azure Stream Analytics in which Microsoft Kinect sensor data on customer interactions in a shop was uploaded straight into Power Map! I am looking forward to hearing more on Machine Learning.

There are also a handful of great improvements for Power BI. I am most looking forward to the new Live Ops Dashboards and drill-through actions! Combo Reports also look promising…

Moving on to the first session, I chose to attend ‘What’s new in Microsoft Power Query for Excel’. As it turns out there isn’t a massive amount of new stuff – some new data sources, and a tick box to remove name prefixes when you do a merge. However, one of these new sources is the long-awaited Analysis Services data source, and the new ODBC data source is a great addition too. There was also a mention of a possible decoupled M-query SSIS component, although we probably won’t hear anything more until later in 2015, unfortunately. I would say this was not a level 300 session – more like 100/200.

The second session was ‘SQL Server 2014 Always On (High Availability and Disaster Recovery)’: a nice overview of what was introduced in 2012 and the newer 2014 features, including an increased maximum number of secondary replicas, increased availability of readable secondaries and the Add Azure Replica Wizard. Despite not being a DBA and it being a level 300 session, I found it easy to follow and absorb. I suspect many of the DBAs in the room didn’t take away much they didn’t already know, however.

Niko Neugebauer gave a fantastic, in-depth session on ‘ETL Patterns with Clustered Columnstore Indexes’. It was a fast-moving talk, despite the time spent waiting for some executions, and it demanded your full attention – definitely worthy of its 400 level. It left me a little tired for Marco Russo’s ‘DAX Patterns’ session, which showed example workarounds for common Tabular weaknesses such as distinct counts for Type 2 slowly changing dimensions and cumulative quantities.

Overall it was a strong day. I am looking forward to tomorrow. More to follow…

Dynamic Top N rows in Power Query

I had a situation recently where I was importing a CSV file into Power Query. This file had some summary data at the beginning that I wanted to skip, which is easy enough using the Remove Top Rows step. However, a problem I soon encountered was that if the file’s summary data varied in its number of rows, the Remove Top Rows step produced errors, especially if you then want to promote a row to be your header in a later step.

To get around this you can search for a particular string that is expected to appear, perhaps one that will become one of your headers or signify the start of a section of the file.

[Screenshot: the imported file in the query editor, with summary rows above the data]

In this example I am using an Azure Usage file. I want to cut out the summary data at the top and start with the Daily Usage data.

Below is an excerpt of a larger query, starting at one of the file import steps:

1. Source = Csv.Document(File.Contents("NameRemoved.csv"), null, ",", null, 1252),

2. #"FilteredRows" = Table.SelectRows(Source, each Text.Contains([Column1], "Daily Usage")),

3. #"Position of Daily Usage" = Table.PositionOf(Source, #"FilteredRows"{0}),

4. #"TopRemoved" = Table.Skip(Source, #"Position of Daily Usage" + 1),

5. #"First Row as Header" = Table.PromoteHeaders(#"TopRemoved"),

Note how steps 2, 3 and 4 all reference step 1 (Source). These steps can be added using the Advanced Editor.

Breaking it down step by step, starting with Step 2 (#"FilteredRows"): this filters a particular column – in this case Column1 – for the string you are looking for, e.g. "Daily Usage". Table.SelectRows returns a table containing only the matching rows.

[Screenshot: the FilteredRows step, showing only the rows containing "Daily Usage"]

Step 3 (#"Position of Daily Usage") then finds the position, within the imported data from Step 1, of the first row of the table from Step 2. Table.PositionOf takes the table to search (Source) and the row to find; the {0} denotes the first row of the FilteredRows table.

[Screenshot: the Position of Daily Usage step, returning the row’s position within Source]

That row number is then used in Step 4 (#"TopRemoved") in the Table.Skip function. In this example I also wanted to remove the row containing the "Daily Usage" string itself, so I added + 1.

[Screenshot: the TopRemoved step, with the summary rows stripped out]
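For reference, here is the whole pattern assembled into a single query – the file name, code page and search string are just the placeholders from the example above, so swap in your own:

let
    Source = Csv.Document(File.Contents("NameRemoved.csv"), null, ",", null, 1252),
    // Keep only the rows whose first column contains the marker string
    #"FilteredRows" = Table.SelectRows(Source, each Text.Contains([Column1], "Daily Usage")),
    // Find where the first of those rows sits in the original import
    #"Position of Daily Usage" = Table.PositionOf(Source, #"FilteredRows"{0}),
    // Skip everything up to and including the marker row
    #"TopRemoved" = Table.Skip(Source, #"Position of Daily Usage" + 1),
    #"First Row as Header" = Table.PromoteHeaders(#"TopRemoved")
in
    #"First Row as Header"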

Importing multiple files into Power Query

You can automatically append data from multiple files by placing them in a folder and querying the folder; subfolders are searched too. You can drop more files into the folder (or remove some) at a later stage, refresh the query, and the new files will be picked up automatically.

[Screenshot: the From Folder option in the Power Query ribbon]

Once you select From Folder you will be asked to specify the folder location with the option to browse to it.

Once selected, Power Query will find suitable files and list them in the query editor window.
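If you open the Advanced Editor at this point you’ll see the generated Source step is based on Folder.Files. A minimal sketch – the path here is just an example:

= Folder.Files("C:\Data\Usage")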

[Screenshot: the query editor listing the files found in the folder]

There will be a column named Content. As the name suggests, this holds the contents of each file. Other columns give details about the file, such as the name, extension and file path. There is also an Attributes column that can be expanded to expose more properties.

At the top of the Content column there is a button. If you click this it will import your files and combine them.

[Screenshot: the combine button at the top of the Content column]
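Under the hood, that combine essentially concatenates the file binaries and parses the result. A rough sketch of the equivalent step, assuming csv files and the default Source step name:

= Csv.Document(Binary.Combine(Source[Content]))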

However, when you use this method you lose all the information about the files themselves and are left with just the contents. What if you want to use the information about the file? To get around that, you can create a custom column like so:

[Screenshot: the custom column dialog]

This example uses csv files. Creating this adds a table object into each row of the custom column; from there you can expand the contents alongside the file information.
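The exact formula from the screenshot isn’t reproduced above, so for clarity, a custom column along these lines does the job – "Data" is just an assumed column name:

= Table.AddColumn(Source, "Data", each Csv.Document([Content]))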

The same can be done for files stored on SharePoint.

[Screenshot: choosing SharePoint as the source]

You will be prompted to provide the URL for the SharePoint site. This will be the root URL.

When you do this, a navigator window will open but your files won’t appear. If you click on the root and then click Edit, you can see why in the query editor.

You’ll notice that the Source query step takes this form:

= SharePoint.Tables("https://SharePoint/Site/")

Change this to SharePoint.Files. This will list all the files within the site! Now you can filter this list by file extension or by Folder Path (using the ‘Begins with’ Text Filter), for example.
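Putting that together – the site URL is the placeholder from above, and the ".csv" extension filter is just one example:

let
    Source = SharePoint.Files("https://SharePoint/Site/"),
    // Keep only csv files anywhere under the site
    #"Filtered Rows" = Table.SelectRows(Source, each [Extension] = ".csv")
in
    #"Filtered Rows"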