EmmaStewart

Emma Stewart's Blog

How to do row counts in Azure SQL Data Warehouse

Continuing on from my last couple of blog post about working with the Azure Data Warehouse, here is another issue which has came up during development and is handy to know if you are going to be developing a solution! Smile

Keeping track of how much data has been loaded plays a key part in a BI Solution. It is important to know for a given load for example, how many rows were inserted, updated or deleted. Traditionally, we were able to use the @@ROWCOUNT function

@@ROWCOUNT returns the number of rows affected by the last statement.

Unfortunately, in Azure SQL Data Warehouse @@ROWCOUNT is not supported.

How does it work?

In the Microsoft Azure documentation,they do provide a workaround for this, please see here  for more information and a list of other unsupported functions.

They suggest creating a stored procedure which will query the system tables sys.dm_pdw_exec_requests and sys.dm_pdw_request_steps in order to get the row count for the last SQL statement for the current session.

sys.dm_pdw_exec_requests holds information about all requests currently or recently active in SQL Data Warehouse. It lists one row per request/query. holds information about all SQL Server query distributions as part of a SQL step in the query.

sys.dm_pdw_request_steps holds information about all steps that are part of a given request or query in SQL Data Warehouse. It lists one row per query step.

This is an example of what the stored procedure would look like:

Row_Count

 

As you can see above, we pass through a ‘LabelContext’ parameter. A Label is a concept in Azure SQL Data Warehouse that allows us to provide our queries with a text label that is easy to understand and we can find that label in the DMVs. For example:

label

Here, we have given our query the label ‘Test Label’ and if we wanted to find information about this query in the DMVs we can search using this label like:

label2

 

So, putting this into context, in the ETL we are calling stored procedures to load our data (for example between clean and warehouse). Therefore, within the stored procedure we have the query written to insert or update the data and we would give this query a label. Then, within the same stored procedure, we would call the Row Count stored procedure, passing through the Label as parameter so we can retrieve the row count.

 

Be careful though!

On my current project we have come across times where we haven’t been able to get the row count back. This is because the sys.dm_pdw_exec_requests DMV we are querying is transient and only stores  the last 10,000 queries executed. So when we were running the query above, our requests were no longer there and we were getting nothing back! The table holds data on all queries that go against the distribution nodes and statistics gathering for each of the nodes. So in order to try and limit the records in this table, keep the nesting level of queries as low as possible to avoid the table blowing up and not having the data you need in it!

 

Stay tuned for another blog about working with Azure Data Warehouse!