Adatis BI Blogs

Lightweight Testing in Azure through Pester & Azure Automation

I'm currently working on a relatively lightweight PaaS modern data warehouse project, and as part of the build there was a requirement to add automated testing into the platform. While I was thinking about how I could achieve this, I remembered a DevOps session I attended at SQLBits 2018 in which the team did something very similar: they used PowerShell to add tests into a continuous integration pipeline hosted on VSTS, through a framework called Pester. In this blog I will talk about my implementation of Pester, but specifically in the context of hosting it in an Azure PaaS environment (something which adds a small amount of overhead to its implementation and took me a while to get working).

What is Pester?

Pester is a ubiquitous test and mock framework for PowerShell (hosted on GitHub here). It can be used to test almost anything, as long as you can invoke whatever you are trying to test through PowerShell. This includes SQL statements, Azure resources, Windows resources, etc. Pester provides the language to both define and execute the test cases. I won't go through the numerous advantages of automated testing, but let's just say it's pretty useful to have it as part of a platform to alert you about issues early. I won't talk about it so much in the context of DevOps CI/CD, but it has use cases there too. Microsoft is also on board with the framework, and it comes pre-installed with Windows builds nowadays.

How do I define a test case?

A Describe block defines a group of tests, and all test files must contain at least one Describe block. An It block then defines a single test case, so in my example below each one invokes a SQL statement to return a result which is then tested. The test is passed or failed using an assertion such as Should Be 0 or Should BeNullOrEmpty against the object piped to it.
You can also use other variations such as Should BeExactly, or even Should Throw to pass a test based on a terminating error. In the code segment below, I'm checking keys that failed to look up against our dimensions from the fact tables. One thing to be careful of (as there was no related error message) is that you need to alias your COUNT or SUM in the SQL query, and then call that alias as a property of the result object afterwards to check against.

Describe "Warehouse Award Fact -1 Keys" {
    It "Award Fact - Supplier -1 Keys" {
        $Query = "SELECT COUNT(*) AS AwardSupplier FROM Warehouse.FctAward WHERE SupplierKey = -1"
        $Result = Invoke-Sqlcmd -ServerInstance $AsqlServerName -Database $AsqlDatabaseName -Query $Query -Username $AsqlUsername -Password $AsqlPassword
        $Result.AwardSupplier | Should Be 0
    }
    It "Award Fact - Contract -1 Keys" {
        $Query = "SELECT COUNT(*) AS AwardContract FROM Warehouse.FctAward WHERE ContractKey = -1"
        $Result = Invoke-Sqlcmd -ServerInstance $AsqlServerName -Database $AsqlDatabaseName -Query $Query -Username $AsqlUsername -Password $AsqlPassword
        $Result.AwardContract | Should Be 0
    }
}

As part of the definition I decided it would be best to split the test blocks up into logical groups, and then keep them all within the same file. However, if you wanted to test both data and something else such as Azure resources, then I would consider splitting the tests into separate files to keep things modular. To give you an idea of the sort of things we were keen to monitor, I created the following tests:

- Stage row counts match clean row counts
- Business keys are distinct within source
- Business keys are distinct within dimensions
- Total £ amounts match between source and warehouse
- No MDS errors exist on data import
- Fact table unknown (-1) keys for each dimension (as above)
- Misc tests that tripped us up during early build phases based on assumptions

These tests were added to over time and formed a group of acceptance tests for each DW run.
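As an illustration of the first test in that list, a row-count reconciliation check follows the same pattern as the fact key tests above. This is a hypothetical sketch: the Stage.Contract and Clean.Contract table names are illustrative rather than from the actual project, and the connection variables are assumed to be defined as before.

```powershell
# Hypothetical sketch - table names are illustrative only
Describe "Row Count Reconciliation" {
    It "Stage contract rows match clean contract rows" {
        $StageQuery = "SELECT COUNT(*) AS Cnt FROM Stage.Contract"
        $CleanQuery = "SELECT COUNT(*) AS Cnt FROM Clean.Contract"
        $StageCount = (Invoke-Sqlcmd -ServerInstance $AsqlServerName -Database $AsqlDatabaseName -Query $StageQuery -Username $AsqlUsername -Password $AsqlPassword).Cnt
        $CleanCount = (Invoke-Sqlcmd -ServerInstance $AsqlServerName -Database $AsqlDatabaseName -Query $CleanQuery -Username $AsqlUsername -Password $AsqlPassword).Cnt
        # Stage and clean layers should contain exactly the same number of rows
        $StageCount | Should Be $CleanCount
    }
}
```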
As data is always changing, it's good to have these tests to validate the initial premises put in place around a data set. While I didn't need to use it, there are also commands to mock functions and variables, putting the PowerShell code into a specific state for a particular test. This is helpful to avoid changing the real environment while replicating states.

Standing up Pester in Azure

Implementing the Pester framework with traditional resources is already well documented and very simple to get started with, in both a manual and an automated way. Unfortunately, as I was working on a fully PaaS project, I needed to implement the framework within Azure, which did not seem to be as well documented. To help me out with the task, I went straight to Azure Automation. For anyone that has not used this before, it's basically a way to host PowerShell scripts within runbooks. While it was fairly intuitive to import the Pester framework from its GitHub repo, Automation also offers a modules gallery. Pester is part of this gallery, so for ease of use I would download it from there, which also makes maintenance slightly easier. It's also worth defining important variables such as DB connections outside of the runbook, within Automation, and then passing them in as parameters; think of this as similar to environment variables on an SSIS project. Credentials such as our RunAs account are also defined externally to the runbook.

Now on to the important bit: defining the test execution script. The first body of code will connect to Azure using the RunAs account as a service principal. This allows us to execute the script without using our own credentials. Setup of this account is a whole separate blog in itself, so I won't go into that detail here. Once this has been defined, the script will then connect to Blob Storage to extract the tests (defined above), and place them in the local Temp directory of Azure Automation.
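To illustrate the mocking capability mentioned above, here is a minimal sketch (not from the project) showing how Pester's Mock command can replace an external dependency, in this case pinning Get-Date to a fixed value so a test does not depend on when it runs:

```powershell
Describe "Load window logic" {
    # Mock replaces calls to Get-Date within this Describe block with a fixed value
    Mock Get-Date { [datetime]'2018-06-01' }

    It "Uses the mocked date rather than the real clock" {
        (Get-Date).Year | Should Be 2018
    }
}
```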
This was one of my major blockers initially, as I was trying to find a way to invoke the test scripts from within the same PowerShell script as the execution, therefore not needing to host the tests anywhere and keeping everything contained in a runbook (or parent/child runbooks). Unfortunately (as far as I can tell) the Invoke-Pester command which executes the tests needs a separate file location to be defined. Either way, we already had a Blob Storage area set up for the project, so it was not really an overhead to create a new container to store the tests in. Automation uses a Temp folder internally to store anything, so I used this to land the tests ready for processing.

# Get required variables
$AutomationConnectionName = Get-AutomationVariable -Name 'AutomationConnectionName'

# Connect to an automation connection to get TenantId and SubscriptionId
$Connection = Get-AutomationConnection -Name $AutomationConnectionName
$TenantId = $Connection.TenantId
$SubscriptionId = $Connection.SubscriptionId

# Login to Azure using AzureRunAsConnection
Connect-AzureRmAccount -ServicePrincipal -TenantId $Connection.TenantId -ApplicationId $Connection.ApplicationId -CertificateThumbprint $Connection.CertificateThumbprint

# Connect to Storage Account to get the tests script
$resourceGroup = Get-AutomationVariable -Name 'ResourceGroupName'
$storageAccountName = Get-AutomationVariable -Name 'PreStageBlobStorageAccountName'
$storageAccount = Get-AzureRmStorageAccount -ResourceGroupName $resourceGroup -Name $storageAccountName
$ctx = $storageAccount.Context
$data = Get-AzureStorageBlobContent -Blob "PesterTests.ps1" -Container 'pester' -Destination "C:\Temp\" -Context $ctx

The next section of the script will depend on what you want to do with the results. For me, we already had an ETL control schema set up in our SQLDB, so it felt right to submit the results into a new table in there for reporting purposes. Alternatively, you could create a new schema called UnitTesting or similar.
There are also lots of other things you could do with the results, e.g. trigger other events or use them in a CI/CD environment. The code below will open a SQLDB connection, and then create a function to insert rows into the DB, which is called in the next block of code. For security purposes, the parameters are defined outside the SQL command and added in at execution. While there are quite a few things you can extract from a Pester test result, I decided to take the test name, test result, and failure message, to keep things simple. I also included an inserted date so we can work out the latest tests, as well as a trigger ID to join back to the parent pipeline that called the test scripts. This ties in nicely with other bits of our ETL reporting framework.

# Open SQL connection
$DBServer = Get-AutomationVariable -Name 'DatabaseServerFullName'
$DBName = Get-AutomationVariable -Name 'DatabaseName'
$DBUsername = Get-AutomationVariable -Name 'DatabaseAdminUsername'
$DBPassword = Get-AutomationVariable -Name 'DatabaseAdminPassword'
$sqlConnection = New-Object System.Data.SqlClient.SqlConnection
$sqlConnection.ConnectionString = "Server=$DBServer;Database=$DBName;UID=$DBUsername;PWD=$DBPassword"
$sqlConnection.Open()

# Create GUID to group tests together
$GUID = [guid]::NewGuid()
$GUID = $GUID.ToString()

# Create Inserted datetime
$Inserted = Get-Date

# Define function to submit to database
function Do-InsertRow {
    $sqlCommand = New-Object System.Data.SqlClient.SqlCommand
    $sqlCommand.Connection = $sqlConnection
    $sqlCommand.CommandText = "SET NOCOUNT ON; " +
        "INSERT INTO BISystem.UnitTests (TriggerId,TestName,TestResult,FailureMessage,Inserted) " +
        "VALUES (@TriggerId,@TestName,@TestResult,@FailureMessage,@Inserted); "
    $sqlCommand.Parameters.Add((New-Object Data.SqlClient.SqlParameter("@TriggerId",[Data.SQLDBType]::NVarChar, 50))) | Out-Null
    $sqlCommand.Parameters.Add((New-Object Data.SqlClient.SqlParameter("@TestName",[Data.SQLDBType]::NVarChar, 200))) | Out-Null
    $sqlCommand.Parameters.Add((New-Object Data.SqlClient.SqlParameter("@TestResult",[Data.SQLDBType]::NVarChar, 10))) | Out-Null
    $sqlCommand.Parameters.Add((New-Object Data.SqlClient.SqlParameter("@FailureMessage",[Data.SQLDBType]::NVarChar, 500))) | Out-Null
    $sqlCommand.Parameters.Add((New-Object Data.SqlClient.SqlParameter("@Inserted",[Data.SQLDBType]::DateTime2))) | Out-Null
    $sqlCommand.Parameters[0].Value = $PipelineTriggerID
    $sqlCommand.Parameters[1].Value = $TestName
    $sqlCommand.Parameters[2].Value = $TestResult
    $sqlCommand.Parameters[3].Value = $TestError
    $sqlCommand.Parameters[4].Value = $Inserted
    $sqlCommand.ExecuteNonQuery()
}

Time to invoke the test scripts. This can be done through the Invoke-Pester command. As mentioned above, I had to pass in the test script's location. I also defined the PassThru parameter so that the results were passed into an $Output object ready to deconstruct and write to SQLDB. While it's not mandatory, you can also define the test groups to pick up as part of the invocation, which has the added bonus of supporting wildcards. The script will then loop round each object in the TestResult array of the $Output object defined previously. For each one, it extracts the attributes of the test that I'm after and then calls my insert function to write the results to SQLDB.

# Invoke Pester
$Output = Invoke-Pester -Script C:\Temp\PesterTests.ps1 -PassThru -TestName '*CaSIE*'

# Output Results
Write-Output "TestNameFilter: $($Output.TestNameFilter)."
Write-Output "Total Count: $($Output.TotalCount)."
Write-Output "Passed Count: $($Output.PassedCount)."
Write-Output "Failed Count: $($Output.FailedCount)."
Write-Output "Time: $($Output.Time)."
Write-Output ""

# Loop over TestResult objects to submit to database
$Output.TestResult | ForEach-Object {
    $TestName = $_.Name
    $TestResult = $_.Result
    $TestError = $_.FailureMessage
    Write-Output "Test Result: $($TestName), $($TestResult), $($TestError)"
    Do-InsertRow
}

# Close the connection.
if ($sqlConnection.State -eq [Data.ConnectionState]::Open) {
    $sqlConnection.Close()
}

# Show done when finished (for testing/logging purposes only)
Write-Output "Finished Running Tests"

Once everything has been created and you've tested the scripts via the test pane and checked that the results populate the database, you are ready to hook it into your ADF pipelines and integrate it into your solution. To do this is very simple: within each runbook you can create a webhook for any external resources to call. This comes in the form of a secret URL which will kick off the runbook with the embedded script. Passing parameters into the runbook at this point requires a bit more work, and I go into this detail in a separate blog post. Be careful to take a copy of this URL, as you cannot view it after creation. It's then just a case of creating a Web activity in your ADF pipeline to call the URL and run the scripts. All in all, a very straightforward mechanism.

Once the results were in SQLDB, I also defined a view on top of the table, as I was having trouble extracting the test group. To make things simple, I just re-worded the individual test cases to include the group name, and then used a CHARINDEX to split it out, meaning I could now report by group as well. I also added a field to calculate the latest test result set, so that we were only reporting on the most recent test run. I then built a Power BI report on top of this to integrate into our existing solution (below). Test results were presented as a measure, % Passed (defined below). This was then displayed overall, vs each test group, and then vs each individual test.
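As a rough illustration of the hook-up, the webhook can also be exercised outside ADF with a simple POST. This is a sketch only: the URL below is a placeholder for the secret one generated at creation, and the parameter name in the body is an assumption for this example.

```powershell
# Placeholder URL - substitute the secret webhook URL copied at creation time
$WebhookUrl = 'https://s2events.azure-automation.net/webhooks?token=<secret>'

# Any parameters for the runbook go in the JSON body (parameter name is illustrative)
$Body = @{ PipelineTriggerID = '00000000-0000-0000-0000-000000000000' } | ConvertTo-Json

# Azure Automation responds with the Ids of the jobs it has started
$Response = Invoke-RestMethod -Method Post -Uri $WebhookUrl -Body $Body
$Response.JobIds
```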
% Passed =
VAR PercentPassed =
    DIVIDE(
        CALCULATE(COUNT('BISystem UnitTestsView'[TestId]), 'BISystem UnitTestsView'[TestResult] = "Passed", 'BISystem UnitTestsView'[LatestTest] = 1),
        CALCULATE(COUNT('BISystem UnitTestsView'[TestId]), 'BISystem UnitTestsView'[LatestTest] = 1)
    )
RETURN
    IF(PercentPassed = BLANK(), 0, PercentPassed)

Summary

Pester is a great tool for adding automated testing to your project. It took a few hours to stand up, but now that's been done, it's just a case of defining tests. The ubiquitous nature of the framework means we can define all sorts of tests across data/software/hardware. The thing I like most about using it is that it's simple. While I decided it was best to populate the test results into a database for ease of reporting, it might also be worth investigating populating the VSTS testing framework. Hopefully you will find this blog useful.

Executing SQL Scripts From TeamCity

Recently I was working in an environment where we used TeamCity to automate unit testing (further blogs on this coming later). Essentially the process was that the latest source would be deployed, a set of ETL processes would be run, and tests would validate the results to ensure changes to the project over the day had not affected the reliability of the results. In the interests of ensuring that the tests were the same each time, my preferred method was to use a script to remove the databases from the SQL instance on the server (for other reasons I was keen not to change the deployment scripts in order to achieve this). Whilst in this example I delete databases using the SQL script, it should be noted that you can use this method to execute any SQL script on a target server.

1. Create a new Build Configuration (or add as a step to an existing one).
2. Create a SQL script that does whatever it is you want it to do, and save it in a text file, e.g. 'ClearDatabases.sql'.
3. Create a folder on the server running TeamCity. This will need to be a folder which the service running the TeamCity build agent has access to, otherwise the process will fail.
4. Add a build step, giving it a name that indicates what the script will be doing. In this case I have called it 'Drop Existing Databases'. The step should be configured with a runner type of 'Command Line', running a custom script. In the 'Custom script' section I have the following:

sqlcmd -e -b -d master -i "C:\UnitTestingScripts\ClearDatabases.sql" -S "DEVTESTDW\SQL2014"

5. If the SQL Server on which the script is to be executed is not the same as the TeamCity server (very likely), then you will need to ensure that the target server has the correct firewall and SQL configuration in order to accept the connection.
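For step 2, the original script is not shown, but a minimal sketch of what 'ClearDatabases.sql' might contain is below. The database name is purely illustrative, and the SINGLE_USER step is one common way to force out open connections before a drop.

```sql
-- Illustrative sketch only - the database name is not from the original setup
IF DB_ID('UnitTestDW') IS NOT NULL
BEGIN
    -- Kick out any open connections so the drop cannot be blocked
    ALTER DATABASE UnitTestDW SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE UnitTestDW;
END
```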

SQL PASS Summit – Day 2

This morning started with a talk from Rimma Nehme, a key Microsoft architect. The talk was informative, and clearly Azure and the cloud are something which Microsoft are very keen on; it was well thought through, and there was a fantastic analogy between pizza and the different flavours of Azure, which is not what I had expected.

Adam Machanic did a presentation covering different methods which can be used to force the query optimiser into choosing more efficient query plans. This was not for the faint-hearted, and included some ingenious methods he had developed to 'fool' the optimiser into selecting different query plans. The methods pushed towards parallel queries, ensuring that the right workload was assigned to each of the threads so they finished at roughly the same time. The only drawback was that some of the examples produced code which was quite obfuscated, and not something I thought could be used except in the case of an exceptional issue which necessitated it.

Davide Mauri's session on unit testing and agile development was very interesting; he showed demonstrations of NUnit, BI.Quality and NBI. These are tools which allow users to create unit tests when following an agile approach. Whilst we also use similar methods at Adatis to provide unit testing functionality, the demonstrations were very effective, and I will be doing some further research into these products to realise their full potential.

Conor Cunningham also presented a fantastic talk titled 'Managing 1M+ DBs - How big data is used to run SQL Azure'. This was a very insightful talk which detailed the issues that had arisen when running the SQL Azure service, and the methods used to identify, quickly resolve, and find the root cause of the problems.
Central to this was the collection of telemetry data from many sources, which could be used to perform diagnostics, together with the methods used to identify problems. Here we saw a very interesting use of Azure Machine Learning, which had been set up to trigger alerts on the occurrence of unusual behaviour. Conor is a fantastic speaker and clearly one of the most knowledgeable people on the SQL engine; I really enjoyed this talk. I'm off to the evening party now at the EMP museum, looking forward to having a few beers and meeting some new people :)