Adatis BI Blogs

Regression Testing a Data Platform with Pester

In a previous post, I gave an overview of regression tests. In this post, I will give a practical example of developing and performing regression tests with the Pester framework for PowerShell. The code for performing regression tests is written in PowerShell using the Pester framework. The tests are run through Azure DevOps pipelines and are designed to test regression scenarios. The PowerShell scripts, which contain the mechanism for executing tests, rely upon receiving the actual test definitions from a metadata database. The structure of the metadata database is exactly the same as laid out in the Integration Test post.

Regression Tests

The regression tests need to be designed so that existing functionality isn't regressed by any changes made to the code. In an analytics system, the functionality is typically aligned to the target schema that's used for reporting and analysis. If we change the cleaning transformation logic in the source tables that make up our customer dimension, we'll want to ensure that the customer dimension itself doesn't change expected outcomes, for example row counts or a specific value. For this example, we'll put some data into the data lake and run it through the various layers until it ends up in the CURATED layer. Because the majority of the processing is orchestrated using Azure Data Factory V2 (ADF), we only really need to ensure that the pipeline(s) run successfully and that some valid data appears in all the layers of the lake, as well as being logged in the metadata database.

Because we're deploying some data, we've got elements of setup and teardown in the script: setup and teardown in the metadata database, so that ADF knows what to process; and setup and teardown in the data lake, so that there is data to process (a sketch of what such a test might look like is included at the end of this post).

This should give you enough to start using Pester for testing your own Azure data platform implementations.
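To make that concrete, here is a minimal sketch of how one of these regression tests might be structured in Pester. It is illustrative only: the resource group, data factory, pipeline and table names, the connection variables, the expected row count, and the Test.SetupRegressionRun / Test.TeardownRegressionRun procedures are hypothetical placeholders rather than anything from the actual solution.

# Hypothetical names; substitute your own resource group, factory, pipeline and database details
$resourceGroupName = "rg-dataplatform"
$dataFactoryName   = "adf-dataplatform"
$pipelineName      = "PL_Load_DimCustomer"

Describe "Customer Dimension Regression" {

    # Setup: seed the metadata database (and, by extension, the lake) so ADF has something to process
    Invoke-Sqlcmd -ServerInstance $SqlServerName -Database $MetadataDbName `
        -Query "EXEC Test.SetupRegressionRun @TestName = 'DimCustomer';"   # hypothetical procedure

    # Call the functionality under test: run the ADF pipeline and wait for it to finish (simplified polling)
    $runId = Invoke-AzureRmDataFactoryV2Pipeline -ResourceGroupName $resourceGroupName `
        -DataFactoryName $dataFactoryName -PipelineName $pipelineName
    do {
        Start-Sleep -Seconds 30
        $run = Get-AzureRmDataFactoryV2PipelineRun -ResourceGroupName $resourceGroupName `
            -DataFactoryName $dataFactoryName -PipelineRunId $runId
    } while ($run.Status -eq "InProgress")

    Context "Pipeline execution" {
        It "completes successfully" {
            $run.Status | Should Be "Succeeded"
        }
    }

    Context "Curated layer output" {
        It "produces the expected row count in the customer dimension" {
            $result = Invoke-Sqlcmd -ServerInstance $SqlServerName -Database $WarehouseDbName `
                -Query "SELECT COUNT(*) AS RowCnt FROM Warehouse.DimCustomer;"
            $result.RowCnt | Should Be 100   # expected count for the seeded test data
        }
    }

    # Teardown: remove the seeded data so the test can be repeated
    Invoke-Sqlcmd -ServerInstance $SqlServerName -Database $MetadataDbName `
        -Query "EXEC Test.TeardownRegressionRun @TestName = 'DimCustomer';"  # hypothetical procedure
}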

Regression Testing Overview

In a previous post, I touched on the point of testing and briefly touched on regression testing. In this post, I will go into more detail about what regression testing is and why it's important to do it.

In the previous post, I said that regression tests are intended to: verify that newly developed code into a deployed product does not regress expected results. We'll still need to go through the process of unit testing and integration testing; but do we want to go through the rigmarole of manual testing to check whether a change has changed more than it was meant to? That's something we would like to avoid, so we have regression testing to alleviate that need. Like integration tests, regression tests need multiple parts of the product to be available, so they would need to be executed as part of a Release Pipeline in Azure DevOps. Regression testing is expensive to automate and maintain, and slow to run, but that doesn't mean it should be avoided. It adds a layer of confidence to a newly changed code base which is about to be deployed. However, because we may be testing not just targeted elements but perhaps the entire solution at once, we don't want to run all regression tests all the time, because they would take a very long time to complete.

Regression Techniques

There are a variety of methods and techniques that can be used in the design and execution of regression tests. These are:

Retest All
Test Selection
Test Case Prioritisation

Retest All executes all the documented test cases to check the integrity of the solution. This is the most expensive technique for regression testing as it runs all the test cases; however, it does ensure that there are no errors in the modified code that could be released into Production.

Test Selection executes a defined selection of documented test cases to check the integrity of a section of the solution. It is less expensive than the Retest All technique, but it does introduce an element of risk as the test coverage does not cover the entire solution.

Test Case Prioritisation executes tests in priority order, running higher priority tests before lower priority tests.

Regression Testing

Regression tests are executed for the various functional slices that exist in a solution. Like most forms of testing, regression tests follow a pattern of:

Initialise system under test
Call functionality under test
Assert expected outcome against result of method

Generally, regression tests will be executed after deployment, as they often require the infrastructure to exist. They are usually dependent on data, which must be created at the time of setting up the tests. Most of the time, you can automate regression tests using a unit test framework such as Pester or NUnit.

Some best practices

To get you going, here are some best practices that you should aim to follow:

Adopt a hybrid technique, mixing and matching regression techniques to use what's best for you at the time
Create the data needed for the tests before execution, as part of the test setup
Use multiple asserts per test. You might have dependencies on external resources that you'd like to keep open, or you might want a fast-running set of tests; multiple asserts help with all of these
Choose unit tests over regression tests when feasible
Choose integration tests over regression tests when feasible
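As an illustration of that initialise/call/assert pattern, and of using multiple asserts within a single test, here is a minimal Pester sketch. The server, database, table and procedure names are hypothetical placeholders, not taken from a real solution.

Describe "Customer Dimension Regression Slice" {

    # Initialise: create the data the test depends on as part of the setup
    Invoke-Sqlcmd -ServerInstance $ServerName -Database $DatabaseName `
        -Query "INSERT INTO Staging.Customer (CustomerCode, CustomerName) VALUES ('TEST01', 'Test Customer');"

    # Call: execute the functionality under test (a hypothetical load procedure)
    Invoke-Sqlcmd -ServerInstance $ServerName -Database $DatabaseName `
        -Query "EXEC Warehouse.LoadDimCustomer;"

    Context "Expected outcomes" {
        It "loads the test customer with the expected attributes" {
            $row = Invoke-Sqlcmd -ServerInstance $ServerName -Database $DatabaseName `
                -Query "SELECT CustomerName, SourceSystem FROM Warehouse.DimCustomer WHERE CustomerCode = 'TEST01';"

            # Multiple asserts in one test keep the number of round trips down
            $row.CustomerName | Should Be 'Test Customer'
            $row.SourceSystem | Should Be 'TEST'
        }
    }
}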

Integration Testing a Data Platform with Pester

In a previous post, I gave an overview of integration tests and documenting integration points. In this post, I will give a practical example of developing and performing integration tests with the Pester framework for PowerShell. With a data platform, especially one hosted in Azure, it's important to test that the Azure resources in your environment have been deployed and configured correctly. After we've done this, we can test the integration points on the platform, confident that all the components have been deployed.

The code for performing integration tests is written in PowerShell using the Pester framework. The tests are run through Azure DevOps pipelines and are designed to test documented integration points. The PowerShell scripts, which contain the mechanism for executing tests, rely upon receiving the actual test definitions from a metadata database.

Database Structure

The metadata database should contain a schema called Test, which is a container for all the database objects for running tests using Pester. These objects are:

Test.TestCategory - contains the category of test to be run, e.g. Integration Tests
Test.TestType - contains the types of test that need to be run and are associated with a particular type of functionality. In the Pester framework, Test Type maps to the Describe function.
Test.Test - contains the individual tests to be run, with reference to the test type and environment. In the Pester framework, Test maps to the Context function.
Test.Assert - contains the individual asserts to be executed against the output from the test run, with reference to the Test and type of assert. In the Pester framework, Assert maps to the It function.

How you design the tables is up to you, but I suggest that the schema looks similar to the above.

Test Environment Setup

Before we begin testing all the integration points, we need to be confident that the environment to which the platform is deployed has been created and configured correctly. If it hasn't, there's no point in progressing with the actual integration tests as they would fail. For this, we have an initial script to perform these checks. The script executes a stored procedure called Test.ObtainTests, which returns the list of tests to be run. Within a Pester Describe block, the tests are executed. The tests use the Get-AzureRmResource cmdlet and assert that the name of the deployed resource matches that of the expected resource, as defined in the TestObject. If any of the tests fail in this phase, no further testing should take place.
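To illustrate how that environment check might be wired up, here is a minimal sketch. The Test.ObtainTests stored procedure comes from the post, but its parameter, the columns it returns (ResourceType, ExpectedResourceName), and the connection variables are assumptions made for the sake of the example.

# Retrieve the environment tests from the metadata database
# (the procedure parameter and result column names are assumed for this sketch)
$tests = Invoke-Sqlcmd -ServerInstance $MetadataServerName -Database $MetadataDbName `
    -Query "EXEC Test.ObtainTests @TestCategory = 'Environment';"

Describe "Environment Setup" {
    # Pull everything deployed to the target resource group once
    $deployed = Get-AzureRmResource | Where-Object { $_.ResourceGroupName -eq $ResourceGroupName }

    foreach ($test in $tests) {
        It "has deployed the resource $($test.ExpectedResourceName)" {
            # Assert that the deployed resource name matches the expected resource name
            $match = $deployed | Where-Object { $_.Name -eq $test.ExpectedResourceName }
            $match.Name | Should Be $test.ExpectedResourceName
        }
    }
}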
Integration Tests

We're confident that the environment has been created and configured correctly, so now we're ready to run the integration tests according to the documented integration points. For this example, we'll put some data into the RAW layer of the data lake and run it through the various layers until it ends up in the CURATED layer, where it can be read by Azure SQL DW. Because the majority of the processing is orchestrated using Azure Data Factory V2 (ADF), and the majority of the integration points are within ADF, we only really need to ensure that the pipeline(s) run successfully and that some valid data appears in the CURATED layer for SQL DW to consume via PolyBase.

Because we're also deploying some data, we've got elements of setup and teardown in the script: setup and teardown in the metadata database, so that ADF knows what to process; and setup and teardown in the data lake, so that there is data to process.

Tying it all together

We've got our scripts, but how do they get invoked? This is where the InvokePester script comes in. For anyone not familiar with Pester, this is effectively the orchestrator for your testing scripts. If you deploy the tests to Azure DevOps as part of a release pipeline, you'll see the test output in the release logs (a minimal sketch of such a script is shown below).

This should give you enough to start using Pester for testing your own Azure data platform implementations.
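As a rough sketch of what an InvokePester-style orchestration script could look like, the release pipeline essentially just needs to call Invoke-Pester and publish the results. The folder and file paths here are illustrative assumptions.

# Run all the test scripts in the Tests folder and write an NUnit-format results file
# that an Azure DevOps "Publish Test Results" task can pick up
$results = Invoke-Pester -Script "$PSScriptRoot\Tests" `
    -OutputFile "$PSScriptRoot\TEST-Results.xml" -OutputFormat NUnitXml -PassThru

# Fail the release step if any test failed
if ($results.FailedCount -gt 0) {
    throw "$($results.FailedCount) test(s) failed."
}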

Lightweight Testing in Azure through Pester & Azure Automation

I’m currently working on a relatively lightweight PaaS modern data warehouse project, and as part of the build there was a requirement to add automated testing into the platform. While I was thinking about how I could achieve this, I remembered that during the 2018 SQLBits I attended there was a DevOps session by the sabin.io team in which they did something very similar. They used PowerShell to add tests into part of a continuous integration pipeline hosted on VSTS through a framework called Pester. In this blog, I will talk about my implementation of Pester, but specifically look at it in the context of hosting it in an Azure PaaS environment (something which adds a small amount of overhead to its implementation and took me a while to get working).

What is Pester?

Pester is basically a ubiquitous test and mock framework for PowerShell (hosted on GitHub here). It can literally be used to test anything, as long as you can invoke whatever you are trying to test through PowerShell. This includes SQL statements, Azure resources, Windows resources, etc. Pester provides the language to DEFINE and EXECUTE the test case. I won’t go through the numerous advantages of automated testing, but let’s just say it’s pretty useful to have it as part of a platform to alert you about issues early. I won’t talk about it so much in the context of DevOps CI/CD, but it’s got use cases here too. Microsoft is also on board with the framework, and it comes pre-installed with Windows builds nowadays.

How do I define a test case?

Describe defines a group of tests, and all test files must contain at least one Describe block. Each It block then defines a single test case, so in my example below it will invoke a SQL statement to return a result which is then tested. The test is then passed or failed using an assertion such as Should be 0 or Should BeNullOrEmpty based on the object passed to it. You can also use other variations such as Should BeExactly, or even Should Throw to pass a test based on a terminating error. In the code segment below, I’m checking keys in the fact tables that were unable to look up against our dimensions. One thing to be careful of (as there was no related error message) was that you will need to alias your COUNT or SUM in the SQL query and then call that as a property of your result object afterwards to check against.

Describe "Warehouse Award Fact -1 Keys" {
    It "Award Fact - Supplier -1 Keys" {
        $Query = "SELECT COUNT(*) AS AwardSupplier FROM Warehouse.FctAward WHERE SupplierKey = -1"
        $Result = Invoke-Sqlcmd -ServerInstance $AsqlServerName -Database $AsqlDatabaseName -Query $Query -Username $AsqlUsername -Password $AsqlPassword
        $Result.AwardSupplier | Should be 0
    }
    It "Award Fact - Contract -1 Keys" {
        $Query = "SELECT COUNT(*) AS AwardContract FROM Warehouse.FctAward WHERE ContractKey = -1"
        $Result = Invoke-Sqlcmd -ServerInstance $AsqlServerName -Database $AsqlDatabaseName -Query $Query -Username $AsqlUsername -Password $AsqlPassword
        $Result.AwardContract | Should be 0
    }
}

As part of the definition I decided it would be best to split the test blocks up into logical groups, and then keep them all within the same file. However, if you wanted to test both data and something else such as Azure resources, then I would consider splitting the tests into separate files to keep things modular.
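As a quick, purely illustrative sketch of the other assertion styles mentioned above (the table and column names here are made up), Should BeNullOrEmpty and Should Throw can be used like this:

Describe "Assertion examples" {
    It "returns no orphaned fact rows" {
        # A query that should return no rows; an empty result passes the assertion
        $Query = "SELECT AwardId FROM Warehouse.FctAward WHERE SupplierKey IS NULL"
        $Result = Invoke-Sqlcmd -ServerInstance $AsqlServerName -Database $AsqlDatabaseName -Query $Query -Username $AsqlUsername -Password $AsqlPassword
        $Result | Should BeNullOrEmpty
    }
    It "throws when querying an object that should not exist" {
        # Wrapping the call in a script block lets Should Throw catch the terminating error
        { Invoke-Sqlcmd -ServerInstance $AsqlServerName -Database $AsqlDatabaseName -Query "SELECT * FROM Warehouse.DoesNotExist" -Username $AsqlUsername -Password $AsqlPassword -ErrorAction Stop } | Should Throw
    }
}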
In terms of tests, to give you an idea of the sort of things we were keen to monitor, I created the following:

Stage row counts match clean row counts
Business keys are distinct within source
Business keys are distinct within dimensions
Total £ amounts match between source and warehouse
No MDS errors exist on data import
Fact table unknown keys for each dimension (as shown above)
Misc tests that tripped us up during early build phases, based on assumptions

These tests were added to over time and formed a group of acceptance tests for each DW run. As data is always changing, it's good to have these to validate the initial premises put in place around a data set. While I didn't need to use it, there are also commands to mock variables, putting the PowerShell code into a specific state for a particular test. This is helpful to avoid changing the real environment while replicating states.

Standing up Pester in Azure

Implementing the Pester framework with traditional resources is already well documented and very simple to get started with, in both a manual and an automated way. Unfortunately, as I was working on a fully PaaS project, I needed to implement the framework within Azure. This did not seem to be as well documented. To help me out with the task, I went straight to Azure Automation. For anyone that has not used this before, it's basically a way to host PowerShell scripts within runbooks. While it was fairly intuitive to import the Pester framework from its GitHub repo, there is also an option within Automation to select from a modules gallery. Pester is part of this gallery, so for ease of use I would download it from there, which also makes maintenance slightly easier. It's also worth defining important variables such as DB connections outside of the runbook within Automation and then passing them in as parameters. Think of this as similar to environment variables in an SSIS project. Credentials such as our Run As account are also defined externally to the runbook.

Now on to the important bit: defining the testing execution script. The first body of code will connect to Azure using the Run As account as a service principal. This allows us to execute the script without using our own credentials. Setup of this account is a whole separate blog in itself, so I won't go into that detail here. Once this has been done, the script will connect to Blob Storage to extract the tests (defined above) and place them in the local Temp directory of Azure Automation. This was one of my major blockers initially, as I was trying to find a way to invoke the test scripts from within the same PS script as the execution, therefore not needing to host the tests anywhere and keeping everything contained in a runbook or parent/child runbooks. Unfortunately (as far as I can tell) the Invoke-Pester command which executes the tests needs a separate file location to be defined. Either way, we had a blob storage area already set up for the project, so it was not really an overhead to create a new container to store the tests in. Automation uses a Temp folder internally to store anything, so I used this to land the tests ready for processing.
# Get required variables
$AutomationConnectionName = Get-AutomationVariable -Name 'AutomationConnectionName'

# Connect to an automation connection to get TenantId and SubscriptionId
$Connection = Get-AutomationConnection -Name $AutomationConnectionName
$TenantId = $Connection.TenantId
$SubscriptionId = $Connection.SubscriptionId

# Login to Azure using AzureRunAsConnection
Connect-AzureRmAccount -ServicePrincipal -TenantId $Connection.TenantId -ApplicationId $Connection.ApplicationId -CertificateThumbprint $Connection.CertificateThumbprint

# Connect to Storage Account to get tests script
$resourceGroup = Get-AutomationVariable -Name 'ResourceGroupName'
$storageAccountName = Get-AutomationVariable -Name 'PreStageBlobStorageAccountName'
$storageAccount = Get-AzureRmStorageAccount -ResourceGroupName $resourceGroup -Name $storageAccountName
$ctx = $storageAccount.Context
$data = Get-AzureStorageBlobContent -Blob "PesterTests.ps1" -Container 'pester' -Destination "C:\Temp\" -Context $ctx

The next section of the script will depend on what you want to do with the results. For me, we already had an ETL control schema set up in our SQLDB, so it felt right to submit the results into a new table in there for reporting purposes. Alternatively, you could create a new schema called UnitTesting or similar. There are also lots of other things you could do with the results, e.g. triggering other events or using them in a CI/CD environment. The code below will open a SQLDB connection, and then create a function to insert rows into the DB, which is called in the next block of code. For security purposes, the parameters are defined outside the SQL command and added in at execution. While there are quite a few things you can extract from a Pester test result, I decided to take the Test Name, Test Result, and Failure Message, to keep things simple. I also included an inserted date so we can work out the latest tests, as well as a Trigger Id to join back to the parent pipeline that called the test scripts. This ties in nicely to other bits of our ETL reporting framework.
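The post doesn't show the definition of that results table, but as a hedged sketch it might look something like the following. The column sizes simply mirror the SqlParameter definitions in the insert function below, and the TestId identity column is assumed from the reporting view referenced later; the connection variables are placeholders for your own details.

# Hedged sketch of the results table; run once against the SQLDB that holds your ETL control schema
$createTableSql = @"
CREATE TABLE BISystem.UnitTests (
    TestId         INT IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- assumed surrogate key
    TriggerId      NVARCHAR(50)   NOT NULL,                  -- ADF trigger that called the tests
    TestName       NVARCHAR(200)  NOT NULL,
    TestResult     NVARCHAR(10)   NOT NULL,                  -- e.g. Passed / Failed
    FailureMessage NVARCHAR(500)  NULL,
    Inserted       DATETIME2      NOT NULL
);
"@
# $DBServer, $DBName, $DBUsername and $DBPassword are placeholders for your own connection details
Invoke-Sqlcmd -ServerInstance $DBServer -Database $DBName -Username $DBUsername -Password $DBPassword -Query $createTableSql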
# Open SQL connection
$DBServer = Get-AutomationVariable -Name 'DatabaseServerFullName'
$DBName = Get-AutomationVariable -Name 'DatabaseName'
$DBUsername = Get-AutomationVariable -Name 'DatabaseAdminUsername'
$DBPassword = Get-AutomationVariable -Name 'DatabaseAdminPassword'
$sqlConnection = New-Object System.Data.SqlClient.SqlConnection
$sqlConnection.ConnectionString = "Server=$DBServer;Database=$DBName;UID=$DBUsername;PWD=$DBPassword"
$sqlConnection.Open()

# Create GUID to group tests together
$GUID = [guid]::NewGuid()
$GUID = $GUID.ToString()

# Create Inserted datetime
$Inserted = Get-Date

# Define function to submit to database
function Do-InsertRow {
    $sqlCommand = New-Object System.Data.SqlClient.SqlCommand
    $sqlCommand.Connection = $sqlConnection
    $sqlCommand.CommandText = "SET NOCOUNT ON; " +
        "INSERT INTO BISystem.UnitTests (TriggerId,TestName,TestResult,FailureMessage,Inserted) " +
        "VALUES (@TriggerId,@TestName,@TestResult,@FailureMessage,@Inserted); "
    $sqlCommand.Parameters.Add((New-Object Data.SqlClient.SqlParameter("@TriggerId",[Data.SQLDBType]::NVarChar, 50))) | Out-Null
    $sqlCommand.Parameters.Add((New-Object Data.SqlClient.SqlParameter("@TestName",[Data.SQLDBType]::NVarChar, 200))) | Out-Null
    $sqlCommand.Parameters.Add((New-Object Data.SqlClient.SqlParameter("@TestResult",[Data.SQLDBType]::NVarChar, 10))) | Out-Null
    $sqlCommand.Parameters.Add((New-Object Data.SqlClient.SqlParameter("@FailureMessage",[Data.SQLDBType]::NVarChar, 500))) | Out-Null
    $sqlCommand.Parameters.Add((New-Object Data.SqlClient.SqlParameter("@Inserted",[Data.SQLDBType]::DateTime2))) | Out-Null
    $sqlCommand.Parameters[0].Value = $PipelineTriggerID
    $sqlCommand.Parameters[1].Value = $TestName
    $sqlCommand.Parameters[2].Value = $TestResult
    $sqlCommand.Parameters[3].Value = $TestError
    $sqlCommand.Parameters[4].Value = $Inserted
    $sqlCommand.ExecuteNonQuery()
}

Time to invoke the test scripts. This can be done through the Invoke-Pester command. As mentioned above, I had to pass in the location of the test scripts. I also defined the PassThru parameter so that the results were passed into an $Output object ready to deconstruct and write to SQLDB. While it's not mandatory, you can also define the test groups to pick up as part of the invocation, which has the added bonus of being able to use wildcards. The script will then loop round each object in the array of the $Output object defined previously. For each one, it will extract the attributes of the test that I'm after and then call my insert-to-DB function to write the results to SQLDB.

# Invoke Pester
$Output = Invoke-Pester -Script C:\Temp\PesterTests.ps1 -PassThru -TestName '*CaSIE*'

# Output Results
Write-Output "TestNameFilter: $($Output.TestNameFilter)."
Write-Output "Total Count: $($Output.TotalCount)."
Write-Output "Passed Count: $($Output.PassedCount)."
Write-Output "Failed Count: $($Output.FailedCount)."
Write-Output "Time: $($Output.Time)."
Write-Output ""

# Loop over TestResult objects to submit to database
$Output.TestResult | ForEach-Object {
    $TestName = $_.Name
    $TestResult = $_.Result
    $TestError = $_.FailureMessage
    Write-Output "Test Result: $($TestName), $($TestResult), $($TestError)"
    Do-InsertRow
}

# Close the connection.
if ($sqlConnection.State -eq [Data.ConnectionState]::Open) {
    $sqlConnection.Close()
}

# Show done when finished (for testing/logging purposes only)
Write-Output "Finished Running Tests"

Once everything has been created and you've tested the scripts via the test pane and checked that the results populate into the database, you are ready to hook it into your ADF pipelines and integrate it into your solution! To do this is very simple: within each runbook you can create a webhook for any external resources to call. This comes in the form of a secret URL which will then kick off the runbook with the embedded script. Passing parameters into the runbook at this point requires a bit more work, and I go into this detail in a separate blog post. Be careful to take a copy of this URL, as you cannot view it after creation. It's then just a case of creating a Web activity in your ADF pipeline to call it, to run the scripts. All in all, a very straightforward mechanism.

Once the results were in SQLDB I also defined a view on top of the table, as I was having trouble extracting the test group out. To make things simple, I just re-worded the individual test cases to include the group name, and then used a CHARINDEX to split it out, meaning I could now report by group as well. I also added a field to calculate the latest test result set, so that we were only reporting on the most relevant test set. I then built a Power BI report on top of this to integrate into our existing solution. Test results were presented as a measure, % Passed (defined below). This was then displayed overall, vs each test group, and then vs each individual test.

% Passed =
VAR PercentPassed =
    DIVIDE(
        CALCULATE(
            COUNT('BISystem UnitTestsView'[TestId]),
            'BISystem UnitTestsView'[TestResult] = "Passed",
            'BISystem UnitTestsView'[LatestTest] = 1
        ),
        CALCULATE(
            COUNT('BISystem UnitTestsView'[TestId]),
            'BISystem UnitTestsView'[LatestTest] = 1
        )
    )
RETURN
    IF(PercentPassed = BLANK(), 0, PercentPassed)

Summary

Pester is a great tool to use to add automated testing into your project. It took a few hours to stand up, but now that's been done, it's just a case of defining tests. The ubiquitous nature of the framework means we can define all sorts of tests across data/software/hardware. The thing I like most about using it is that it's simple. While I decided it was best to populate the test results into a database for ease of use for reporting, it might also be worth investigating populating them into the VSTS testing framework. Hopefully you will find this blog useful.