Adatis BI Blogs

Regression Testing a Data Platform with Pester

In a previous post, I gave an overview of regression tests. In this post, I will give a practical example of developing and performing regression tests with the Pester framework for PowerShell. The code for performing regression tests is written in PowerShell using the Pester framework. The tests are run through Azure DevOps pipelines and are designed to test regression scenarios. The PowerShell scripts, which contain the mechanism for executing tests, rely upon receiving the actual test definitions from a metadata database. The structure of the metadata database is exactly the same as laid out in the Integration Test post.

Regression Tests

The regression tests need to be designed so that existing functionality isn’t regressed by any changes made to the code. In an analytics system, the functionality is typically aligned to the target schema that’s used for reporting and analysis. If we change the cleaning transformation logic in the source tables which make up our customer dimension, we’ll want to ensure that the customer dimension itself still produces the expected outcomes, for example row counts or specific values.

For this example, we’ll be putting some data into the data lake and running it through the various layers until it ends up in the CURATED layer. Because the majority of the processing is orchestrated using Azure Data Factory V2 (ADF), we only really need to ensure that the pipeline(s) run successfully, that valid data appears in all the layers of the lake, and that the runs are logged in the metadata database.

Because we’re deploying some data, we’ve got elements of setup and teardown in the script: setup and teardown in the metadata database, so that ADF knows what to process, and setup and teardown in the data lake, so that there is data to process.

This should give you enough to start using Pester for testing your own Azure data platform implementations. A minimal sketch of such a test follows.
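To make that concrete, here is a minimal, illustrative sketch of what such a regression test can look like in Pester (v4 syntax). All resource, server, database, pipeline and table names are hypothetical placeholders; the ADF cmdlets are from the AzureRM.DataFactoryV2 PowerShell module and Invoke-Sqlcmd from the SqlServer module:

```powershell
# Minimal regression test sketch (Pester v4 syntax). All names below are hypothetical.
$rg             = 'my-resource-group'
$adf            = 'my-data-factory'
$adlsAccount    = 'mydatalakestore'
$metadataServer = 'my-metadata-server.database.windows.net'

Describe 'Customer dimension regression' {

    BeforeAll {
        # Setup: seed the metadata database so ADF knows what to process,
        # and put a sample file into the lake so there is data to process.
        Invoke-Sqlcmd -ServerInstance $metadataServer -Database 'Metadata' -InputFile '.\Setup.sql'
        Import-AzureRmDataLakeStoreItem -Account $adlsAccount -Path '.\SampleCustomers.csv' `
            -Destination '/RAW/customers/SampleCustomers.csv'
    }

    It 'runs the pipeline to completion' {
        $runId = Invoke-AzureRmDataFactoryV2Pipeline -ResourceGroupName $rg `
            -DataFactoryName $adf -PipelineName 'LoadCustomerDimension'
        do {
            Start-Sleep -Seconds 30
            $run = Get-AzureRmDataFactoryV2PipelineRun -ResourceGroupName $rg `
                -DataFactoryName $adf -PipelineRunId $runId
        } while ($run.Status -eq 'InProgress')
        $run.Status | Should -Be 'Succeeded'
    }

    It 'does not regress the expected row count in the CURATED customer dimension' {
        $result = Invoke-Sqlcmd -ServerInstance $metadataServer -Database 'Curated' `
            -Query 'SELECT COUNT(*) AS Cnt FROM dim.Customer;'
        $result.Cnt | Should -Be 100
    }

    AfterAll {
        # Teardown: remove the sample data and reset the metadata database.
        Remove-AzureRmDataLakeStoreItem -Account $adlsAccount -Paths '/RAW/customers/SampleCustomers.csv' -Force
        Invoke-Sqlcmd -ServerInstance $metadataServer -Database 'Metadata' -InputFile '.\Teardown.sql'
    }
}
```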

Regression Testing Overview

In a previous post, I touched on the point of testing and briefly touched on regression testing. In this post, I will be going into more detail about what regression testing is and why it’s important to do it.

In the previous post, I said that Regression Tests are intended to:

verify that newly developed code introduced into a deployed product does not regress expected results.

We’ll still need to go through the process of unit testing and integration testing; but do we want to go through the rigmarole of manual testing to check if a change has changed more than it was meant to? That’s something that we would like to avoid, so we have regression testing to alleviate that need. Like integration tests, they do need multiple parts of the product available, so they would need to be executed as part of a Release Pipeline in Azure DevOps. Regression testing is expensive to automate and maintain, and slow to run – but that doesn’t mean it should be avoided. It adds a layer of confidence to a newly changed code base which is about to be deployed. However, because regression tests cover broad slices of the solution – perhaps even the entire solution at once – we don’t want to run all of them all the time, because they would take a very long time to complete.

Regression Techniques

There are a variety of methods and techniques that can be used in the design and execution of regression tests. These are:

- Retest All
- Test Selection
- Test Case Prioritisation

Retest All executes all the documented test cases to check the integrity of the solution. This is the most expensive technique for regression testing as it runs all the test cases; however, it does ensure that there are no errors in the modified code that could be released into Production.

Test Selection executes a defined selection of documented test cases to check the integrity of a section of the solution. It is less expensive than the Retest All technique, but it does introduce an element of risk, as the test coverage does not extend to the entire solution.

Test Case Prioritisation executes tests in priority order, running higher priority tests ahead of lower priority ones. A sketch at the end of this post shows how these techniques can be driven from a single test suite.

Regression Testing

Regression tests are executed for the various functional slices that exist in a solution. Like most forms of testing, regression tests follow a pattern of:

- Initialise system under test
- Call functionality under test
- Assert expected outcome against result of method

Generally, regression tests will be executed after deployment, as they often require the infrastructure to exist. They are usually dependent on data, which must be created at the time of setting up the tests. Most of the time, you can automate regression tests using a unit test framework such as Pester or NUnit.

Some best practices

To get you going, I’m going to set out some best practices that you should aim to follow:

- Adopt a hybrid technique of mixing and matching regression techniques to use what’s best for you at the time
- Create the data needed for the tests before execution, as part of the test setup
- Use multiple asserts per test. You might have dependencies on external resources that you’d like to keep open, or you want a fast running set of tests; multiple asserts help with both
- Choose unit tests over regression tests when feasible
- Choose integration tests over regression tests when feasible
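These techniques map quite naturally onto test frameworks. As an illustrative sketch (not from the original post), Pester lets you tag Describe blocks and then choose which tagged tests to run, which gives you Retest All, Test Selection and a crude form of prioritisation from the same suite. The names, row counts and folder path below are hypothetical:

```powershell
# Tag tests by functional slice and priority (illustrative names and values).
# Assume these Describe blocks live in *.Tests.ps1 files under .\RegressionTests.
Describe 'Customer dimension regression' -Tag 'Regression', 'HighPriority' {
    It 'preserves the expected row count' {
        $actualRowCount = 100   # stand-in for a real query against the dimension
        $actualRowCount | Should -Be 100
    }
}

Describe 'Product dimension regression' -Tag 'Regression', 'LowPriority' {
    It 'preserves the expected row count' {
        $actualRowCount = 250   # stand-in for a real query against the dimension
        $actualRowCount | Should -Be 250
    }
}

# Retest All: run the whole regression suite.
Invoke-Pester -Script '.\RegressionTests' -Tag 'Regression'

# Test Selection / Test Case Prioritisation: run only the high-priority subset first.
Invoke-Pester -Script '.\RegressionTests' -Tag 'HighPriority'
```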

Integration Testing a Data Platform with Pester

In a previous post, I gave an overview of integration tests and documenting integration points. In this post, I will give a practical example of developing and performing integration tests with the Pester framework for PowerShell. With a data platform, especially one hosted in Azure, it’s important to test that the Azure resources in your environment have been deployed and configured correctly. After we’ve done this, we can test the integration points on the platform, confident that all the components have been deployed.

The code for performing integration tests is written in PowerShell using the Pester framework. The tests are run through Azure DevOps pipelines and are designed to test documented integration points. The PowerShell scripts, which contain the mechanism for executing tests, rely upon receiving the actual test definitions from a metadata database.

Database Structure

The metadata database should contain a schema called Test, which is a container for all the database objects for running tests using Pester. These objects are:

- Test.TestCategory – contains the category of test to be run, e.g. Integration Tests
- Test.TestType – contains the types of tests that need to be run, each associated with a particular type of functionality. In the Pester framework, Test Type maps to the Describe function.
- Test.Test – contains the individual tests to be run, with reference to the test type and environment. In the Pester framework, Test maps to the Context function.
- Test.Assert – contains the individual asserts to be executed against the output from the test run, with reference to the Test and the type of assert. In the Pester framework, Assert maps to the It function.

How you design the tables is up to you, but I suggest that the schema looks similar to the above.

Test Environment Setup

Before we begin testing all the integration points, we need to be confident that the environment to which the platform is deployed has been created and configured correctly. If it hasn’t, there’s no point in progressing with the actual integration tests, as they would fail. For this, we have an initial script to perform these checks. The script executes a stored procedure called Test.ObtainTests, which returns the list of tests to be run. Within a Pester Describe block, the tests are executed. The tests use the Get-AzureRmResource cmdlet and assert that the name of the deployed resource matches that of the expected resource, as defined in the TestObject. If any of the tests fail in this phase, no further testing should take place.

Integration Tests

We’re confident that the environment has been created and configured correctly, so now we’re ready to run the integration tests according to the documented integration points. For this example, we’ll be putting some data into the RAW layer of the data lake and running it through the various layers until it ends up in the CURATED layer, where it can be read by Azure SQL DW. Because the majority of the processing is orchestrated using Azure Data Factory V2 (ADF), and the majority of the integration points are within ADF, we only really need to ensure that the pipeline(s) run successfully and that some valid data appears in the CURATED layer for SQL DW to consume via PolyBase.

Because we’re also deploying some data, we’ve got elements of setup and teardown in the script: setup and teardown in the metadata database, so that ADF knows what to process,
and setup and teardown in the data lake, so that there is data to process.

Tying it all together

We’ve got our scripts, but how do they get invoked? This is where the InvokePester script comes in. For anyone not familiar with Pester, this is effectively the orchestrator for your testing scripts. If you deploy the tests to Azure DevOps as part of a release pipeline, you’ll see output similar to the image below.

This should give you enough to start using Pester for testing your own Azure data platform implementations. A condensed sketch of the environment checks and the Pester invocation follows.
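Here is an illustrative, condensed sketch of how those pieces fit together. Test.ObtainTests and Get-AzureRmResource come from the post above; the server name, parameter and result-set columns are hypothetical assumptions, and Invoke-Sqlcmd is from the SqlServer module:

```powershell
# Fetch the test definitions from the metadata database. Test.ObtainTests is the stored
# procedure described above; the server name and result-set columns are hypothetical.
$tests = Invoke-Sqlcmd -ServerInstance 'my-metadata-server.database.windows.net' `
    -Database 'Metadata' -Query "EXEC Test.ObtainTests @TestCategory = 'Integration';"

Describe 'Environment checks' {
    foreach ($test in $tests) {
        Context $test.TestName {
            It "has deployed the resource $($test.ExpectedResourceName)" {
                $resource = Get-AzureRmResource |
                    Where-Object { $_.Name -eq $test.ExpectedResourceName }
                $resource.Name | Should -Be $test.ExpectedResourceName
            }
        }
    }
}
```

The InvokePester script then runs the test files and emits results in a format an Azure DevOps pipeline can publish, along the lines of:

```powershell
# Run the environment checks and write NUnit-format results for Azure DevOps to publish.
Invoke-Pester -Script '.\Tests\Environment.Tests.ps1' `
    -OutputFile '.\TEST-Environment.xml' -OutputFormat 'NUnitXml'
```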

Integration Testing Overview

In a previous post, I touched on the point of testing and briefly talked about integration testing. In this post, I will be going into more detail about what integration testing is and why it’s important to do it.

In the previous post, I said that Integration Tests are:

intended to verify that the units of code and the services used in a product work together. As a result, they are more expensive to automate and maintain than unit tests; and can take considerably longer to run. Whilst unit tests can be run without dependencies of other parts of the product being available, integration tests often require multiple parts of the product – including infrastructure – to be up and running so that the integrations between units and services can be tested. Because integration tests might require infrastructure to be available, and certainly multiple parts of the product available, integration tests are best run as part of a Release Pipeline in Azure DevOps.

To expand on this, integration tests are written for each integration point for a solution. But what do we mean by “integration point”? An integration point is typically where two or more units of code interact with each other, or two or more services interact with each other – verifying that the individual parts or components of a solution work as intended together with other parts. How do we define an integration point?

Integration Points

We define an integration point by whiteboarding each component of our solution with the aim of documenting how they interact with each other. We can highlight an integration point by drawing a circle around it. Consider the following architecture:

It’s a fairly typical modern data warehouse solution. We’re ingesting data from a variety of sources and storing it in a data lake. We’re then transforming and processing that data into our warehouse schema before presenting it in a data warehouse, and processing it in an analysis services model so that it can be reported on. That’s the architecture, but the components used might be very different and interact differently with the architecture. For the ingestion, our integration points are going to be between the following components:

For the transformation piece, our integration points are going to look like:

Finally, for processing our data into the semantic model, the integration points look like:

As you can see, the integration points do not align perfectly with the architecture – bear in mind that every solution is different, so your integration points will look different even if the broad architecture is the same.

Integration Testing

We’ve documented our integration points and now we need to write some integration tests. Integration tests are executed for the various integration points that exist in a solution. Like most forms of testing, integration tests follow a pattern of:

- Initialise system under test
- Call functionality under test
- Assert expected outcome against result of method

Generally, integration tests will be executed after deployment, as they often require the infrastructure to exist. Most of the time, integration tests should not be dependent on data. However, if data does need to exist, it must be created at the time of setting up the tests. Most of the time, you can automate integration tests using a unit test framework such as Pester or NUnit.

Some Best Practices

To get you going, I’m going to set out some best practices that you should aim to follow:

- Only create the integration tests you need
- Don’t depend on data being available. If you have tests that depend on data, create that data before execution, as part of the test setup (a minimal sketch appears at the end of this post)
- Use multiple asserts per test. You might have dependencies on external resources that you’d like to keep open, or you want a fast running set of tests; multiple asserts help with both
- Choose unit tests over integration tests when feasible; don’t duplicate effort

Further reading

My colleague Ben has written an excellent blog on SQL Integration Testing using NUnit. I’ll add another post soon about how to do Integration Testing using Pester.
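To make the data-setup practice concrete, here is a minimal, hypothetical sketch of a Pester test (v4 syntax) that creates and removes its own data; the cmdlet and parameter names follow the AzureRM.DataLakeStore module of the time and should be treated as an assumption to verify against your module version:

```powershell
# Hypothetical ingestion test that creates its own data during setup.
Describe 'Ingestion integration point' {

    BeforeAll {
        # Create the data the test depends on as part of the test setup.
        Import-AzureRmDataLakeStoreItem -Account 'mydatalakestore' `
            -Path '.\TestData\orders.csv' -Destination '/RAW/orders/orders.csv'
    }

    It 'lands the file in the RAW layer of the lake' {
        Test-AzureRmDataLakeStoreItem -Account 'mydatalakestore' -Path '/RAW/orders/orders.csv' |
            Should -Be $true
    }

    AfterAll {
        # Remove the data again so the test leaves the environment as it found it.
        Remove-AzureRmDataLakeStoreItem -Account 'mydatalakestore' -Paths '/RAW/orders/orders.csv' -Force
    }
}
```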

Unit Testing Overview

In a previous post, I touched on the point of testing and briefly talked about unit testing. In this post, I will be going into more detail about what unit testing is and why it’s important to do it.

In the previous post, I said that Unit Tests are:

low level tests, meaning that they are close to the source of the product. They should be written with the aim of testing individual methods and functions for a given code base, using a unit test framework to support the authoring and execution of a test. As a developer, you would typically author the unit tests in a development tool like Visual Studio; you’d run them locally to ensure that the tests pass; and then they would be executed on a regular basis as a task in a Build Pipeline within Azure DevOps. Unit Tests are cheap to automate and should be quick to run.

To expand on this, unit tests are written by a developer to apply to a unit of code. But what do we mean by “unit of code”? A unit of code is the smallest testable part of a solution – verifying that the individual part or component of a solution works as intended, independently from other parts. A unit could be a C# method, a PowerShell function, a T-SQL stored procedure, and many others. Like most forms of testing, unit tests follow a pattern of:

- Initialise system under test
- Call method under test
- Assert expected outcome against result of method

A best practice would be to write the unit test before writing any code, but if you’ve not got to that level of maturity with your test approach, writing tests after code is still good practice.

How do you write a good unit test?

Keep it simple

A unit test shouldn’t replicate the code it is intended to test. You’ll be writing lots of them, so make them quick and easy to write.

Readable

By keeping it simple, the test should also be readable, making it easy to know which method is being tested and what the expected behaviour of the method is. By making it readable, you can easily address any failures that may surface.

Reliable and Repeatable

Unit tests should only fail if there are bugs in the system, not because there are bugs in the tests. Keeping them simple and readable will avoid that issue. Unit tests need to be run many times, sometimes multiple times throughout the course of a day, so they need to be executed quickly in a repeatable manner. Keeping them simple helps achieve this aim.

How do you write a unit test?

We’ve got an understanding of what a unit test is, but how do we write one? For this example, we’ll be writing our code and tests using C#. Our application is a very simple calculator, which adds two numbers together. To add a new unit test, we can simply right-click on the method and select Create Unit Tests. Because we’ve not built any unit tests before, we can use this to create a new unit test project using a framework of our choice. If we already had a unit test project, we could add the new test to the existing project. This creates a skeleton of a unit test, which we can amend for our needs. As you can see, the skeleton doesn’t contain what we need, so we amend the test so that it reflects our requirements, as below. To run a unit test, you can either right-click on the test method and click Run Test(s), or open the Test Explorer window, navigate to the desired test and click Run Selected Tests.
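The walkthrough above uses C# and a Visual Studio test framework; since the rest of this series leans on Pester, here is the equivalent expressed as a PowerShell function and a Pester unit test (a minimal sketch, Pester v4 syntax, with hypothetical names):

```powershell
# The unit under test: a trivial calculator function.
function Add-Numbers {
    param ([int]$First, [int]$Second)
    return $First + $Second
}

# Initialise, call and assert, exactly as the pattern above describes.
Describe 'Add-Numbers' {
    It 'adds two numbers together' {
        Add-Numbers -First 2 -Second 3 | Should -Be 5
    }
}
```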
Unit Tests in Azure DevOps

We’ve written our unit tests and have run them locally, but how do we make them repeatable? We utilise the power of Azure DevOps to have repeatable tests run against a changing code base as part of the Build or Continuous Integration process. The process is:

- Install NuGet on the Build Agent
- Restore any packages from NuGet that your application requires
- Build the solution
- Run the tests
- Publish the test results
- Copy successfully built and tested artifacts to a staging directory
- Publish those artifacts

Using Azure DevOps, or another CI tool, we can rely upon our tests in a repeatable manner.

Additional Reading

There’s a good post by Sergey Kolodiy on the importance of writing good code and how unit testing encourages good behaviour. My colleague Jon has also written a post on the subject: Setup Unit Testing with NUnit and NBi.

Testing: What’s the point?

I’m almost certain that every developer has asked themselves this question at least once throughout their career. You’ve developed your solution, it works fine on your machine, and now the deployment into production is being held up because someone mentions the need to do testing. What’s the point of testing? Ultimately, it is to provide assurance about the quality of a product.

With testing, there are two approaches:

- Manual
- Automated

Manual testing is what most developers complain about: it’s expensive to set up, laborious to execute, time consuming to repeat, and prone to human error. Manual testing typically takes the form of User Acceptance Tests – and sometimes these can be the only tests that are conducted on a product. How confident are we that the product is of high quality if we only do manual testing? Not very.

Automated testing is what every developer should be doing: the tests are executed by a machine; they’re repeatable; and they’re more robust and reliable than manual testing. However, like manual testing, the quality of an automated test is dependent on how well the test scripts have been written, and test scripts can vary hugely in complexity – from very simple build verification tests through to complex regression tests.

Types of Testing

At its simplest, testing can be build verification; at its most complex, it can be user acceptance testing. But to get a true feel for how complex each type is and how often you should use it, we should refer to a testing pyramid. As we can see, the wider the segment, the more frequently we should employ it, and as we work our way up the pyramid, the more complex the type of testing becomes. For the remainder of this blog post, I’m going to briefly expand on the following types of tests:

- Build Verification
- Unit
- Integration
- Regression

Build Verification Tests

A build verification test uses a tool like MSBuild to answer the question: does my code compile? If it does compile, the test has passed; if it doesn’t, the test has failed. This can be used in the local development environment through Visual Studio, or it can be conducted as a task in a Build Pipeline within Azure DevOps. These types of tests are extremely cheap to automate and maintain, and very quick to run.

Unit Tests

Unit tests are low level tests, meaning that they are close to the source of the product. They should be written with the aim of testing individual methods and functions for a given code base, using a unit test framework to support the authoring and execution of a test. As a developer, you would typically author the unit tests in a development tool like Visual Studio; you’d run them locally to ensure that the tests pass; and then they would be executed on a regular basis as a task in a Build Pipeline within Azure DevOps. Unit Tests are cheap to automate and should be quick to run.

Integration Tests

We know that individual units of code work, due to unit tests, but how can we be sure that those units work together? Integration tests are intended to verify that the units of code and the services used in a product work together. As a result, they are more expensive to automate and maintain than unit tests, and can take considerably longer to run. Whilst unit tests can be run without dependencies on other parts of the product being available, integration tests often require multiple parts of the product – including infrastructure – to be up and running so that the integrations between units and services can be tested.
Because integration tests might require infrastructure to be available, and certainly multiple parts of the product available, integration tests are best run as part of a Release Pipeline in Azure DevOps.

Regression Tests

We’ve verified that individual elements of the product work, and we’ve verified that the individual elements of the product work together; but what happens if we change elements of the product? This is where regression testing comes in – to verify that newly developed code introduced into a deployed product does not regress expected results. We’ll still need to go through the process of unit testing and integration testing; but do we want to go through the rigmarole of manual testing to check if a change has changed more than it was meant to? That’s something that we would like to avoid, so we have regression testing to alleviate that need. Like integration tests, regression tests need multiple parts of the product available, so they would need to be executed as part of a Release Pipeline in Azure DevOps. Regression testing is expensive to automate and maintain, and slow to run – but that doesn’t mean it should be avoided. Regression tests add a layer of confidence to a newly changed code base which is about to be deployed. However, because regression tests cover broad slices of the solution – perhaps even the entire solution at once – we don’t want to run all of them all the time, because they would take a very long time to complete.

Summary

We know why we do testing; we are aware of some high-level approaches; and we’ve gone through some types of automated tests in brief detail. This post is the first in a series on testing; future posts will include:

- Unit Testing
- Integration Testing
- Regression Testing

As always, do let me know if you have any feedback or questions in the comments section.

Azure Data Factory Custom Activity Development–Part 4: Testing ADF Pipelines

This is the fourth post in a series on Azure Data Factory Custom Activity Development.

Pipeline Execution

Executing Pipeline Activities basically requires resetting the status of prior Data Slices so that they can re-execute. We can do this in the “Manage and Monitor” Data Factory dashboard provided by Azure. We can also do this relatively simply using PowerShell. What is missing is really being able to do this from the comfort of your own Visual Studio environment.

When writing Unit and Integration Tests it is most beneficial to be able to do this within the IDE, using frameworks such as VS Test or NUnit to assist with our test logic and execution. This follows proven practice with other types of development and allows easy automation of testing for CI/CD activities. So, having made the case, how can we easily go about running our Pipeline Data Slices? Well, the ADF API isn’t that intimidating, and although there are some caveats to be aware of, it is relatively easy to create a class or two to assist with the task of executing ADF Activities from .NET test code. We are of course working on the assumption that we are doing our testing in a Data Factory environment that is indeed purposed for testing and can therefore update the Pipeline accordingly.
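As an aside, here is roughly what that slice reset looks like in PowerShell, using the AzureRM.DataFactories module that accompanied ADF v1; the resource names are placeholders, and the exact parameters should be checked against your module version:

```powershell
# Reset the data slices for a dataset so that the dependent activities re-execute.
Set-AzureRmDataFactorySliceStatus -ResourceGroupName 'my-resource-group' `
    -DataFactoryName 'my-data-factory' -DatasetName 'CuratedCustomerDataset' `
    -StartDateTime '2017-01-01T00:00:00Z' -EndDateTime '2017-01-02T00:00:00Z' `
    -Status 'Waiting' -UpdateType 'UpstreamInPipeline'

# Make sure the pipeline itself is not suspended before the slices run.
Resume-AzureRmDataFactoryPipeline -ResourceGroupName 'my-resource-group' `
    -DataFactoryName 'my-data-factory' -Name 'MyPipelineName'
```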
PipelineExecutor

The first task we need is one that will execute our Pipeline Activity Data Slices. It will need access to various properties of the Pipeline itself, as well as various methods and properties to conduct the desired Data Slice operations. Hence our class with the relatively fitting name of PipelineExecutor. The PipelineGetResponse class within the Microsoft.Azure.Management.DataFactories.Models namespace allows us to request a Pipeline object’s details from our Data Factory. We’ll create a private field for this and retrieve it when constructing our PipelineExecutor class.

```csharp
public class PipelineExecutor
{
    #region Private Fields

    private string dataFactoryName;
    private DataFactoryManagementClient dfClient;
    private DateTime maxDateTime = DateTime.MaxValue;
    private DateTime minDateTime = DateTime.MinValue;
    private PipelineGetResponse pipelineGR;
    private string pipelineName;
    private string resourceGroupName;

    #endregion Private Fields

    #region Public Constructors

    /// <summary>
    /// Initializes a new instance of the <see cref="PipelineExecutor"/> class.
    /// </summary>
    /// <param name="adfApplicationId">The ADF application identifier.</param>
    /// <param name="adfApplicationSecret">The ADF application secret.</param>
    /// <param name="domain">The domain.</param>
    /// <param name="loginWindowsPrefix">The login windows DNS prefix.</param>
    /// <param name="managementWindowsDnsPrefix">The management windows DNS prefix.</param>
    /// <param name="subscriptionId">The subscription identifier.</param>
    /// <param name="resourceGroupName">Name of the resource group.</param>
    /// <param name="dataFactoryName">Name of the data factory.</param>
    /// <param name="pipelineName">Name of the pipeline.</param>
    public PipelineExecutor(string adfApplicationId, string adfApplicationSecret, string domain,
        string loginWindowsPrefix, string managementWindowsDnsPrefix, string subscriptionId,
        string resourceGroupName, string dataFactoryName, string pipelineName)
    {
        this.resourceGroupName = resourceGroupName;
        this.dataFactoryName = dataFactoryName;
        this.pipelineName = pipelineName;
        ClientCredentialProvider ccp = new ClientCredentialProvider(adfApplicationId, adfApplicationSecret);
        dfClient = new DataFactoryManagementClient(ccp.GetTokenCloudCredentials(domain,
            loginWindowsPrefix, managementWindowsDnsPrefix, subscriptionId));
        LoadPipeline();
    }

    #endregion Public Constructors

    /// <summary>
    /// Loads the pipeline using a get request.
    /// </summary>
    public void LoadPipeline()
    {
        this.pipelineGR = dfClient.Pipelines.Get(resourceGroupName, dataFactoryName, pipelineName);
        if (this.pipelineGR == null)
            throw new Exception(string.Format("Pipeline {0} not found in Data Factory {1} within Resource Group {2}",
                pipelineName, dataFactoryName, resourceGroupName));
    }
```

The PipelineGetResponse.Pipeline property is the actual ADF Pipeline object, so we can apply the Decorator pattern to simply reference this and any other relevant properties within our code whenever required. For example, we create a simple read/write property End, which is used for getting and setting the Pipeline End, as below:

```csharp
/// <summary>
/// Gets or sets the pipeline end.
/// </summary>
/// <value>The end.</value>
public DateTime? End
{
    get { return this.pipelineGR.Pipeline.Properties.End; }
    set
    {
        if (this.pipelineGR.Pipeline.Properties.End != value)
        {
            this.pipelineGR.Pipeline.Properties.End = value;
            IsDirty = true;
        }
    }
}
```

Notice that we are setting an IsDirty flag when updating the property. This can then be used to determine whether we need to save any changes to the Pipeline back to the Data Factory when we are actually ready to run our Data Slice(s). We have properties for the major Pipeline attributes, such as a list of Activities and Data Sets, whether the Pipeline is Paused, and various others. These are used within a number of helper methods that allow for easier running of our Activity Data Slices.

As previously mentioned, the method for executing an Activity Data Slice is to set its status accordingly. We also need to check that the Pipeline is not currently paused and, if it is, resume it. For these purposes there are a number of methods that do similar things with regard to Data Slices, such as setting the status of the last Data Slice prior to a specified datetime, for all the Data Sets within the Pipeline, as below:

```csharp
/// <summary>
/// Sets the state of the pipeline activities' last data slice prior to the date specified.
/// Updates the pipeline start and end if data slices fall outside of this range.
/// </summary>
/// <param name="priorTo">The date prior to which the last data slice will be updated.</param>
/// <param name="sliceState">State of the slice.</param>
/// <param name="updateType">Type of the update.</param>
public void SetPipelineActivitiesPriorDataSliceState(DateTime priorTo, string sliceState = "Waiting", string updateType = "UpstreamInPipeline")
{
    Dictionary<string, DataSlice> datasetDataSlices = GetPipelineActivitiesLastDataSlice(priorTo);
    // The earliest start and latest end for the data slices to be reset.
    DateTime earliestStart = datasetDataSlices.OrderBy(ds => ds.Value.Start).First().Value.Start;
    DateTime latestEnd = datasetDataSlices.OrderByDescending(ds => ds.Value.End).First().Value.End;
    // If the pipeline start and end values do not cover this timespan, the pipeline schedule
    // needs to be updated to use the expanded time period, else the data slice updates will fail.
    if (this.Start > earliestStart)
    {
        this.Start = earliestStart;
    }
    if (this.End < latestEnd)
    {
        this.End = latestEnd;
    }
    SavePipeline();
    foreach (string datasetName in datasetDataSlices.Keys)
    {
        DataSlice ds = datasetDataSlices[datasetName];
        SetDataSetDataSliceState(datasetName, ds.Start, ds.End, sliceState, updateType);
    }
}

/// <summary>
/// Saves the pipeline.
/// </summary>
public void SavePipeline()
{
    // Update the pipeline with the amended properties.
    if (IsDirty)
        dfClient.Pipelines.CreateOrUpdate(resourceGroupName, dataFactoryName,
            new PipelineCreateOrUpdateParameters() { Pipeline = this.pipelineGR.Pipeline });
}
```

Pipeline Data Slice Range Issues…

The first gotcha you may encounter when resetting Activity Data Slices is that of possibly exceeding the Pipeline Start and/or End times. If you try to run a Pipeline in this state you will get an exception. Hence the need to check the earliest and latest Data Slice range over all our activities and amend the Start and End properties for our Pipeline (wrapped up inside our PipelineExecutor Start and End properties). Nothing too taxing here, and we’re back in the game.

For completeness, I’ll include the SetDataSetDataSliceState method called above, so you can see what is required to actually set the Data Slice statuses using the Data Factory API, via the Microsoft.Azure.Management.DataFactories.DataFactoryManagementClient class.

```csharp
/// <summary>
/// Sets the state of all data set data slices that fall within the date range specified.
/// </summary>
/// <param name="datasetName">Name of the dataset.</param>
/// <param name="sliceStart">The slice start.</param>
/// <param name="sliceEnd">The slice end.</param>
/// <param name="sliceState">State of the slice.</param>
/// <param name="updateType">Type of the update.</param>
public void SetDataSetDataSliceState(string datasetName, DateTime sliceStart, DateTime sliceEnd, string sliceState = "Waiting", string updateType = "UpstreamInPipeline")
{
    DataSliceState sliceStatusResult;
    if (!Enum.TryParse<DataSliceState>(sliceState, out sliceStatusResult))
        throw new ArgumentException(string.Format("The value {0} for sliceState is invalid. Valid values are {1}.",
            sliceState, string.Join(", ", Enum.GetNames(typeof(DataSliceState)))));
    DataSliceUpdateType updateTypeResult;
    if (!Enum.TryParse<DataSliceUpdateType>(updateType, out updateTypeResult))
        throw new ArgumentException(string.Format("The value {0} for updateType is invalid. Valid values are {1}.",
            updateType, string.Join(", ", Enum.GetNames(typeof(DataSliceUpdateType)))));
    DataSliceSetStatusParameters dsssParams = new DataSliceSetStatusParameters()
    {
        DataSliceRangeStartTime = sliceStart.ConvertToISO8601DateTimeString(),
        DataSliceRangeEndTime = sliceEnd.ConvertToISO8601DateTimeString(),
        SliceState = sliceState,
        UpdateType = updateType
    };
    dfClient.DataSlices.SetStatus(resourceGroupName, dataFactoryName, datasetName, dsssParams);
}
```

You’ll see there are a couple of annoyances we need to code for, such as having to parse the sliceState and updateType parameters against some enums of valid values (which we’ve had to create to ensure only permitted values are used), and having to call ConvertToISO8601DateTimeString() on our slice start and end times. No biggie though. Now that we have a class that lets us run our Pipeline Slices in a flexible manner (I’ve left out other methods for brevity), we can move on to using it within the test methods of our Test Project classes.

A simple Base Testing Class

For the project in question, we will be writing a lot of similar tests that simply check row counts on Hive destination objects once a Data Factory Pipeline has executed. To make writing these easier, and remembering the DRY principle mentioned back in Part 2 of this series, we can encapsulate this functionality in a base class method and derive our row counting test classes from it.

```csharp
public class TestBase
{
    #region Protected Fields

    protected string adfApplicationId = Properties.Settings.Default.ADFApplicationId;
    // The next two fields were omitted from the original listing but are required by
    // ExecuteRowCountTest below; the setting names are assumed.
    protected string adfApplicationSecret = Properties.Settings.Default.ADFApplicationSecret;
    protected string clusterPassword = Properties.Settings.Default.HDIClusterPassword;
    protected string adlsAccountName = Properties.Settings.Default.ADLSAccountName;
    protected string adlsRootDirPath = Properties.Settings.Default.ADLSRootDirPath;
    protected string domain = Properties.Settings.Default.Domain;
    protected string loginWindowsPrefix = Properties.Settings.Default.LoginWindowsDnsPrefix;
    protected string managementWindowsDnsPrefix = Properties.Settings.Default.ManagementWindowsDnsPrefix;
    protected string clusterName = Properties.Settings.Default.HDIClusterName;
    protected string clusterUserName = Properties.Settings.Default.HDIClusterUserName;
    protected string outputSubDir = Properties.Settings.Default.HDIClusterJobOutputSubDir;
    protected string dataFactory = Properties.Settings.Default.DataFactory;
    protected string resourceGroup = Properties.Settings.Default.ResourceGroup;
    protected string subscriptionId = Properties.Settings.Default.SubscriptionId;
    protected string tenantId = Properties.Settings.Default.TenantId;

    #endregion Protected Fields

    #region Public Methods

    //[TestMethod]
    public void ExecuteRowCountTest(string pipelineName, long expected, string databaseName, string objectName)
    {
        string rowCountStatement = "select count(*) as CountAll from {0}.{1};";
        // Check for SQL injection.
        if (databaseName.Contains(";") || objectName.Contains(";"))
            throw new ArgumentException(string.Format(
                "The parameters submitted contain potentially malicious values. databaseName: {0}, objectName: {1}. This may be an attempt at SQL injection.",
                databaseName, objectName));
        PipelineExecutor plr = new PipelineExecutor(adfApplicationId, adfApplicationSecret, domain,
            loginWindowsPrefix, managementWindowsDnsPrefix, subscriptionId, resourceGroup, dataFactory, pipelineName);
        plr.SetPipelineActivitiesPriorDataSliceState(DateTime.Now);
        plr.AwaitPipelineCompletion().Wait();
        string commandText = string.Format(rowCountStatement, databaseName, objectName);
        IStorageAccess storageAccess = new HiveDataLakeStoreStorageAccess(tenantId, adfApplicationId,
            adfApplicationSecret, adlsAccountName, adlsRootDirPath);
        HiveDataLakeStoreJobExecutor executor = new HiveDataLakeStoreJobExecutor(clusterName,
            clusterUserName, clusterPassword, outputSubDir, storageAccess);
        long actual = Convert.ToInt64(executor.ExecuteScalar(commandText));
        Assert.AreEqual(expected, actual);
    }

    #endregion Public Methods
}
```

Our test project classes’ methods for destination row counts can then be simplified to something such as the below:

```csharp
[TestClass]
public class MyPipelineName : TestBase
{
    #region Private Fields

    private string pipelineName = "MyPipelineName";

    #endregion Private Fields

    #region Public Methods

    // TODO: centralise the storage of these.
    [TestMethod]
    public void DestinationRowCountTest()
    {
        string databaseName = "UniversalStaging";
        string objectName = "extBrandPlanning";
        long expected = 10;
        base.ExecuteRowCountTest(pipelineName, expected, databaseName, objectName);
    }

    #endregion Public Methods
}
```

Going Further

For tests involving dependent objects we can use mocking frameworks such as NSubstitute, JustMock or Moq to create these, and define expected behaviours for the mocked objects, thereby making the writing and asserting of our test conditions all very much in line with proven Test Driven Development (TDD) practices. I won’t go into this here as there are plenty of well grounded resources out there on these subjects.

Up Next…

In the final instalment of the series we remove some of the referenced library limitations inherent in the ADF execution environment. Y’all come back soon now for Part 5: Using Cross Application Domains in ADF…