Ust Oldfield's Blog

Whitelisting Azure IP addresses for SQL Server

In a recent blog post, I wrote about whitelisting Azure Data Centre IP addresses for Key Vault. Key Vault’s firewall uses CIDR notation for IP ranges, which is exactly what is contained within the list of IP addresses supplied by Microsoft. However, there are some resources, like Azure SQL Server, which only accept IP ranges. Therefore, we need a way of converting CIDR to an IP range.
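To show what that conversion involves, here is a minimal sketch in plain PowerShell. It is not the ipcalc.ps1 script mentioned below, and the function names `ConvertTo-DottedQuad` and `ConvertTo-IpRange` are made up for this illustration. It assumes IPv4 addresses only.

```powershell
# Turn a 32-bit address value back into dotted notation, e.g. 13.69.0.0.
function ConvertTo-DottedQuad {
    param([long]$Value)
    $octets = foreach ($shift in 24, 16, 8, 0) {
        ($Value -shr $shift) -band 255
    }
    $octets -join '.'
}

# Convert a CIDR block to its first and last IPv4 address.
# Hypothetical helper for illustration; not part of ipcalc.ps1.
function ConvertTo-IpRange {
    param([Parameter(Mandatory)][string]$Cidr)

    $address, $prefix = $Cidr -split '/'
    $prefixLength = [int]$prefix

    # Pack the four octets into one number (held in a 64-bit integer to avoid overflow).
    [long]$ip = 0
    foreach ($octet in $address -split '\.') {
        $ip = $ip * 256 + [long]$octet
    }

    # A block holds 2^(32 - prefix) addresses and starts on a multiple of that size.
    [long]$blockSize = [long][math]::Pow(2, 32 - $prefixLength)
    [long]$start = $ip - ($ip % $blockSize)
    [long]$end = $start + $blockSize - 1

    [pscustomobject]@{
        StartIpAddress = ConvertTo-DottedQuad $start
        EndIpAddress   = ConvertTo-DottedQuad $end
    }
}

# 13.69.0.0/17 covers 13.69.0.0 through 13.69.127.255
ConvertTo-IpRange -Cidr '13.69.0.0/17'
```

The two property names were chosen to match the -StartIpAddress and -EndIpAddress parameters of New-AzSqlServerFirewallRule, so each result could be passed on to that cmdlet.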

Handily, there’s a PowerShell script which exists to provide that conversion – called ipcalc.ps1. When you download it, make sure it’s in the same working folder as the script you’re going to use to create the new firewall rules.

From there, we can make slight amendments to the script we had in the previous post and produce the following script:


If you need to assign the IP ranges to other resources, you can substitute New-AzSqlServerFirewallRule with the appropriate cmdlet and parameters.


Whitelisting Azure IP addresses for Key Vault

A colleague came to me with an interesting request:

We want to put Key Vault behind a firewall, but when we do that it means that Azure Data Factory can no longer access the secrets. Is there a way to whitelist the IP addresses for a given Azure Data Centre?

The short answer is: Yes.

By default, the following option is enabled on Azure Key Vault under the Firewalls and virtual networks blade.

image

For most users, having unrestricted access from external networks to a resource that holds secrets, certificates and other sensitive information is a big red flag.

If we choose to only allow access from Selected Networks we get the following options opening up for us:

image

Note that trusted Microsoft services is not an extensive list and does not include Azure Data Factory.

image

Therefore we need to whitelist a series of IP addresses in the firewall rules. The list of IP addresses is published by Microsoft and updated on a weekly basis. The IP addresses are published in an XML document, which isn’t always the best format when one needs to update firewalls in Azure.

Shredding XML

To update the Firewall in Azure, we’re going to use PowerShell to shred the XML and extract the IP ranges for a given region. Then, we’re going to use the updated Azure PowerShell module to register the IP ranges against the Key Vault.
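As a rough sketch of the shredding step only, the snippet below pulls the CIDR ranges for one region out of an XML document. The element and attribute names (AzurePublicIpAddresses, Region, IpRange, Subnet) are an assumption about the shape of the published file, the addresses are illustrative rather than real data, and `Get-RegionSubnet` is a hypothetical helper name. It is not the script used in this post.

```powershell
# Inline sample in the assumed shape of the published file; addresses are illustrative.
[xml]$publicIps = @'
<AzurePublicIpAddresses>
  <Region Name="europewest">
    <IpRange Subnet="13.69.0.0/17" />
    <IpRange Subnet="13.73.128.0/18" />
  </Region>
  <Region Name="europenorth">
    <IpRange Subnet="13.70.192.0/18" />
  </Region>
</AzurePublicIpAddresses>
'@

# Hypothetical helper: return the Subnet attribute of every IpRange in one region.
function Get-RegionSubnet {
    param(
        [Parameter(Mandatory)][xml]$Document,
        [Parameter(Mandatory)][string]$RegionName
    )
    $region = $Document.AzurePublicIpAddresses.Region |
        Where-Object { $_.GetAttribute('Name') -eq $RegionName }
    if (-not $region) { return }
    $region.IpRange | ForEach-Object { $_.GetAttribute('Subnet') }
}

# Wrap the call in @() so a region with a single range still gives an array.
$subnets = @(Get-RegionSubnet -Document $publicIps -RegionName 'europewest')
$subnets
```

Each value in `$subnets` is already in CIDR notation, so it could be handed to the Key Vault firewall without conversion. Checking `$subnets.Count` before registering anything is a cheap way to spot a region that has too many ranges.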

Using the last command, we can check that the IP ranges have been registered successfully. You should see something like:

image

There we have it, explicit IP whitelisting of Azure Data Centres so we can lock down Azure resources, only opening up access when we need to.

Update

Key Vault is currently limited to 127 firewall rules. If you are adding a region with more than 127 IP ranges, you might have an issue…

Testing: What’s the point?

I’m almost certain that every developer has asked themselves this question at least once throughout their careers. You’ve developed your solution, it works fine on your machine and now the deployment into production is being held up because someone mentions the need to do testing. What’s the point of testing? Ultimately, to provide assurance about the quality of a product.

With testing, there are two approaches:

  • Manual
  • Automated

Manual testing is what most developers complain about: it’s expensive to set up; laborious to execute; time consuming to repeat; and prone to human error. Manual testing typically takes the form of User Acceptance Tests – and sometimes can be the only tests that are conducted on a product. How confident are we that the product is of high quality if we only do manual testing? Not very.

Automated testing is what every developer should be doing: automated tests are executed by a machine; they’re repeatable; and they’re more robust and reliable than manual tests. However, as with manual testing, the quality of a test depends on how well the test script has been written, and test scripts can vary hugely in complexity. The tests could vary from very simple build verification tests through to complex regression tests.

Types of Testing

At its simplest, testing can be build verification and, at its most complex, testing can be user acceptance testing. But to get a true feel for how complex the types of test are and how often you should use them, we should refer to a testing tree.

test tree

As we can see, the wider the segment the more frequently we should employ it and, as we work our way up the pyramid, the more complex the type of testing becomes. For the remainder of this blog post, I’m going to briefly expand on the following types of tests:

  • Build Verification
  • Unit
  • Integration
  • Regression

Build Verification Tests

A build verification test uses a tool like MSBuild to answer the question: does my code compile? If it does compile, the test has passed. If it doesn’t compile, then the test has failed. This can be used in the local development environment, through Visual Studio, or it can be conducted as a task in a Build Pipeline within Azure DevOps. These types of tests are extremely cheap to automate and maintain, and very quick to run.

Unit Tests

Unit tests are low level tests, meaning that they are close to the source of the product. They should be written with the aim of testing individual methods and functions for a given code base, using a unit test framework to support the authoring and execution of a test. As a developer, you would typically author the unit tests in a development tool like Visual Studio; you’d run them locally to ensure that the tests pass; and then they would be executed on a regular basis as a task in a Build Pipeline within Azure DevOps. Unit Tests are cheap to automate and should be quick to run.

Integration Tests

We know that individual units of code work, due to unit tests, but how can we be sure that those units work together? Integration tests are intended to verify that the units of code and the services used in a product work together. As a result, they are more expensive to automate and maintain than unit tests; and can take considerably longer to run. Whilst unit tests can be run without dependencies of other parts of the product being available, integration tests often require multiple parts of the product – including infrastructure – to be up and running so that the integrations between units and services can be tested. Because integration tests might require infrastructure to be available, and certainly multiple parts of the product available, integration tests are best run as part of a Release Pipeline in Azure DevOps.

Regression Tests

We’ve verified that individual elements of the product work; and we’ve verified that the individual elements of the product work together; what happens if we change elements of the product? This is where regression testing comes in – to verify that newly developed code, once added to a deployed product, does not regress expected results. We’ll still need to go through the process of unit testing and integration testing; but do we want to go through the rigmarole of manual testing to check if a change has changed more than it was meant to? That’s something that we would like to avoid, so we have regression testing to alleviate that need. Like integration tests, regression tests need multiple parts of the product to be available, so they would need to be executed as part of a Release Pipeline in Azure DevOps. Regression testing is expensive to automate and maintain, and slow to run – but that doesn’t mean that it should be avoided. Regression tests add a layer of confidence to a newly changed code base which is about to be deployed. However, because we are testing targeted elements, or perhaps the entire solution at once, we don’t want to run all regression tests all the time because they would take a very long time to complete.

Summary

We know why we do testing; we are aware of some high-level approaches; and we’ve gone through some types of automated tests in brief detail. This post is the first in a series on testing; future posts will include:

As always, do let me know if you have any feedback or questions in the comments section.

Integration Testing Overview

In a previous post, I touched on the point of testing and briefly talked about integration testing. In this post, I will be going into more detail about what integration testing is and why it’s important to do it.

In the previous post, I said that Integration Tests are:

intended to verify that the units of code and the services used in a product work together. As a result, they are more expensive to automate and maintain than unit tests; and can take considerably longer to run. Whilst unit tests can be run without dependencies of other parts of the product being available, integration tests often require multiple parts of the product – including infrastructure – to be up and running so that the integrations between units and services can be tested. Because integration tests might require infrastructure to be available, and certainly multiple parts of the product available, integration tests are best run as part of a Release Pipeline in Azure DevOps.

To expand on this, integration tests are written for each integration point for a solution. But what do we mean by “integration point”? An integration point is typically where two or more units of code interact with each other, or two or more services interact with each other – verifying that the individual parts or components of a solution work as intended together with other parts. How do we define an integration point?

Integration Points

We define an integration point by whiteboarding each component of our solution with the aim to document how they interact with each other. We can highlight the integration point by drawing a circle around it.

Consider the following architecture:

image

It’s a fairly typical modern data warehouse solution. We’re ingesting data from a variety of sources and storing it in a data lake. We’re then transforming and processing that data into our warehouse schema before presenting it in a data warehouse; processing it in an analysis services model so that it can be reported on. That’s the architecture, but the components used might be very different and interact differently with the architecture.

For the ingestion, our integration points are going to be between the following components:

sourceToRawIntegration

For the transformation piece, our integration points are going to look like:

rawToCuratedIntegration

Finally, for processing our data into the semantic model, the integration points look like:

curatedToSemanticIntegration

As you can see, the integration points do not align perfectly with the architecture – bear in mind that every solution is different, so your integration points will definitely look different even if the broad architecture is the same.

Integration Testing

We’ve documented our integration points and now we need to write some integration tests. Integration tests are executed for the various integration points that exist in a solution. Like most forms of testing, integration tests follow a pattern of:

  • Initialise system under test
  • Call functionality under test
  • Assert expected outcome against result of method

Generally, integration tests will be executed after deployment as they often require the infrastructure to exist. Most of the time, integration tests should not be dependent on data. However, if data does need to exist, this must be created at the time of setting up the tests. Most of the time, you can automate integration tests using a unit test framework such as Pester or NUnit.

Some Best Practices

To get you going, I’m going to set out some best practices that you should aim to follow:

  • Only create integration tests you need
  • Don’t depend on data being available. If you have tests that depend on data – create that data before execution, as part of the test setup
  • Multiple asserts per test. You might have dependencies on external resources that you’d like to keep open or you want a fast running set of tests. Multiple asserts help with all of these.
  • Choose unit tests over integration tests when feasible. Don’t duplicate effort.
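To make the pattern and the practices above concrete, here is a small sketch in plain PowerShell rather than Pester or NUnit. The two functions, `Save-RawRecord` and `Get-CuratedTotal`, are hypothetical stand-ins for two units that meet at an integration point (one writes a file, the other reads it). The test creates its own data during setup, uses more than one assert, and tidies up afterwards.

```powershell
# Two hypothetical units: one lands records as a CSV file, the other reads the file back.
function Save-RawRecord {
    param([object[]]$Record, [string]$Path)
    $Record | Export-Csv -Path $Path -NoTypeInformation
}

function Get-CuratedTotal {
    param([string]$Path)
    $sum = 0
    foreach ($row in (Import-Csv -Path $Path)) { $sum += [int]$row.Amount }
    $sum
}

# Initialise: create a scratch folder and the test data as part of the setup.
$testFolder = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $testFolder | Out-Null
$rawFile = Join-Path $testFolder 'raw.csv'
$records = @(
    [pscustomobject]@{ Id = 1; Amount = 10 }
    [pscustomobject]@{ Id = 2; Amount = 32 }
)

try {
    # Call the functionality under test: both units, one after the other.
    Save-RawRecord -Record $records -Path $rawFile
    $total = Get-CuratedTotal -Path $rawFile

    # Assert: several checks against the same setup.
    if (-not (Test-Path $rawFile)) { throw 'Raw file was not written' }
    if ($total -ne 42) { throw "Expected a total of 42 but got $total" }
    'Integration test passed'
}
finally {
    # Tear down so the test leaves nothing behind.
    Remove-Item -Path $testFolder -Recurse -Force
}
```

In a real solution the file system would be replaced by whatever sits at the integration point, such as a data lake or a database, and a framework like Pester would supply the assertions.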

Further reading

My colleague Ben has written an excellent blog on SQL Integration Testing using NUnit.

I’ll add another post soon about how to do Integration Testing using Pester.


Unit Testing Overview

In a previous post, I touched on the point of testing and briefly talked about unit testing. In this post, I will be going into more detail about what unit testing is and why it’s important to do it.

In the previous post, I said that Unit Tests are:

low level tests, meaning that they are close to the source of the product. They should be written with the aim of testing individual methods and functions for a given code base, using a unit test framework to support the authoring and execution of a test. As a developer, you would typically author the unit tests in a development tool like Visual Studio; you’d run them locally to ensure that the tests pass; and then they would be executed on a regular basis as a task in a Build Pipeline within Azure DevOps. Unit Tests are cheap to automate and should be quick to run.

To expand on this, unit tests are written by a developer to apply to a unit of code. But what do we mean by “unit of code”? A unit of code is the smallest testable part of a solution – verifying that the individual part or component of a solution works as intended, independently from other parts. A unit could be a C# method; a PowerShell function; a T-SQL Stored Proc, and many others. Like most forms of testing, unit tests follow a pattern of:

  • Initialise system under test
  • Call method under test
  • Assert expected outcome against result of method

A best practice would be to write the unit test before writing any code, but if you’ve not got to that level of maturity with your test approach, writing tests after code is still good practice.
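The three-step pattern can be sketched with a PowerShell function as the unit. This is an illustration only: `Add-Number` and `Test-AddNumberReturnsSum` are made-up names, and the assertion is a plain `throw` rather than the syntax a framework such as Pester or NUnit would give you.

```powershell
# Unit under test: a hypothetical function that adds two numbers.
function Add-Number {
    param([int]$First, [int]$Second)
    $First + $Second
}

# A hypothetical test written as a plain function; it throws if the assertion fails.
function Test-AddNumberReturnsSum {
    # Initialise system under test
    $first = 2
    $second = 3
    $expected = 5

    # Call method under test
    $actual = Add-Number -First $first -Second $second

    # Assert expected outcome against result of method
    if ($actual -ne $expected) {
        throw "Add-Number returned $actual, expected $expected"
    }
    'Passed'
}

Test-AddNumberReturnsSum
```

Note that the test states the expected value directly instead of recomputing it, so it does not replicate the code it is testing.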

How do you write a good unit test?

Keep it simple

  • A unit test shouldn’t replicate the code it is intended to test.
  • You’ll be writing lots of them, so make them quick and easy to write.

Readable

  • By keeping it simple, the test should also be readable, making it easy to know what method is being tested and the expected behaviour of the method.
  • By making it readable, you can easily address any failures that may surface.

Reliable and Repeatable

  • Unit tests should only fail if there are bugs in the system, not because there are bugs in the tests. Keeping it simple and readable will avoid that issue.
  • Unit tests need to be run many times, sometimes multiple times throughout the course of a day, so they need to be executed quickly in a repeatable manner. Keeping it simple helps achieve this aim.

How do you write a unit test?

We’ve got an understanding of what a unit test is, but how do we write one? For this example, we’ll be writing our code and tests using C#.

Our application is a very simple calculator, which adds two numbers together.

Calculator

Simply, to add a new Unit Test, we can right-click on the method and select Create Unit Tests. Because we’ve not built any unit tests before, we can use it to create a new unit test project using a framework of choice. If we already had a unit test project, we could add the new test to the existing project.

createUnitTest

Using this method creates a skeleton of a unit test, which we can amend for our needs.

unitTestNew

As you can see, this doesn’t contain what we need, so we amend the test so that it reflects our requirements, as in the below.

unitTestAmended

To run a Unit Test, you can either right-click on the test method and click on Run Test(s) or open up the Test Explorer window, navigate to the desired test and click on Run Selected Tests.

Unit Tests in Azure DevOps

We’ve written our unit tests and have run them locally, but how do we make it repeatable? We utilise the power of Azure DevOps to have repeatable tests run against a changing code base as part of the Build or Continuous Integration process.

image

The process is:

  1. Install NuGet on the Build Agent
  2. Restore any packages from NuGet that your application requires
  3. Build solution
  4. Run tests
  5. Publish tests
  6. Copy successfully built and tested artifacts to a staging directory
  7. Publish those artifacts

Using Azure DevOps, or another CI tool, we can rely upon our tests in a repeatable manner.

Additional Reading

There’s a good post by Sergey Kolodiy on the importance of writing good code and how unit testing encourages good behaviour.

My colleague Jon has also written a post on the subject: Setup Unit Testing with NUnit and NBi.


Introduction to Kubernetes

Kubernetes is an orchestrator for containerised applications. This post will aim to give a high-level overview of what Kubernetes is.

According to the team at Kubernetes, Kubernetes provides a container-centric management environment. It orchestrates computing, networking, and storage infrastructure on behalf of user workloads. This provides much of the simplicity of Platform as a Service (PaaS) with the flexibility of Infrastructure as a Service (IaaS), and enables portability across infrastructure providers.

Where PaaS operates at the hardware level, Kubernetes sits at the container level, which means that you don’t get a full PaaS offering – but you do get some features such as ease of deployment, scalability, load balancing, logging and monitoring. Unlike IaaS, it’s not a monolithic solution – each component is optional and pluggable, providing a platform to build upon, like Lego bricks, preserving choice and flexibility where required.

It is also not just an orchestrator. Most orchestrators use workflow: Do this, then that etc., whereas Kubernetes is a set of independent control processes to drive the current state to the desired state. Traditional orchestration can be viewed as the means justify the end, whereas Kubernetes can be viewed as the end justifies the means.

You can think of Kubernetes as one of a few things: a container platform, a microservices platform, or a portable cloud platform. There are probably more applications for Kubernetes, but those are the three broad and dominant uses of it.

Why Containers?

Without containers, the way to deploy an application was to install the application on the host system using the OS package manager. It entangles the application with the host OS. Rollback is difficult, but possible. However rollback would often be restoring a VM image – which is heavy-duty and non-portable.

Containers virtualise the operating system rather than virtualise the hardware, like a VM does. They’re isolated from each other and the host. They have their own file systems and their resource usage can be bound. Because they are decoupled from the infrastructure and the host OS, they are portable across different operating systems and between on-prem and cloud distributions.

image


Working with Kubernetes

To interact with Kubernetes, you interact with the Kubernetes API objects. These objects describe the cluster’s desired state: effectively, what applications or workloads you want to run; the container image they should use; the number of replicas; the resources to make available – to name but a few. The desired state is set by creating objects using the API, typically using a command line interface called kubectl. Once this desired state has been set, the Control Plane works to make the current state match the desired state. Kubernetes manages this process automatically, but it does so through a collection of processes that run on a cluster. These are:

  • The Kubernetes Master, which is a collection of three processes (kube-apiserver, kube-controller-manager, kube-scheduler) that run on a single node in the cluster. When you interact with a Kubernetes cluster through kubectl, you’re interacting with the master.
  • A worker node will run two processes – kubelet, which communicates with the master node; and kube-proxy, which is a network proxy for the node. A worker node is a machine that runs the workload. The master controls each node.

Kubernetes Objects

There are several Kubernetes objects. As a basic set, these objects are:

  • Pod – like DNA, a Pod is the basic building block of Kubernetes. A Pod represents a process running on a cluster. It encapsulates a container, the resources it needs and the behaviour for how it should run. A Pod represents a unit of deployment: a single instance of an application in Kubernetes, which may contain one or many tightly coupled containers. Docker is the most common container runtime used in a Pod.
  • Service – a Service is a logical abstraction for a set of Pods and a policy by which to access them.
  • Volume – a Volume is similar to a shared disk, but is vital to resolving issues that arise with containers. On-disk, containers are temporary. They are mortal. If a container crashes, it will be restarted but the files that it had within are lost. Similarly, if you run many containers in a Pod it can be necessary to share files between the containers. Volumes solve these problems.

The Control Plane

The Control Plane maintains a record of all Kubernetes objects and runs continuous maintenance loops to check that each object matches the desired state.
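The idea of a loop that drives the current state towards the desired state can be shown with a toy sketch in PowerShell. This is not how Kubernetes is implemented. The workload names, replica counts and the `Invoke-Reconcile` function are all invented for the illustration, and each pass moves a workload by only one replica so that the loop is visible.

```powershell
# Desired and current replica counts per workload; names and numbers are invented.
$desired = @{ web = 3; api = 2 }
$current = @{ web = 1; api = 4 }

# One reconciliation pass: nudge each workload one replica towards its desired count
# and report how many changes were made.
function Invoke-Reconcile {
    param([hashtable]$Desired, [hashtable]$Current)
    $changes = 0
    foreach ($name in @($Desired.Keys)) {
        $have = [int]$Current[$name]
        $want = [int]$Desired[$name]
        if ($have -lt $want) { $Current[$name] = $have + 1; $changes++ }
        elseif ($have -gt $want) { $Current[$name] = $have - 1; $changes++ }
    }
    $changes
}

# Keep reconciling until a pass makes no changes, i.e. current matches desired.
$passes = 0
while ((Invoke-Reconcile -Desired $desired -Current $current) -gt 0) { $passes++ }

"Converged after $passes passes: web=$($current['web']), api=$($current['api'])"
```

Nothing in the loop says how to get from one replica to three. It only compares the two states and corrects the difference, which is the contrast with a step-by-step workflow drawn earlier in the post.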

At a high level, that is Kubernetes. Be on the lookout for more posts around Kubernetes.

UPDATE: This post was updated on the 20/03/2018 to give more detail to what Kubernetes is

Automating The Deployment of Azure Data Factory Custom Activities

Custom Activities in Azure Data Factory (ADF) are a great way to extend the capabilities of ADF by utilising C# functionality. Because a Custom Activity can be used within an ADF pipeline, it is useful if you need to move data to/from a data store that ADF does not support, or to transform/process data in a way that isn't supported by Data Factory.

Deploying Custom Activities to ADF is a manual process, which requires many steps. Microsoft’s documentation lists them as:

  • Compile the project. Click Build from the menu and click Build Solution.
  • Launch Windows Explorer, and navigate to bin\debug or bin\release folder depending on the type of build.
  • Create a zip file MyDotNetActivity.zip that contains all the binaries in the \bin\Debug folder. Include the MyDotNetActivity.pdb file so that you get additional details such as line number in the source code that caused the issue if there was a failure.
  • Create a blob container named customactivitycontainer if it does not already exist
  • Upload MyDotNetActivity.zip as a blob to the customactivitycontainer in a general purpose Azure blob storage that is referred to by AzureStorageLinkedService.

The number of steps means that it can take some time to deploy Custom Activities and, because it is a manual process, can contain errors such as missing files or uploading to the wrong storage account.

To avoid the errors and delays caused by a manual deployment, we want to automate as much as possible. Thanks to PowerShell, it’s possible to automate all of the deployment steps.

The script to do this is as follows:

Login-AzureRmAccount

# Parameters
$SourceCodePath = "C:\PathToCustomActivitiesProject\"
$ProjectFile = "CustomActivities.csproj"
$Configuration = "Debug"

# Azure parameters
$StorageAccountName = "storageaccountname"
$ResourceGroupName = "resourcegroupname"
$ContainerName = "blobcontainername"

# Local variables
$MsBuild = "C:\Program Files (x86)\MSBuild\14.0\Bin\MSBuild.exe"
$SlnFilePath = $SourceCodePath + $ProjectFile

# Prepare the arguments for the build; the project path is quoted in case it contains spaces
$BuildArgs = @{
    FilePath     = $MsBuild
    ArgumentList = ('"{0}"' -f $SlnFilePath), "/t:rebuild", ("/p:Configuration=" + $Configuration), "/v:minimal"
    Wait         = $true
    PassThru     = $true
}

# Start the build. Wait = $true blocks until MSBuild exits, so no sleep is needed;
# stop if the build failed rather than zipping up a broken project
$build = Start-Process @BuildArgs
if ($build.ExitCode -ne 0) { throw "MSBuild failed with exit code $($build.ExitCode)" }

# Create the zip file from the build output
$zipfilename = [System.IO.Path]::GetFileNameWithoutExtension($ProjectFile) + ".zip"
$source = $SourceCodePath + "bin\" + $Configuration
$destination = $SourceCodePath + $zipfilename
if (Test-Path $destination) { Remove-Item $destination }
Add-Type -Assembly "System.IO.Compression.FileSystem"
[IO.Compression.ZipFile]::CreateFromDirectory($source, $destination)

# Create the storage account if it does not exist
$storageAccount = Get-AzureRmStorageAccount -ErrorAction Stop | Where-Object { $_.StorageAccountName -eq $StorageAccountName }
if (!$storageAccount) {
    $StorageLocation = (Get-AzureRmResourceGroup -ResourceGroupName $ResourceGroupName).Location
    $StorageType = "Standard_LRS"
    New-AzureRmStorageAccount -ResourceGroupName $ResourceGroupName -Name $StorageAccountName -Location $StorageLocation -Type $StorageType
}

# Set the default storage context before any container or blob cmdlet runs
Set-AzureRmCurrentStorageAccount -StorageAccountName $StorageAccountName -ResourceGroupName $ResourceGroupName

# Create the container if it does not exist
$ContainerObject = Get-AzureStorageContainer -ErrorAction Stop | Where-Object { $_.Name -eq $ContainerName }
if (!$ContainerObject) {
    New-AzureStorageContainer -Name $ContainerName -Permission Blob
}

# Upload the zip file to blob storage
Set-AzureStorageBlobContent -Container $ContainerName -File $destination


By removing the manual steps in building, zipping and deploying ADF Custom Activities, you remove the risk of something going wrong and you add the reassurance that you have a consistent method of deployment which will hopefully speed up your overall development and deployments.

As always, if you have any questions or comments, do let me know.