Ben Jarvis' Blog

Standardising Azure Resource Naming Conventions Using ARM Templates


I’ve recently been looking into linked ARM templates and how they can simplify templates and allow us to apply the single responsibility principle to the infrastructure we define, just as we do in the code we write for our applications. Linked templates are a very easy way to apply these principles and I’ve found them extremely easy to get set up, both running locally and when deploying through Azure DevOps.

One common thing that tends to come up in all projects is naming conventions for Azure resources; in our case the client would typically have their own naming convention that we’d need to adhere to, or in some cases the client would decide to adopt our convention. The question is: how do we manage these naming conventions across all of our ARM templates and make them easy to change? There are a couple of typical scenarios I’ve seen when working with ARM templates: the first is that resource names are hardcoded in the parameter files, and the second is that the resource names are generated within the templates themselves. Both of these scenarios mean that our naming conventions are controlled at the individual template level rather than sharing the same logic across all templates created in the organisation.

In this blog post I’m going to run through a solution I’ve defined whereby a separate linked template is created to output the name of each resource we are creating. Below I’ll run through a quick demo of the solution and show how easy it is to implement.

The Solution

Typically when using linked ARM templates you will have a parent template that represents the resources within your resource group; within that template you will have multiple deployments that reference the linked templates you have created to represent each type of resource. This allows you to create a standard set of templates for each resource type and share those templates across multiple projects, promoting reusability and standardisation.

In the example below we want to deploy a set of resources to a resource group, which includes:

  • A Function App and its associated Storage Account
  • A Key Vault

To deploy this we would have 3 ARM templates:

  • armdeploy.json – my parent template to represent my resource group
  • functionApp.json – to represent my function app and its storage account
  • keyVault.json – to represent my key vault

Represented as a diagram my deployment looks something like below:


Each of the templates that represents a resource type will have parameters to allow us to specify the names of the resources to be created, e.g. the template for my function app will have a parameter called functionAppName and a parameter called storageAccountName. This allows us to inject the resource names from the parent template, getting us one step towards applying the single responsibility principle as the templates that generate each resource type are no longer responsible for defining the name.

To get us to our ideal solution whereby we assign resource names from a central place we can create another linked template called namingConventions.json that contains the following:
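A minimal sketch of what namingConventions.json could look like is below; the name format and the resource and region abbreviation maps are illustrative assumptions rather than the exact convention used:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "clientName": { "type": "string" },
    "projectAbbreviation": { "type": "string" },
    "environment": { "type": "string" },
    "resourceType": { "type": "string" },
    "instanceNumber": { "type": "string" }
  },
  "variables": {
    "resourceAbbreviations": {
      "functionApp": "func",
      "storageAccount": "st",
      "keyVault": "kv"
    },
    "regionAbbreviations": {
      "westeurope": "weu",
      "northeurope": "neu"
    },
    "resourceAbbreviation": "[variables('resourceAbbreviations')[parameters('resourceType')]]",
    "regionAbbreviation": "[variables('regionAbbreviations')[resourceGroup().location]]"
  },
  "resources": [],
  "outputs": {
    "resourceName": {
      "type": "string",
      "value": "[toLower(concat(parameters('clientName'), '-', parameters('projectAbbreviation'), '-', variables('resourceAbbreviation'), '-', parameters('environment'), '-', variables('regionAbbreviation'), '-', parameters('instanceNumber')))]"
    }
  }
}
```

Note that the resources array is empty; the template exists purely to compute and output a name.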

This template accepts the following parameters:

  • clientName
  • projectAbbreviation
  • environment
  • resourceType
  • instanceNumber

It then takes those parameters and generates a name for the resource in the format:


If we pass the template the following parameters for a deployment to a resource group in West Europe:

  • clientName = Adatis
  • projectAbbreviation = Test
  • environment = dev
  • resourceType = functionApp
  • instanceNumber = 01

We’d get the following output in the resourceName property:
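Assuming a format of client-project-resourceType-environment-region-instance (an illustrative convention rather than the exact one used), the generated name would look something like:

```
adatis-test-func-dev-weu-01
```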


The template does not deploy any resources and its only responsibility is to generate resource names and output them to the parent template. We can use the output of the template in a subsequent deployment using an expression like below:
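As a sketch, if the naming template was deployed under the deployment name namingConventionsFunctionApp (an assumed name), the parent template could feed the output into the function app deployment’s parameters like this:

```json
"functionAppName": {
    "value": "[reference('namingConventionsFunctionApp').outputs.resourceName.value]"
}
```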

Using this solution we can standardise items such as the abbreviation used for each resource and the abbreviations used for each Azure region and ensure that all resources follow the same convention. In our use case we can keep a standard set of ARM templates to use across projects and adjust the naming conventions file to reflect the naming conventions used by the client we are working for. This means we can avoid situations we see daily where a mixture of naming conventions are in use for resources across a tenant.

To pull all of this together we can reference our naming conventions template in our parent deployment and inject the resource names that are returned into our subsequent deployments creating something like the diagram below:


An example of the naming conventions template being used in a deployment can be found on my GitHub at


As we’ve shown above, linked ARM templates are a good first step towards applying the single responsibility principle to our ARM templates however, if we enhance the pattern slightly and add in a naming conventions template we can centralise the generation of names for our Azure resources and ensure that standards are applied across all of our ARM templates.

As always, this solution is something I have implemented to solve a problem I’ve faced however, I’m interested to see how others are solving this so if you’ve got a different approach then feel free to contact me on Twitter @BenJarvisBI or comment below.

Azure Functions Key Vault Integration


A really useful feature that recently came into public preview for Azure Functions is native integration with Key Vault when retrieving application settings. This feature allows you to source application settings directly from Key Vault without making any changes to your function code and without having to implement any additional infrastructure code to handle connecting to Key Vault and retrieving secrets. The original blog post announcing this feature can be found at

The Key Vault integration leverages another recent Azure Functions feature: managed identities. When creating a function app you can easily create a system assigned managed identity by enabling it through the Azure Portal or by including a property within the ARM template (see for more information); this allows you to interact with many Azure services using Azure AD authentication without the need to worry about managing service principals and the inevitable downtime that occurs when their keys expire.
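For reference, in an ARM template the system assigned managed identity is enabled by adding an identity property to the Microsoft.Web/sites resource that defines the function app:

```json
"identity": {
    "type": "SystemAssigned"
}
```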

As the feature is currently in preview there is a limitation: it doesn’t support rotation of secrets, meaning only a single version of a secret is supported. However, I’ll run through a workaround for that issue in this post.

Now that we’ve covered the basics of what the feature does I’ll run through a quick demo to demonstrate how it can help us when we’re developing function apps. This post isn’t intended to be an introduction to Azure Functions or Azure Key Vault so I have assumed a basic knowledge of each and only covered the elements of the set up that are related to the Key Vault integration.


The first step is to create our resources in Azure. For the demo we need to create a new resource group that contains a function app with its associated storage account and an instance of Azure Key Vault:


Next, we need to enable the system assigned managed identity on our function app, we do this by navigating to the function app in the portal, clicking on the “Platform Features” tab and then clicking on “Identity”:


From there we can set the status of the system assigned managed identity to “On” and click save to apply the change:


Next, we need to grant our function app permissions to retrieve secrets from Key Vault; we do this by navigating to Key Vault in the portal and clicking on “Access Policies”:


Now we need to click on “Add New” and click on the “Select Principal” panel that is displayed, which brings up a new blade where we can search for the managed identity created for our function app in Azure AD:


Now we’ve selected our principal we need to give it the appropriate permissions. To retrieve secrets our function app only needs “Get” permissions on secrets so we can select that option and click ok:


The Key Vault UI isn’t the most intuitive and most people forget the next step: we need to click the save button to commit the changes to our access policies:


Now that we’ve created our Azure resources and set Azure Functions up so that it can communicate with Key Vault we need to add a secret to Key Vault and get our function app to retrieve it. To do this I have created the following script that will add the given secret to Key Vault and then set the required application settings on our function app to allow the value to be retrieved:
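A sketch of such a script using the Az PowerShell module is below; the parameter names are assumptions, but the @Microsoft.KeyVault(SecretUri=...) app setting value is the documented Key Vault reference syntax:

```powershell
param(
    [string] $VaultName,
    [string] $SecretName,
    [string] $SecretValue,
    [string] $FunctionAppName,
    [string] $ResourceGroupName
)

# Add (or update) the secret in Key Vault; the Id on the returned object
# is the full secret URI including the new version
$secureValue = ConvertTo-SecureString -String $SecretValue -AsPlainText -Force
$secret = Set-AzKeyVaultSecret -VaultName $VaultName -Name $SecretName -SecretValue $secureValue

# Point an app setting of the same name at the versioned secret URI
# using the Key Vault reference syntax
Update-AzFunctionAppSetting -Name $FunctionAppName `
    -ResourceGroupName $ResourceGroupName `
    -AppSetting @{ $SecretName = "@Microsoft.KeyVault(SecretUri=$($secret.Id))" }
```

Because the app setting is rewritten every time the script runs, it always points at the latest secret version.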

Once we run the script our secret is added to Key Vault and the relevant application setting, which acts as a pointer to the secret, is added to our function app. If we always use this script to deploy our secrets then our function app will always be using the latest version of the secret, working around the current limitation that the URL in the application setting must include the version of the secret. After running the script we can take a look at the application settings for our function app and see that the reference to the secret has been added:


Now we’ve added our secret to Key Vault and created the reference to the secret in our function app, we can test that our functions are able to retrieve the secret. To do this we can create a new HTTP triggered function that uses Environment.GetEnvironmentVariable to retrieve the value of our application setting:
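A minimal sketch of such a function is below; the function name and the setting name MySecret are assumptions for illustration:

```csharp
using System;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

public static class GetSecretDemo
{
    [FunctionName("GetSecretDemo")]
    public static IActionResult Run(
        [HttpTrigger(AuthorizationLevel.Function, "get")] HttpRequest req,
        ILogger log)
    {
        // The platform resolves the Key Vault reference before our code runs,
        // so we read the secret like any other app setting
        var secretValue = Environment.GetEnvironmentVariable("MySecret");

        return new OkObjectResult(secretValue);
    }
}
```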

When we run the function the result returned matches the value we added to Key Vault! Obviously, in the real world we wouldn’t want to expose the value of our secret outside of our function, but this allows us to see the value that was returned by the app setting.


To conclude, we’ve shown how easy it is to integrate Azure Functions with Azure Key Vault. With the newly released integration we can leverage managed identities to access Key Vault without the need to write any additional code or take on the overhead of managing service principals; this means we can ensure that our secrets are stored safely in Key Vault, improving the security of our serverless application.

Building REST APIs on Top of Azure Cosmos DB Using Azure Functions

Azure Functions are fast becoming one of my favourite pieces of technology; they are perfect when used in an event driven architecture, and their ease of development and deployment, ability to scale on demand and consumption based pricing make them a great fit for a number of use cases. Azure Cosmos DB is a versatile data store that is extremely easy to get up and running, making it a perfect companion for Azure Functions.

One use case that I haven’t really explored before is using Azure Functions to create REST APIs. The HTTP trigger allows us to trigger a function through an HTTP request, so wouldn’t it be brilliant if we could build a REST API using Azure Functions where we are only charged for each individual request against the API, rather than deploying to an App Service that has an ongoing monthly cost regardless of usage?

Within Azure Functions, and even ASP.NET Web API, there is always a certain amount of boilerplate code that ends up being repeated for each and every method, e.g. a typical HTTP triggered Azure Function looks like the below:
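As a sketch of the kind of boilerplate involved (the command, service and route names here are hypothetical):

```csharp
[FunctionName("CreateProcess")]
public static async Task<IActionResult> Run(
    [HttpTrigger(AuthorizationLevel.Function, "post", Route = "process")] HttpRequest req,
    ILogger log)
{
    // Read and deserialise the request body...
    var requestBody = await new StreamReader(req.Body).ReadToEndAsync();
    var command = JsonConvert.DeserializeObject<CreateProcessCommand>(requestBody);

    // ...validate it...
    if (command == null)
    {
        return new BadRequestObjectResult("Please pass a valid request body");
    }

    // ...and only then call the logic we actually care about
    var result = await processService.CreateProcessAsync(command);
    return new OkObjectResult(result);
}
```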

Wouldn’t it be great if there were a way to remove this boilerplate and declare our APIs using a simple fluent, code-based API? This is where Function Monkey ( from James Randall ( comes in. We’ll also be using Cosmonaut ( from Nick Chapsas ( to simplify our data access against Cosmos DB.

In this example I’d like to create a REST API to perform CRUD operations against a Cosmos DB collection called “process” that will be responsible for keeping track of when a process begins and completes. Our API will need to expose the following endpoints:

  • POST: /process
  • PUT: /process/{id}/succeeded
  • PUT: /process/{id}/failed
  • GET: /process/{id}
  • GET: /process?status={status}

The implementation of the API is described below and all code can be found on my GitHub at

What is Function Monkey?

Function Monkey provides a fluent API around Azure Functions to allow us to easily define functions without being tightly coupled to the Azure Functions runtime itself. The framework allows us to implement a commanding pattern where we define commands (or queries) with handlers and then have a mediator dispatch the commands to their handlers as they are received. The beauty of this is our handlers, where our logic sits, are decoupled from the underlying trigger meaning we could easily configure our application so the handler is triggered by a queue rather than a HTTP request or even take our code out of Azure Functions and put it into a full-blown ASP.NET Web API with minimal changes.

Azure Functions doesn’t currently have dependency injection built in natively (there are some workarounds available) however, Function Monkey has this functionality baked in using the ASP.NET Core DI library meaning we can easily make our code testable. It’s also easy to configure validation using the built-in integration with the popular FluentValidation library.

The Code


Commands provide the input to our functions so they are a logical place to begin.

Each command implements the ICommand marker interface and defines a set of properties that will be passed into the functions and provided to our command handlers. They are POCOs so there is no complication within them; the object I have defined as my CreateProcessCommand is given as an example below:
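A sketch of what such a command could look like; the property name is an assumption, and ICommand comes from the commanding abstractions that Function Monkey builds on:

```csharp
using AzureFromTheTrenches.Commanding.Abstractions;

public class CreateProcessCommand : ICommand
{
    // The name of the process being started; carried into the handler
    public string ProcessName { get; set; }
}
```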


Our command handlers are where the magic happens! Their responsibility is to take the command that is passed into the function and process it.

Each handler implements the ICommandHandler<T> interface with T being the type of the command that the handler is handling. An example of the handler for the CreateProcessCommand is shown below:
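A sketch of such a handler is below; the Process entity shape and property names are assumptions, and the exact ExecuteAsync signature may differ slightly between versions of the commanding library:

```csharp
using System;
using System.Threading.Tasks;
using AzureFromTheTrenches.Commanding.Abstractions;
using Cosmonaut;

public class CreateProcessCommandHandler : ICommandHandler<CreateProcessCommand>
{
    private readonly ICosmosStore<Process> _cosmosStore;

    public CreateProcessCommandHandler(ICosmosStore<Process> cosmosStore)
    {
        // Injected by the ASP.NET Core DI container that Function Monkey wires up
        _cosmosStore = cosmosStore;
    }

    public async Task ExecuteAsync(CreateProcessCommand command)
    {
        var process = new Process
        {
            Id = Guid.NewGuid().ToString(),
            Name = command.ProcessName,
            Status = "in-progress",
            StartedAt = DateTime.UtcNow
        };

        // Persist the new entity via the Cosmonaut cosmos store
        await _cosmosStore.AddAsync(process);
    }
}
```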

The handler supports dependency injection so in our case the constructor accepts an instance of ICosmosStore<Process> which is our Cosmos DB “repository” that is provided by Cosmonaut to allow us to perform operations against Cosmos DB.

The command itself is processed in the ExecuteAsync method, in this case we are creating an instance of our Process entity and adding it to Cosmos DB using the cosmos store provided by Cosmonaut.

Bringing It All Together - Configuration

All of the configuration of Function Monkey is done through a class that implements the IFunctionAppConfiguration interface; this class is used by the Function Monkey compiler to generate the required C# assembly that will contain our functions. My FunctionAppConfiguration class is below:
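A rough sketch of the shape of such a configuration class is below; the route, commands and Open API details are illustrative rather than the exact configuration used:

```csharp
using System.Net.Http;
using FunctionMonkey.Abstractions;
using FunctionMonkey.Abstractions.Builders;

public class FunctionAppConfiguration : IFunctionAppConfiguration
{
    public void Build(IFunctionHostBuilder builder)
    {
        builder
            .Setup((serviceCollection, commandRegistry) =>
            {
                // Register dependencies and discover command handlers in this assembly
                commandRegistry.Discover<FunctionAppConfiguration>();
            })
            .OpenApiEndpoint(openApi => openApi
                .Title("Process API")
                .Version("v1")
                .UserInterface())
            .Functions(functions => functions
                .HttpRoute("/api/v1/process", route => route
                    .HttpFunction<CreateProcessCommand>(HttpMethod.Post)
                    .HttpFunction<GetProcessByIdQuery>("/{id}", HttpMethod.Get)));
    }
}
```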

The IFunctionHostBuilder that is passed into our configuration class exposes a fluent API that allows us to define our functions.

The first step is to call the Setup method which allows us to configure our dependencies for dependency injection using the provided service collection and register our command handlers; in my case I am using the Discover method provided by the command registry to locate and register any command handlers in my assembly.

The next method (OpenApiEndpoint) configures the Open API definition for our API and configures the Swagger user interface that will allow us to explore and use our API later on.

The final step is to define the configuration for our functions. This blog post is only covering HTTP functions however, other triggers are described in the documentation at

The API we are building in this example only covers a single resource so we are defining a single route called “/api/v1/process”; within this route we can define multiple functions. Each function is linked to a command and we can specify the HTTP method the function will represent (GET, POST, PUT etc) and a custom route for the function. We can also specify route parameters, which are useful when we want to pass through an ID or other parameter (see the function definition for GetProcessByIdQuery as an example).

As you can see, the configuration is extremely simple with minimal code and no complicated JSON configuration files to deal with. The fluent API makes things clear and easy to understand.


One thing I came across that’s worth noting is that the class that implements IFunctionAppConfiguration isn’t suitable for retrieving any configuration that needs to be retrieved at runtime as the class is called at build time.

As an example, when configuring Cosmonaut I originally defined my FunctionAppConfiguration class using Environment.GetEnvironmentVariable to get the values for the Cosmos DB database name, endpoint and auth key. However, when I tried to debug the application the Function Monkey compiler failed because GetEnvironmentVariable returned null. This isn’t the behaviour I wanted; I wanted the values to be retrieved at runtime, similar to if they were defined in the ASP.NET Startup class.

To get around this I created a static CosmosStoreFactory class with a method to retrieve an instance of CosmosStore, using Environment.GetEnvironmentVariable to get the configuration values.

I then registered CosmosStore in the DI container like below, using CosmosStoreFactory to instantiate it:
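A sketch of the factory is below; the environment variable names are assumptions:

```csharp
using System;
using Cosmonaut;

public static class CosmosStoreFactory
{
    public static ICosmosStore<Process> GetCosmosStore()
    {
        // Environment variables are read here, at runtime, rather than when the
        // Function Monkey compiler invokes the configuration class at build time
        var settings = new CosmosStoreSettings(
            Environment.GetEnvironmentVariable("CosmosDatabaseName"),
            Environment.GetEnvironmentVariable("CosmosEndpoint"),
            Environment.GetEnvironmentVariable("CosmosAuthKey"));

        return new CosmosStore<Process>(settings);
    }
}
```

The store can then be registered in the Setup method with a factory delegate, e.g. serviceCollection.AddSingleton(provider => CosmosStoreFactory.GetCosmosStore()), so the environment variables are only read when the store is first resolved.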

GetCosmosStore is only called at runtime, by which point the function configuration has been loaded, meaning the correct connection details are retrieved.

Running Our Application

To run my functions locally I can simply hit F5 and Function Monkey will build my functions and the functions emulator will start.

We configured an Open API endpoint for our function app so if I navigate to http://localhost:7071/openapi/ I get access to Swagger UI where I can see the endpoints that have been defined and also try calling some of the endpoints.


Now, if I want to try out the POST method I can select it and click “Try it Out”, from there I can fill in the details for my request like below:


Once I click the “Execute” button a request will be fired off to the server and I can see that a 200 (OK) HTTP response is returned meaning the operation was successful:


Now, if I go to the document explorer in my Cosmos DB emulator at https://localhost:8081/_explorer/index.html I can see that the following document has been created:

We can also call our API outside of Swagger UI e.g. if I go to Chrome and navigate to http://localhost:7071/api/v1/process?status=in-progress I get the following response:

As you can see, the code involved in creating our API is extremely minimal when compared to creating an API in Azure Functions without Function Monkey. The local development / debugging experience is great and it would be extremely easy to get this solution deployed to Azure, with the only difference being that we would need to pass an authentication key when calling our API endpoints.


To conclude, Azure Functions are an excellent choice when developing a REST API if you want a service that is easy to scale and cheap to run. Combined with Cosmos DB as a data store it’s extremely easy to have something simple working in a CRUD scenario within a matter of hours. Function Monkey makes the whole process painless and allows you to decouple your commands and handlers from the underlying triggers making them easy to test and giving more flexibility if commands need to be processed differently as the application grows.

As always, if there are any questions then please comment below or contact me on Twitter via @BenJarvisBI.

SQL Server Database Integration Testing Using NUnit

On a current project I have a set of Azure Functions that process files and write various pieces of audit information to an Azure SQL Database to be used for monitoring and further processing. The C# application implements the repository pattern to encapsulate the data access logic and uses Dapper ( as a micro ORM to simplify the code and execute the stored procedures that perform the CRUD operations against the database.

I wanted a way to automate the integration testing of my repositories and stored procedures so I developed the solution described below using NUnit as the test framework and SQL Server LocalDB as the database to run my tests against.

I had the following requirements for my solution which NUnit has been able to satisfy:

  • Quick – tests should run quickly and not require massive amounts of set up / tear down
  • Independent – all tests should be independent from one another and responsible for their own set up / tear down
  • Simple – the test code should be simple to understand and easy to work with when writing new tests.
  • Work Everywhere – the tests should be able to work anywhere and not require huge dependencies like a full SQL Server instance, they should be able to work with SQL LocalDB

The full solution can be found on my GitHub at

Solution Layout

The solution has the following projects:

  • NUnitSQLIntegrationTesting.Database – SSDT database project that includes tables, stored procedures and reference data.
  • NUnitSQLIntegrationTesting.Core – class library that includes the repositories that are used by the C# code to perform data access.
  • NUnitSQLIntegrationTesting.IntegrationTests – NUnit test project


The test database has two tables: dbo.Customer and dbo.CustomerType. CustomerType is a lookup table so the data is static and maintained within our SSDT project; Customer is the table that we want to perform read and write operations against from our C# application, so the NUnitSQLIntegrationTesting.Core project has a CustomerRepository class to call our stored procedures and map any returned objects to our domain model.

The code in our repository class is included below:

As above, the repository has the following methods:

  • CreateCustomerAsync – inserts a new row into dbo.Customer using the supplied Customer object
  • GetCustomerAsync – returns a customer object for the supplied CustomerId
  • SetCustomerEmailAsync – updates the EmailAddress column in dbo.Customer for the supplied CustomerId
  • GetCustomersByTypeAsync – returns a collection of customer objects that have the specified CustomerTypeId

The code in the repository is pretty simple with the only complication being some additional code required to map the nested CustomerType object when retrieving a Customer object.

One key element of our repository implementation is that all SQL connections are created using the ISqlServerConnectionFactory implementation that is injected into our repository. In our test scripts we are able to mock this interface so it instead provides a connection to our test database.

Test Infrastructure

The following elements can be found in the NUnitSQLIntegrationTesting.IntegrationTests project.


TestDatabase represents an instance of our test database and provides methods for creating and dropping the database.

The initialise method (below) makes use of the APIs provided in the Microsoft.SqlServer.DacFx package to take the dacpac for the database that sits within our solution and deploy it to our SQL LocalDB instance.
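The core of such an initialise method using the DacFx API might look like this; the connection string, dacpac path and database name are illustrative:

```csharp
using Microsoft.SqlServer.Dac;

public class TestDatabase
{
    private const string ConnectionString =
        @"Server=(localdb)\MSSQLLocalDB;Integrated Security=true;";

    public void Initialise()
    {
        var dacServices = new DacServices(ConnectionString);

        // Load the dacpac produced by the SSDT project and deploy it,
        // creating or upgrading the test database as required
        using (var package = DacPackage.Load("NUnitSQLIntegrationTesting.Database.dacpac"))
        {
            dacServices.Deploy(package, "NUnitSQLIntegrationTesting", upgradeExisting: true);
        }
    }
}
```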

We then have some other methods that are explained below:

  • Drop – drops the database
  • RunSetUpScript – executes the provided SQL against the test database (used for setting up data to be used within a test).
  • RunCleanUpScript – executes a script to delete test data from each table to clean up the database in preparation for running the next test. Currently this is hardcoded but the DatabaseDeleter script from Jimmy Bogard ( could easily be adapted to work here.
  • GetSqlConnection – creates a new SQL connection for use in our test scripts.


This class is implemented as an NUnit SetUpFixture with a SetUp method and a TearDown method, each being decorated with the NUnit OneTimeSetUp and OneTimeTearDown attributes respectively. This means that the SetUp method is called once at the beginning of a test run and the TearDown method is called once at the end of a test run.

This class provides a singleton instance of our TestDatabase meaning each test is connecting to the same database which prevents us having to deploy a new instance of the test database for each test. If we did deploy an instance of our test database for each test the run time of our tests would increase significantly.


This class contains our SQL specific assertions and is responsible for comparing our actual and expected results.

An example of one of the assertions is below, this method takes an expected result as a dynamic object and retrieves the actual result by querying the SQL database.

The method uses a library called Dynamitey ( to retrieve the list of properties from the objects and then iterates through each property comparing the actual value with the expected value.


This class is the base class that all of our integration tests inherit from.

This class again provides SetUp and TearDown methods that are decorated with the NUnit SetUp and TearDown attributes; the difference this time is that these methods are called at the beginning and end of each test.

One of my requirements was that each test should run independently of the others with no dependencies between tests; to allow this I need to run the tests in serial so only a single test is using my database at any one time. To implement this I have used a SemaphoreSlim (see with a maximum request count of 1. The SetUp method calls the Wait method on the semaphore, which blocks the thread if another test is already executing and only allows the thread to continue once the TearDown method has been called by the other test to release the semaphore. Each test class shares the same instance of the semaphore, meaning our tests run in serial; this negates the need for ordered tests or another method that adds further complication. The TearDown method also executes any clean up scripts against our test database to ensure no test data is left behind.
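A sketch of the base class described above; the class and member names are assumptions:

```csharp
using System.Threading;
using NUnit.Framework;

public abstract class IntegrationTestBase
{
    // A maximum count of 1 means only one test across all test classes
    // can hold the semaphore at a time, forcing the tests to run in serial
    private static readonly SemaphoreSlim Semaphore = new SemaphoreSlim(1, 1);

    [SetUp]
    public void SetUp()
    {
        // Blocks until the previous test has released the semaphore
        Semaphore.Wait();
    }

    [TearDown]
    public void TearDown()
    {
        // Remove any data created by the test, then let the next test run
        DatabaseSetUpFixture.TestDatabase.RunCleanUpScript();
        Semaphore.Release();
    }
}
```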

This class also provides a method called GetFixture which makes use of a library called AutoFixture and its integration with Moq (see this article from The TrainLine for more information on how this is useful) to provide a fixture that can be used to get an instance of our repository in our tests.


Now we’ve run through the infrastructure required by our tests we can look at the actual tests themselves.

The CustomerRepositoryTests class found in the Data folder of our NUnitIntegrationTesting.IntegrationTests project contains the tests for our CustomerRepository.

I won’t look at each test in this post as they all follow the same format however, the SetCustomerEmailAddress_WhenGivenAValidCustomerIdAndEmail_DatabaseRowIsUpdated test provides a good example as it includes some set up.

As above, we first call GetFixture to get our AutoFixture Fixture that we then use to create a new instance of our CustomerRepository, injecting our mocked instance of the ISqlServerConnectionFactory along the way.

We then execute the RunSetUpScript method to generate some test data in our database. I’m adding two rows in this case: one that I want to be updated and another that I don’t, so I can confirm that my stored procedure is only updating the row I want it to. Once the database is set up we can call the SetCustomerEmailAsync method on our repository to set the email for customer id 1 to

At this point the database row should be updated so we now call Helpers.Assert.SqlResultMatches to verify that the customer id for the customer with the email address is 1.

Our stored procedure is included below so the code should do what we expect and set the email address to for customer id 1.
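The stored procedure is a simple update along these lines; the object and parameter names are assumptions:

```sql
CREATE PROCEDURE [dbo].[SetCustomerEmail]
    @CustomerId INT,
    @EmailAddress NVARCHAR(256)
AS
BEGIN
    -- Only the row matching the supplied customer id should be updated
    UPDATE dbo.Customer
    SET EmailAddress = @EmailAddress
    WHERE CustomerId = @CustomerId;
END
```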

As below, when we run our tests in Visual Studio they all pass:


Now, if we change our stored procedure slightly to update the email address where the customer id is the one passed in as a parameter + 1, and run our test again we get a different result:


If we look at the details we can see the expected customer id was 1 but what was actually returned is 2:


Now we can easily see if a change we’ve made to our stored procedures or our repository code has caused any regression in functionality.


As seen above, it is extremely easy to use NUnit to integration test C# repositories and SQL code. The solution now gives us peace of mind when making changes to the repository code or SQL stored procedures and means that any bugs can be picked up at the development stage rather than further along the SDLC or after the code is deployed to production.

In this example I am testing C# repositories that are calling the database however, it would be extremely easy to apply this to a typical BI scenario where you need to test stored procedures on their own.

As always, if there are any questions please comment below or contact me on Twitter via @BenJarvisBI.

Cosmos DB Bulk Executor Library – Performance Comparison

One of the many exciting announcements made at MSBuild recently was the release of the new Cosmos DB Bulk Executor library, which offers massive performance improvements when loading large amounts of data to Cosmos DB (see for details on how it works). A project I’ve worked on involved copying large amounts of data to Cosmos DB using ADF, and we observed that the current Cosmos DB connector doesn’t always make full use of the provisioned RU/s, so I am keen to see what the new library can offer and whether our clients can take advantage of these improvements.

In this post I will be doing a comparison between the performance of the Cosmos DB connector in ADF V1, ADF V2 and an app written in C# using the Bulk Executor library. As mentioned in the Microsoft announcement, the new library has already been integrated into a new version of the Cosmos DB connector for ADF V2 so the tests using ADF V2 are also using the Bulk Executor library.

All tests involved copying 1 million rows of data from an Azure SQL DB to a single Cosmos DB, the scaling settings used for the resources involved are:

  • Cosmos DB Collection – partitioned by id, 50000 RU/s
  • Azure SQL DB – S2 (50 DTUs)

Each test was repeated 3 times to enable an average time taken to be calculated.

The document inserted into the database looked like the below:

Test 1 – ADF V1

To set up the ADF V1 test I used the Copy Data Wizard to generate a pipeline that would copy data from my Azure SQL DB to Cosmos DB. I then executed the pipeline 3 times, recreating the Cosmos DB collection each time.

Each test behaved in a similar way, the process slowly ramped up to 100,000 requests per minute and sustained that throughput until completion. The tests only consumed around 3,000 RU/s out of the 10,000 RU/s provisioned to each partition in the collection (each collection was created with 5 partitions).


The results of the test were:

  • Run 1 – 677 seconds
  • Run 2 – 641 seconds
  • Run 3 – 513 seconds
  • Average – 610 seconds

The performance increased after each run, with run 3 taking 513 seconds, nearly 3 minutes quicker than the first test. I can’t explain the differences in time taken; however, it seems that ADF progressively ramped up the throughput quicker after each run, so it may be down to scaling within the ADF service itself.

Test 2 – ADF V2

To set up the ADF V2 test I again used the Copy Data Wizard to generate the pipeline. I then executed the pipeline 3 times, recreating the Cosmos DB collection each time.

ADF V2 represents a massive improvement over V1 with a 75% increase in performance. The connector used more than the provisioned throughput meaning some throttling occurred however, this means that the collection could have been scaled even further to obtain higher performance.

Interestingly, the ADF V2 requests didn't appear on the number of requests metric shown in the Azure Portal, so I'm unable to look at how many requests ADF V2 was able to sustain. I'm unsure of the reason for this; it could be that ADF uses the direct connection mode to Cosmos DB rather than connecting through the gateway, meaning the requests aren't counted.

The results of the test were:

  • Run 1 – 163 seconds
  • Run 2 – 158 seconds
  • Run 3 – 156 seconds
  • Average – 159 seconds

The performance of ADF V2 was more consistent than V1 and remained reasonably steady across all tests.

Test 3 – C# w/ Bulk Executor Library

To set up the C# test I wrote a quick C# console application that uses the Bulk Executor library. The application was running on my local machine rather than within Azure, so there will obviously be a performance hit from the additional network latency. The source code for the application is available online.

The results of the test were:

  • Run 1 – 240 seconds
  • Run 2 – 356 seconds
  • Run 3 – 352 seconds
  • Average – 316 seconds

The performance of the C# application is less consistent; however, this is probably due to the network between my local machine and Azure. My application is also not very scalable, as it loads the whole dataset into memory rather than streaming it, as would be required with a larger dataset. The code itself is probably also not as optimised as it could be.

Overall however, the performance of my C# application was still 50% better than ADF V1.


The final results of the tests are below:


As above, ADF V2 is the clear winner in the test, offering a 75% increase in performance when compared to ADF V1. This represents a huge performance gain and could provide some significant cost savings for users that are loading large amounts of data into Cosmos DB. My C# application offered 50% better performance than ADF V1 while running outside of Azure without any optimisation, so the performance benefits of the new library are significant.
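These percentages can be sanity-checked from the run averages above; a quick sketch (the reductions in elapsed time work out at roughly 74% and 48%, in line with the rounded figures quoted):

```python
# Average copy durations (seconds), from the run results above
adf_v1 = (677 + 641 + 513) / 3   # ~610
adf_v2 = (163 + 158 + 156) / 3   # 159
csharp = (240 + 356 + 352) / 3   # 316

def improvement(baseline, candidate):
    """Percentage reduction in elapsed time relative to the baseline."""
    return 100 * (1 - candidate / baseline)

print(f"ADF V2 vs ADF V1: {improvement(adf_v1, adf_v2):.0f}% faster")
print(f"C# app vs ADF V1: {improvement(adf_v1, csharp):.0f}% faster")
```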

The new library is an exciting development for Cosmos DB and allows us to fully utilise the capabilities it has to offer when dealing with large amounts of data. I look forward to making use of these benefits in projects, especially the significant improvements in ADF V2!

As always, if you have any questions or comments please let me know.

ADF V2 Issue With File Extension After Decompressing Files

On a current client project we are taking files from an on-prem file server and uploading them to Azure Blob Storage using ADF V2. The files are compressed on-prem using GZip compression and need to be decompressed before they are placed in blob storage where some other processes will pick them up.

ADF V2 natively supports decompression of files, as described in the documentation. With this functionality ADF should change the extension of the file when it is decompressed, so 1234_567.csv.gz would become 1234_567.csv; however, I've noticed that this doesn't happen in all cases.

In our particular case the file names and extensions of the source files are all uppercase and when ADF uploads them it doesn’t alter the file extension e.g. if I upload 1234_567.CSV.GZ I get 1234_567.CSV.GZ in blob storage rather than 1234_567.CSV.

If I upload 1234_567.csv.gz the functionality works correctly and I get 1234_567.csv in blob storage. This means that the file extension replacement is case sensitive when it should be case insensitive.
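In other words, the extension replacement ought to be case insensitive; a minimal sketch of the expected behaviour (the function is illustrative, not part of ADF):

```python
import re

def strip_gzip_extension(filename: str) -> str:
    """Remove a trailing .gz extension regardless of its case."""
    return re.sub(r"\.gz$", "", filename, flags=re.IGNORECASE)

strip_gzip_extension("1234_567.csv.gz")  # "1234_567.csv"
strip_gzip_extension("1234_567.CSV.GZ")  # "1234_567.CSV"
```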

This bug isn’t a major issue for us as the file is decompressed and we can change the extension when we process the file further however, it’s something that stumped me for a while.

I’ve raised a bug at to get this fixed so please vote and I’ll update the post once the issue has been resolved.

Uploading Files from On-Prem File System to Azure Blob Storage Using ADF V2


Recently we had a client requirement whereby we needed to upload some files from an on-prem file server to Azure Blob Storage so they could be processed further. The file system connector in ADF V2 with a self-hosted integration runtime was the perfect solution for this so this blog post will discuss the process for getting a basic set up working to test the functionality.

Test Data

The first task is to generate some data to upload. I did this by executing the below PowerShell script to create 100 files, with each being 10 MB in size.

$numberOfFiles = 100
$directory = "C:\Temp\Files"
$fileSize = 1000000 * 10 # 10 MB

# Recreate the directory so each run starts clean
Remove-Item $directory -Recurse -ErrorAction SilentlyContinue
New-Item $directory -ItemType Directory

for ($i = 1; $i -le $numberOfFiles; $i++) {
    fsutil file createnew "$directory\Test_$i.txt" $fileSize
}

Now we’ve got some files to upload it’s time to create our resources in Azure.

Azure Configuration

The following PowerShell script uses the Azure RM PowerShell cmdlets to create our resources. This script will provision the following resources in our Azure subscription:

  • Resource Group
  • ADF V2 Instance
  • Storage Account

$location = "West Europe"
$resourceGroupName = "ADFV2FileUploadResourceGroup"
$dataFactoryName = "ADFV2FileUpload"
$storageAccountName = "adfv2fileupload"

# Login to Azure
$credential = Get-Credential
Login-AzureRmAccount -Credential $credential

# Create Resource Group
New-AzureRmResourceGroup -Name $resourceGroupName -Location $location

# Create ADF V2
$setAzureRmDataFactoryV2Splat = @{
    Name = $dataFactoryName
    ResourceGroupName = $resourceGroupName
    Location = $location
}
Set-AzureRmDataFactoryV2 @setAzureRmDataFactoryV2Splat

# Create Storage Account
$newAzureRmStorageAccountSplat = @{
    Kind = "Storage"
    Name = $storageAccountName
    ResourceGroupName = $resourceGroupName
    Location = $location
    SkuName = "Standard_LRS"
}
New-AzureRmStorageAccount @newAzureRmStorageAccountSplat

Our Azure Portal should now show the following:


Now we’ve got our resources provisioned it’s time to configure ADF to access our on-prem data.

ADF Configuration

Self-Hosted Integration Runtime

In ADF V2 the integration runtime is responsible for providing the compute infrastructure that carries out data movement between data stores. A self-hosted integration runtime is an on-premises version of the integration runtime that is able to perform copy activities to and from on-premises data stores.

When we configure a self-hosted integration runtime, the data factory service that sits in Azure orchestrates the nodes that make up the integration runtime through Azure Service Bus; this means the nodes hosted on-prem perform all of the data movement and connect to our on-premises data sources, while being triggered by our data factory pipelines hosted in the cloud. A self-hosted integration runtime can have multiple nodes associated with it, which not only caters for high availability but also gives an additional performance benefit, as ADF will use all of the available nodes to perform processing.

To create our self-hosted integration runtime we need to use the following PowerShell script:

$credential = Get-Credential
Login-AzureRmAccount -Credential $credential

$resourceGroupName = "ADFV2FileUploadResourceGroup"
$dataFactoryName = "ADFV2FileUpload"
$integrationRuntimeName = "SelfHostedIntegrationRuntime"

# Create integration runtime
$setAzureRmDataFactoryV2IntegrationRuntimeSplat = @{
    Type = 'SelfHosted'
    Description = "Integration Runtime to Access On-Premise Data"
    DataFactoryName = $dataFactoryName
    ResourceGroupName = $resourceGroupName
    Name = $integrationRuntimeName
}
Set-AzureRmDataFactoryV2IntegrationRuntime @setAzureRmDataFactoryV2IntegrationRuntimeSplat

# Get the key that is required for our on-prem app
$getAzureRmDataFactoryV2IntegrationRuntimeKeySplat = @{
    DataFactoryName = $dataFactoryName
    ResourceGroupName = $resourceGroupName
    Name = $integrationRuntimeName
}
Get-AzureRmDataFactoryV2IntegrationRuntimeKey @getAzureRmDataFactoryV2IntegrationRuntimeKeySplat

The above script creates the integration runtime then retrieves the authentication key that is required by the application that runs on our integration runtime node.

The next step is to configure our nodes; to do this we need to download the self-hosted integration runtime installer.

Once we run the downloaded executable, the installer will install the required application, which will then open on the following screen asking for our authentication key:


We can enter the authentication key retrieved in the previous step into the box and click Register to register our integration runtime with the data factory service. Please note that if your organisation has a proxy you will need to enter the proxy details to ensure the integration runtime has the connectivity it requires.

After clicking Register we receive the following prompt; click Finish on this and allow the process to complete.


Once the process is complete our integration runtime is ready to go.

More information on creating the self-hosted integration runtime can be found in the official documentation.

ADF Pipeline

Now we’ve got our self-hosted integration runtime set up we can begin configuring our linked services, datasets and pipelines in ADF. We’ll do this using the new ADF visual tools.

Linked Services

The first step is to create our linked services. To do this we open up the visual tools, go to the author tab and select connections; we can then create a new linked service to connect to Azure Blob Storage:


Next we need to create a linked service for our on-prem file share.

First create a new linked service and select the file system connector, we can then fill in the relevant details.


There are a couple of things to note with the above configuration. The first is that we have selected our self-hosted integration runtime for the "Connect via integration runtime" option. We have also specified an account that has access to the folder we would like to pick files up from; in my case I have configured a local account, however, if you were accessing data from a file share within your domain you would supply domain credentials.


Datasets

For our pipeline we will need to create two datasets: one for our file system that will be our source and another for blob storage that will be our sink.

Firstly, we will create our file system dataset. To do this we will create a new dataset and select the file system connector; we will then configure our connection tab, like below:


As above, we have selected our linked service we created earlier, added a filter to select all files and opted to perform a binary copy of the files in the directory.

Next, we need to create a dataset for blob storage. To do this we will create a new dataset using the Azure Blob Storage connector; we will then configure our connection tab, like below:


This time we have entered the container we’d like to copy our files to as the directory name in our file path, we have also opted for a binary copy again.


Pipeline

The final step is to create our pipeline that will tie everything together and perform our copy.

First, create a new pipeline with a single copy activity. The source settings for the copy activity are as follows:


The settings for the sink are:


In this situation we have set the copy behaviour to “Preserve Hierarchy”, which will preserve the folder structure from the source folder when copying the files to blob storage.
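As a rough illustration of what "Preserve Hierarchy" does (a sketch of the behaviour, not ADF's actual implementation), each file's path relative to the source folder is reproduced underneath the sink container:

```python
import posixpath

def sink_path(source_root: str, file_path: str, container: str) -> str:
    # "Preserve Hierarchy": keep the file's folder structure relative
    # to the source root when writing into the sink container
    relative = posixpath.relpath(file_path, source_root)
    return posixpath.join(container, relative)

sink_path("/files", "/files/subfolder/Test_1.txt", "upload")  # "upload/subfolder/Test_1.txt"
```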

Now we’ve got everything set up we can select “Publish All” to publish our changes to the ADF service.


We’ve completed all of the necessary configuration so we can now trigger our pipeline to test that it works. In the pipeline editor click the trigger button to trigger off the copy. We can then go to the monitor tab to monitor the pipeline as it runs:


The pipeline will take a couple of minutes to complete and on completion we can see that the files we selected to upload are now in our blob container:



To conclude, ADF V2 presents a viable solution when there is a need to copy files from an on-prem file system to Azure. The solution is relatively easy to set up and gives a good level of functionality and performance out of the box.

When looking to deploy the solution to production there are some more considerations such as high availability of the self-hosted integration runtime nodes however, the documentation given on MSDN helps give a good understanding of how to set this up by adding multiple nodes to the integration runtime.

Adatis SQLBits Photo Booth

This year at SQLBits Adatis decided to create two technical demos to fit in with the magic theme, these were the Adatis Sorting Hat and the Adatis Photo Booth.

My blog post will look at the Adatis Photo Booth which was designed to allow us to print a wizarding student ID for participants; I’ll give an overview of the technology stack that was used and then do some analysis of the data we collected using Power BI.


The Requirement

The first place to start when designing a technical solution are the business requirements. In our case the requirements were:

  • The user should have an app running on a tablet that will ask them for some basic details such as name, email address and twitter account before allowing them to take a selfie to be printed on their card.
  • The image of the user should be uploaded to the Azure Cognitive Services Face API to capture their emotions.
  • The final data to be placed on the card should be added to a database, in the required format, for our card printing software to pick up.
  • The solution should require the smallest amount of coding possible.
  • The solution should be fully automated and require the least amount of interaction from the Adati on the stand.

The Technology

To satisfy our requirements we chose to use Power Apps for the front-end and Logic Apps for the back-end, making our solution completely serverless.

A brief overview of the steps in the process and the tech used at each stage is below:

Architecture Diagram

At a high level the technology being used is:

  • Power Apps – allows us to build a cross-platform, front-end application without writing any code.
  • Azure SQL Database – allows us to store the data captured from the application. Power Apps natively integrates with Azure SQL DB.
  • Azure Blob Storage – cheap to use and easy to integrate with our solution. Power Apps doesn't natively integrate with Blob storage, so we quickly created a custom API (following instructions available online) to upload our images.
  • Azure Cognitive Services – a set of APIs that allowed us to very easily add some intelligence to our application. The face API was easy to integrate and we were using machine learning to map the emotions of the face within minutes.

    The Technical Challenges

    While building the solution we came across a couple of limitations that are useful to raise, these are:

    • There is currently no integration between Power Apps and blob storage, meaning you need to roll your own API to allow the images from the camera control to be uploaded. The process is simple and the blog post referenced above gives details of what needs to be done.
    • The cognitive services connector for Logic Apps doesn’t currently return the emotion data returned by the API. To get around this we made a web request within the logic app and parsed the JSON returned to extract the attributes we needed.
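Extracting the attributes from the parsed JSON comes down to picking the highest-scoring emotion, which the analysis below calls the primary emotion. A minimal sketch of that logic (the scores shown are illustrative, shaped like those the Face API returns):

```python
def primary_emotion(emotion_scores: dict) -> str:
    """Return the emotion with the highest confidence score."""
    return max(emotion_scores, key=emotion_scores.get)

# Illustrative emotion scores for one detected face
scores = {"anger": 0.01, "happiness": 0.92, "neutral": 0.05, "sadness": 0.02}
primary_emotion(scores)  # "happiness"
```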

    Apart from the API to upload the images, no coding was required to get this solution to function, making Power Apps and Logic Apps a very good choice when you need to deliver a solution quickly.

    The Data

    Now we’ve looked at the technical solution we can use Power BI to look at some of the data we collected at SQL Bits.

    To analyse the data I connected my Power BI desktop to the Azure SQL Database we were using, I then created some measures and visuals against the data to give the below result:


    The visuals we have are:

    • Top Left – tree map that looks at the count of students by their primary emotion (the emotion that scored highest)
    • Top Middle – % of happy (where primary emotion is happiness) and % of unhappy (where primary emotion is anger, sadness or disgust) over each 3 hour period.
    • Top Right – total number of students we produced cards for, overall % of happy and % unhappy students.
    • Bottom – primary emotion logged over time

    As you can see, we did quite well, with over 46% of our students having a primary emotion of happy and only 13% being unhappy. It seems that over Thursday and Friday we had some unhappy students, but on Saturday everybody was happy!


    To conclude, we created our tech demos for SQLBits so everybody could have some fun on the stands; however, as you've seen, the demos are backed by the cloud technology that we use all day, every day. Power Apps and Logic Apps are an excellent set of tools that allow you to build applications quickly and easily while connecting various services together, all while being part of the Azure PaaS and SaaS offering without the headache of provisioning servers. The Cognitive Services APIs also present a great offering and allow you to quickly and easily add some intelligence to your application without needing to roll your own solution.

    As always, if you have any questions or comments then please let me know.

    Analysis of Spatial Data Using Cosmos DB


    Recently, while researching Cosmos DB, I came across the in-built capabilities for managing spatial data.

    Cosmos DB is Microsoft’s globally distributed, multi-model database. It has the capability to store various types of data, such as document, graph and key-value and can elastically scale to cope with varying needs. The piece of Cosmos DB that this post will be discussing is the spatial capabilities of the document model.

    The problem I have chosen to solve using the spatial functionality is working out which airfields are within the range of an aircraft when given its true airspeed and fuel endurance in hours; with the range being calculated by multiplying the airspeed by the endurance.
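    That range calculation is straightforward: true airspeed in knots multiplied by endurance in hours gives nautical miles, which convert to meters at 1,852 m per nautical mile (meters are what the spatial queries later expect). A quick sketch:

```python
METERS_PER_NAUTICAL_MILE = 1852

def range_meters(true_airspeed_kts: float, endurance_hours: float) -> float:
    # knots * hours = nautical miles; convert to meters for the spatial queries
    return true_airspeed_kts * endurance_hours * METERS_PER_NAUTICAL_MILE

range_meters(100, 4.5)  # 833400.0
```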

    The Data

    The first step was to find a data set containing a list of all of the world's airfields; this was found on GitHub. The data set contains the details we need, which are:

    • ICAO code – this is the unique identifier for an airport
    • Airport Name
    • Latitude and Longitude of the Airport

    The next step was to create a Cosmos DB account in the Azure Portal and write a C# app to do some transformations on the data and load the documents into our Cosmos DB collection.

    I first created a C# object that matched the structure of the JSON data:

    using Newtonsoft.Json;

    namespace LoadAirportData
    {
        public class Airport
        {
            public string Id => this.Icao;

            public string Icao { get; set; }
            public string Iata { get; set; }
            public string Name { get; set; }
            public string City { get; set; }
            public string State { get; set; }
            public string Country { get; set; }
            public int Elevation { get; set; }
            public double Latitude { get; set; }
            public double Longitude { get; set; }
            public string TimeZone { get; set; }
        }
    }

    I then created a C# object that matched the structure of the document I wanted to insert into Cosmos DB. The “Point” data type is used to serialize the latitude and longitude into the GeoJSON structure that Cosmos DB supports:

    using Microsoft.Azure.Documents.Spatial;
    using Newtonsoft.Json;

    namespace LoadAirportData
    {
        public class AirportDocument
        {
            public AirportDocument(Airport airport)
            {
                Id = airport.Icao;
                Iata = airport.Iata;
                Name = airport.Name;
                Location = new Point(airport.Longitude, airport.Latitude);
            }

            public string Id { get; set; }
            public string Iata { get; set; }
            public string Name { get; set; }
            public Point Location { get; set; }
        }
    }
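For reference, the GeoJSON that the Point type serializes to orders the coordinates array longitude first, then latitude; a quick sketch of the structure Cosmos DB stores:

```python
def to_geojson_point(longitude: float, latitude: float) -> dict:
    # GeoJSON orders coordinates as [longitude, latitude]
    return {"type": "Point", "coordinates": [longitude, latitude]}

to_geojson_point(-0.8475000262, 51.3238983154)
# {'type': 'Point', 'coordinates': [-0.8475000262, 51.3238983154]}
```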

    Finally I created a method that dropped the Cosmos DB database, recreated the database and the document collection then loaded the documents into the collection:

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Linq;
    using System.Net;
    using System.Threading.Tasks;
    using Microsoft.Azure.Documents;
    using Microsoft.Azure.Documents.Client;
    using Newtonsoft.Json;

    namespace LoadAirportData
    {
        public class CosmosDbImporter
        {
            private const string ENDPOINT_URL = "<YOUR ENDPOINT>";
            private const string PRIMARY_KEY = "<YOUR KEY>";
            private const string DATABASE_NAME = "airports";
            private const string COLLECTION_NAME = "airportData";
            private const string IMPORT_PATH = @"C:\Temp\Airports.json";

            public async Task ImportData()
            {
                var documentClient = new DocumentClient(new Uri(ENDPOINT_URL), PRIMARY_KEY);

                // Delete the database if it exists
                try
                {
                    await documentClient.DeleteDatabaseAsync(UriFactory.CreateDatabaseUri(DATABASE_NAME));
                }
                catch (DocumentClientException ex)
                {
                    if (ex.StatusCode != HttpStatusCode.NotFound)
                        throw;
                }

                // Create the database
                await documentClient.CreateDatabaseIfNotExistsAsync(new Database() { Id = DATABASE_NAME });

                // Create the collection and switch on spatial indexing
                DocumentCollection collection = new DocumentCollection() { Id = COLLECTION_NAME };
                collection.IndexingPolicy = new IndexingPolicy(new SpatialIndex(DataType.Point));
                await documentClient.CreateDocumentCollectionIfNotExistsAsync(UriFactory.CreateDatabaseUri(DATABASE_NAME), collection);

                // Read the file and deserialize to our Airport object
                var data = System.IO.File.ReadAllText(IMPORT_PATH);
                var airports = JsonConvert.DeserializeObject<Dictionary<string, Airport>>(data);

                // Upload documents to Cosmos DB across 50 parallel partitions
                await Task.WhenAll(
                    from partition in Partitioner.Create(airports.Values).GetPartitions(50)
                    select Task.Run(async delegate
                    {
                        using (partition)
                        {
                            while (partition.MoveNext())
                            {
                                Console.WriteLine($"Processing {partition.Current.Icao}");
                                var airportDocument = new AirportDocument(partition.Current);
                                await documentClient.CreateDocumentAsync(UriFactory.CreateDocumentCollectionUri(DATABASE_NAME, COLLECTION_NAME), airportDocument);
                            }
                        }
                    }));
            }
        }
    }

    One thing to note is the above code enables spatial indexing when creating the collection, without this enabled performance is extremely poor when performing spatial queries.

    The beauty of Cosmos DB is that it is able to elastically scale to the performance level specified by the user through the number of RUs (request units) allocated to the collection. While loading the data into Cosmos DB I wanted to scale up my database to take advantage of the multithreading in my C# app and speed up my processing, so I went into the Azure Portal and adjusted the number of RUs allocated to the collection; the change was almost instant and my process immediately sped up. Once I had finished importing the data I was able to scale my database back down, so I'm no longer paying for any unused capacity.



    Now the data is in Cosmos DB we can begin to play around with some spatial queries.

    To query airfields within a certain distance of a specified point I can run the following query which returns all of the airfields within 25km of Blackbushe airport. As you can see, the SQL syntax for querying Cosmos DB is very similar to T-SQL meaning it’s very easy to re-use your SQL Server skills:

    SELECT  airports.id AS ICAO,
            airports.name AS Name,
            ST_DISTANCE(airports.location, {"type": "Point", "coordinates": [-0.8475000262,51.3238983154]}) AS Distance
    FROM    airports
    WHERE   ST_DISTANCE(airports.location, {"type": "Point", "coordinates": [-0.8475000262,51.3238983154]}) < 25000

    The above query returns the following results, which are the 7 airfields that are within 25km of Blackbushe:

    [
        {
            "ICAO": "EGHL",
            "Name": "Lasham Airport",
            "Distance": 19964.7890768588
        },
        {
            "ICAO": "EGVO",
            "Name": "RAF Odiham",
            "Distance": 11985.957064869535
        },
        {
            "ICAO": "EGTF",
            "Name": "Fairoaks Airport",
            "Distance": 20229.321025944442
        },
        {
            "ICAO": "EGLF",
            "Name": "Farnborough Airport",
            "Distance": 7286.035340157135
        },
        {
            "ICAO": "EGLK",
            "Name": "Blackbushe Airport",
            "Distance": 0
        },
        {
            "ICAO": "EGLM",
            "Name": "White Waltham Airfield",
            "Distance": 20312.693531316185
        },
        {
            "ICAO": "EGLP",
            "Name": "Brimpton Airfield",
            "Distance": 23311.94703537874
        }
    ]

    The App

    The next step is to create an application that uses the functionality to find the airfields within the range of an aircraft. To do this I created a basic ASP.NET MVC application that has a form with the following fields:


    When the form is submitted the following C# code is executed:

    public async Task<ActionResult> Index(AirportFinderModel model)
    {
        var documentClient = new DocumentClient(new Uri(ENDPOINT_URL), PRIMARY_KEY);

        var baseAirfield = await documentClient.ReadDocumentAsync<AirportDocument>(
            UriFactory.CreateDocumentUri(DATABASE_NAME, COLLECTION_NAME, model.BaseAirfield));

        // Knots * hours = nautical miles; 1852 meters per nautical mile
        var availableDistanceMeters = (model.EnduranceHours * model.TrueAirspeed) * 1852;

        var result = documentClient
            .CreateDocumentQuery<AirportDocument>(UriFactory.CreateDocumentCollectionUri(DATABASE_NAME, COLLECTION_NAME))
            .Where(a => a.Location.Distance(baseAirfield.Document.Location) <= availableDistanceMeters)
            .ToList();

        return View("Result", new AirportFinderResultModel()
        {
            MapPoints = JsonConvert.SerializeObject(result.Select(a => new MapPoint()
            {
                Title = a.Id,
                Description = a.Name,
                Longitude = a.Location.Position.Longitude,
                Latitude = a.Location.Position.Latitude
            }))
        });
    }
    The above code connects to Cosmos DB and retrieves the details for the base airfield that was specified. It then calculates the range of the aircraft in meters by multiplying the endurance (in hours) by the true airspeed in knots (nautical miles per hour) and then multiplying that by 1852 (the number of meters in a nautical mile). A LINQ query is then run against Cosmos DB using the built-in spatial functions to find airfields within the specified distance. The result is then converted into a JSON array that can be understood by the Google Maps API that is being used on the client side.

    The client side uses the Google Maps API to plot the airfields on a map, giving us a view like the one below when given a base airfield of Blackbushe (EGLK), a true airspeed of 100kts and an endurance of 4.5 hours:


    The current functionality of the app is extremely basic but could easily be expanded to make the range calculation more accurate by looking at wind and other factors that can affect range. This could be done by creating a polygon representing our range and then using the ST_WITHIN function to find all airfields within that polygon. The functionality could also be enhanced to take into account other attributes of the airfield by deciding if an airfield is suitable based on runway length and other facilities.


    As you can see, using Cosmos DB it is extremely easy to analyse spatial data and the elastic scaling capability allows you to easily increase capacity of your application to cope with increased usage and large amounts of data. The spatial capabilities of Cosmos DB can make it a viable alternative to other databases such as SQL Server that aren’t always as easy to scale on demand and don’t have the flexibility that a document model can give.

    As always, if you have any questions or comments then let me know.