Ust

Ust Oldfield's Blog

Azure Active Directory Authentication and Azure Data Catalog

In a previous post I introduced Azure Data Catalog. Because it’s great for data discovery and for data asset management, it makes sense to automate, as much as possible, the process of registering new data assets, and allowing users to discover data in a more natural, perhaps conversational, way. In order to automate the registration of data assets or to allow discovery through other tools, it’s necessary to look at how Azure Data Catalog authenticates users using Azure Active Directory (AAD). This post is going to explore some of the options the Azure Data Catalog uses for authentication and a walkthrough of a code example to make authentication work without user input.

Azure Active Directory Authentication

If you have interacted with Azure Data Catalog before, you will find that there are two ways of doing so. First, there’s the web application that allows you to conduct data discovery and data asset management. Then there’s the native application that sits on your local machine that can be used for registering data assets. These use different methods of authenticating using Azure Active Directory. The first one uses Web Browser to Web Application authentication. The second uses Native Application to Web API authentication.

Web Browser to Web Application

What is involved with Web Browser to Web Application authentication? Simply put, the web application directs the user’s browser to get them to sign-in AAD. AAD then returns a token which authenticates the user to use the web application. In practice, it’s a bit more complex, so here’s a diagram to help explain it.

image

In a bit more detail, the process it follows is:

1) A user visits the application and needs to sign in, they are redirected via a sign-in request to the authentication endpoint in AAD.

2) The user signs in on the sign-in page.

3) If authentication is successful, AAD creates an authentication token and returns a sign-in response to the application’s Reply URL that was configured in the Azure Portal. The returned token includes claims about the user and AAD that are required by the application to validate the token.

4) The application validates the token by using a public signing key and issuer information available at the federation metadata document for Azure AD. After the application validates the token, Azure AD starts a new session with the user. This session allows the user to access the application until it expires.

This method of authentication is used by Azure Data Catalog when discovering data through the browser.

Native Application to Web API

What’s the process of Native Application to Web API authentication? Simply put, the application will ask you to sign-in to AAD, so that it can acquire a token in order to access resources from the Web API. In practice, it’s a bit more complex, so here’s a diagram to help explain it.

image

In a bit more detail, the process it follows is:

1) The native application makes a request to the authorisation endpoint in AAD, but using a browser pop-up. This request includes the Application ID and redirect URI of the native application (see the following article for native applications and registering them in Azure) and the Application ID URI of the Web API. The user is then requested to sign-in.

2) AAD authenticates the user. AAD then issues an authorisation code response back to the application’s redirect URI.

3) The Application then stops the browser activity and extracts the authorisation code from the response. Using the authorisation code, the Application then requests an access token from AAD. It also uses details about the native application and the desired resource (Web API).

4) The authorisation code and details are checked by AAD, which then returns an access token and a refresh token.

5) The Application then uses the access token to add to the authorisation header in its request to the Web API. Which returns the requested resource, based on successful authentication.

6) When the access token expires, the refresh token is used to acquire a new access token without requiring the user to sign-in again.

This method of authentication is used by Azure Data Catalog when registering data assets via the desktop application.

Automated Interaction with Azure Data Catalog

In both of the examples above, they require the user to interact in order to provide sign-in credentials. This is not ideal if we want to automate the registration of data assets or conduct data discovery outside of the browser. Therefore we’ll need to use a different method of authentication. This is the Server Application to Web API authentication method. Simply, it assumes that the server has already required a user to login and therefore has the user’s credentials. It then uses those credentials to request the access and refresh tokens from AAD.

image

In a bit more detail, the process it follows is:

1) The Server Application makes a request to AAD’s Token Endpoint, bypassing the Authentication Endpoint, providing the credential, Application ID and Application URI.

2) AAD authenticates the application and returns an access token that can be used to call the Web API.

3) The Application uses the access token to add to the authorisation header in its request to the Web API. Which returns the requested resource, based on successful authentication.

This method is what we’re going to use to automate our interaction with Azure Data Catalog.

From an authentication aspect, the code for Server Application to Web API is simple and this example will take us to the point of returning that token, from which we can then use to request resources from the Azure Data Catalog API. The full code can be found my GitHub repo.

We are going to use the Client Id and Secret from an application we’ve registered in AAD (full process can be found in this Microsoft article on Integrating Applications with AAD).

private static string clientId = "ApplicationId";

private static string secret = "ApplicationKey";

Then, we’re going to make sure we’re connecting to the correct AAD instance

private static string authorityUri = string.Format("https://login.windows.net/{0}", tenantId);

So we can create an authorisation context

AuthenticationContext authContext = new AuthenticationContext(authorityUri);

In order to acquire a token

authResult = await authContext.AcquireTokenAsync(resourceUri, new ClientCredential(clientId, secret));

Which can then be used in an authorisation header in requests to the Azure Data Catalog API. In the next related post, we’ll explore how to make a call to the API using this authentication method.

Introduction to Kubernetes

Kubernetes is an orchestrator for containerised applications. This post will aim to give a high-level overview of what Kubernetes is.

According to the team at Kubernetes, Kubernetes provides a container-centric management environment. It orchestrates computing, networking, and storage infrastructure on behalf of user workloads. This provides much of the simplicity of Platform as a Service (PaaS) with the flexibility of Infrastructure as a Service (IaaS), and enables portability across infrastructure providers.

Where PaaS operates at a hardware, Kubernetes sits at the container level which means that you don’t get a full PaaS offering – but you do get some features such as ease of deployment, scalability, load balancing, logging and monitoring. Unlike IaaS, it’s not a monolithic solution – each solution is optional and pluggable, providing a platform to build upon, like Lego bricks, preserving choice and flexibility where required.

It is also not just an orchestrator. Most orchestrators use workflow: Do this, then that etc., whereas Kubernetes is a set of independent control processes to drive the current state to the desired state. Traditional orchestration can be viewed as the means justify the end, whereas Kubernetes can be viewed as the end justifies the means.

You can think of Kubernetes as one of a few things. Either a container platform; a microservices platform; or a portable cloud platform. There are probably more applications for Kubernetes, but those are the three broad and dominant uses of it.

Why Containers?

Without containers, the way to deploy an application was to install the application on the host system using the OS package manager. It entangles the application with the host OS. Rollback is difficult, but possible. However rollback would often be restoring a VM image – which is heavy-duty and non-portable.

Containers virtualise the operating system rather than virtualise the hardware, like a VM does. They’re isolated from each other and the host. They have their own file systems and their resource usage can be bound. Because they are decoupled from the infrastructure and the host OS, they are portable across different operating systems and between on-prem and cloud distributions.

image


Working with Kubernetes

To interact with Kubernetes, you interact with the Kubernetes API objects. These objects describe the cluster’s desired state. Effectively, what applications or work loads do you want to run; the container image they should use; the number of replicas; the resources to make available – to name but a few. The desired state is set by creating objects using the API, typically using a command line interface called kubectl. Once this desired state has been set the Control Plane works to make the current state match the desired state. The process of doing this, Kubernetes manages automatically, but it does so through a collection of processes that run on a cluster. These are:

  • The Kubernetes Master, which is a collection of three processes (kube-apiserver, kube-controller-manager, kube-scheduler) that run on a single node in the cluster. When you interact with a Kubernetes cluster through kubectl, you’re interacting with the master.
  • A worker node will run two processes – kubelet, which communicates with the master node; and kube-proxy, which is a network proxy for the node. A worker node is a machine that runs the workload. The master controls each node.

Kubernetes Objects

There are several Kubernetes objects. As a basic set, these objects are:

  • Pod – like DNA, a Pod is the basic building block of Kubernetes. A Pod represents a process running on a cluster. It encapsulates a container and the resources it needs and the behaviour for how it should run. A Pod represents a unit of deployment: a single instance of Kubernetes, which may contain one or many tightly coupled containers. Docker is the most container runtime used in a Pod.
  • Service – a Service is a logical abstraction for a set of Pods and a policy by which to access them.
  • Volume – a Volume is similar to a shared disk but are vital to resolving issues that arise with containers. On-disk, containers are temporary. They are mortal. If a container crashes, it will be restarted but files that it had within are lost. Similarly, if you run many containers in a Pod it can be necessary to share files between the containers. Volume solves these problems.

The Control Plane

The Control Plane maintains a record of all Kubernetes objects and runs continuous maintenance loops to check that each objects matches the desired state.

At a high-level, that is Kubernetes. Be on the look out for more posts around Kubernetes.

UPDATE: This post was updated on the 20/03/2018 to give more detail to what Kubernetes is

Tabular Automation and NuGet

In a recent blog post, I wrote about processing an Azure Analysis Services tabular model using Azure Functions. In it, there’s a lengthy process of downloading some DLLs and uploading them to the Azure Function. Handily, the Analysis Services team at Microsoft have released the Analysis Services NuGet package, which means that the necessary DLLs can be automatically installed to an Azure Function without much hassle. This blog is going to go through the steps of adding the NuGet package to your Azure Function.

Add a new file to your function called project.json

image

Input the following code in the newly created file

{
   "frameworks": {
     "net46":{
       "dependencies": {
         "Microsoft.AnalysisServices.retail.amd64": "15.0.2"
       }
     }
    }
}

Then save the Azure Function to proceed with the NuGet restore and compile your function. You should see the following logs in your log window.

image

That is the entire process. Much easier than documented previously!

Process an Azure Analysis Services Tabular Model from an Azure Function

A couple of weeks ago I wrote a blog post detailing how to process a tabular model from a stored procedure. The challenge there was to have a way of processing a tabular model on demand from a web application. The challenge still exists, but how do you achieve the same in Azure using Platform as a Service (PaaS) objects which do not have the benefit of a full server and items, such as msdb and SQL Agent, to aid in the process?

In this post, I’m going to show you how to process a tabular model only using Azure PaaS offerings. Not only am I going to show you how to do process a tabular model on-demand, but also how to process a tabular model on a schedule. This post has taken inspiration and part of the code base from the a Microsoft blog: Automating Azure Analysis Services processing with Azure Functions.

Azure Functions

Before we begin properly, it’s worth spending some time introducing Azure Functions. According to Microsoft, Azure Functions are:

…a solution for easily running small pieces of code, or "functions," in the cloud. You can write just the code you need for the problem at hand, without worrying about a whole application or the infrastructure to run it. Functions can make development even more productive, and you can use your development language of choice, such as C#, F#, Node.js, Java, or PHP. Pay only for the time your code runs and trust Azure to scale as needed. Azure Functions lets you develop server less applications on Microsoft Azure.

They are super useful for extending the capabilities of any solution and not just limited to what we’re going to cover here.

On-Demand Refresh

The use of Azure Functions creates a trigger for the on-demand refresh of a tabular model from the web application or web hook, this is to make sure that selected elements of data in a tabular model, for example hot partitions, are always up to date. The following describes the process that Azure Functions will be involved in this scenario:

image

The steps that are needed to create an Azure Function for On-Demand Refresh are as follow:

1) Create an Azure Function App

Navigate to the Azure Portal and create a Function App (the name changes quite a bit, so don’t be concerned if it’s not exactly displayed as it is below)

image

image

2) Create a new Function

After you’ve created the Function App, we need to add a new Webhook + API function, which we’ll use as the basis for our on-demand refresh. Click on the + button next to Functions, select Webhook + API, choose C# as your language and click Create this function.

image

3) Configure the Function

Download the latest client libraries for Analysis Services. This needs to be done to your local machine so you can then copy these files to your Azure Function App. After you’ve downloaded the client libraries, the DLLs can be found in C:\Program Files (x86)\Microsoft SQL Server\140\SDK\Assemblies. The two files you need are:

C:\Program Files (x86)\Microsoft SQL Server\140\SDK\Assemblies\Microsoft.AnalysisServices.Core.DLL
C:\Program Files (x86)\Microsoft SQL Server\140\SDK\Assemblies\Microsoft.AnalysisServices.Tabular.DLL

The documentation references the 130 assemblies, which is not correct and will not work. You need the assemblies in 140.

In order to add these assemblies to your function, you’ll need to add a folder called “bin”. To do this, select your Function App, click Platform features, and under Development Tools, click Advanced tools (Kudu).

image

In Kudu, click Debug console and select Cmd. Navigate to the site\wwwroot\OnDemandHttpTrigger folder and add the “bin” folder here by clicking the + button.

image

Once you’ve added the “bin” folder, go back over to the Azure portal and select your function OnDemandHttpTrigger.

On the right under View files, navigate to the bin folder. Then click the Upload button to add the two previously mentioned DLLs to the bin folder.

image

You should see the two DLLs in your bin folder now.

image

4) Add Azure Analysis Services Connection String to the Function App

This step is going to add the connection string to the Azure Analysis Services (AAS) service to the entire Function App, not just individual functions.

Click the name of your Function App, then select Platform features. Select Application settings under General Settings.

image

Now we need to add our connection string under the Connection strings section. You’ll need your AAS server name and a user ID and password that has access to the AAS database.

You can find your AAS server name by navigating to your AAS database in the Azure portal and copying the value constructed after clicking Show server connection strings:

image

Your connection string should look like this:

Provider=MSOLAP;Data Source=<your aas server>; Initial Catalog=<aas database name>;User ID=<your username>;Password=<your password>

Back in the screen for the Function App, fill in the Name textbox with a name for your connection string and paste your connection string in the Value text box:

image

Click Save at the top to save these settings for your Functions.

5) Time for Code

Our Function App has been configured, now we need to add code to the function. The function comes with a working function, for which to test out the functionality, but we don’t need everything that is on offer.

image

We’re going to programmatically process the tabular model. In doing so, we’ll leverage Analysis Services Management Objects (AMO). If you’re new to AMO, start here.

Paste in the following code (all code referenced can also be download from my GitHub Repo):

#r "Microsoft.AnalysisServices.Tabular.DLL"

#r "Microsoft.AnalysisServices.Core.DLL"

#r "System.Configuration"

using System;

using System.Configuration;

using Microsoft.AnalysisServices.Tabular;

using System.Net;

public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log)

{

log.Info("C# HTTP trigger function processed a request.");

// parse query parameter

string status = req.GetQueryNameValuePairs()

.FirstOrDefault(q => string.Compare(q.Key, "status", true) == 0)

.Value;

if (status == null)

{

// Get request body

dynamic data = await req.Content.ReadAsAsync<object>();

status = data?.status;

}

if (status == "execute")

{log.Info($"C# trigger function started at: {DateTime.Now}"); 

try

            {

Microsoft.AnalysisServices.Tabular.Server asSrv = new Microsoft.AnalysisServices.Tabular.Server();

var connStr = ConfigurationManager.ConnectionStrings["AASTabular"].ConnectionString;

asSrv.Connect(connStr);

Database db = asSrv.Databases["azureadventureworks"];

Model m = db.Model;

//db.Model.RequestRefresh(RefreshType.Full);     // Mark the model for refresh

m.RequestRefresh(RefreshType.Full);     // Mark the model for refresh

//m.Tables["Date"].RequestRefresh(RefreshType.Full);     // Mark only one table for refresh

db.Model.SaveChanges();     //commit  which will execute the refresh

asSrv.Disconnect();

            }

catch (Exception e)

            {

log.Info($"C# trigger function exception: {e.ToString()}");

            }

log.Info($"C# trigger function finished at: {DateTime.Now}");

}

return status == "execute"

?req.CreateResponse(HttpStatusCode.OK, "Successfully Processed Tabular Model ")

:req.CreateResponse(HttpStatusCode.BadRequest, "Please pass a status on the query string or in the request body");

}

Click the Save button at the top.

6) Test, Test, Test

Click the Run button at the top to test the function

image

The function can also be tested in a web browser, and be called by a Web App using the POST HTTP method.

image

Now we have a fully functioning method of refreshing an Azure Analysis Services tabular model on-demand.

Scheduled Refresh

The use of Azure Functions creates a trigger for the scheduled refresh of a tabular model, this is to make sure that the entire tabular model has the latest data and is always up to date. The following describes the process that Azure Functions will be involved in this scenario:

image

The steps that are needed to create an Azure Function for Scheduled Refresh are as follow:

1) Create a Function

We’ve created our Function App, and now we need to add a new Timer Trigger function, which we’ll use as the basis for our scheduled refresh. Click on the + button next to Functions, select Timer, choose C# as your language and click Create this function.

2) Configure the Timer

What use is a timer without a schedule? To give the timer a schedule, click Integrate, set the schedule and click Save.

image

The Schedule text box expects a CRON expression to define the days and times that the function should execute. Click the little Documentation button on the screen above to read about CRON expressions. The schedule I’ve set is to run once everyday at 09:30AM.

3) Configure the Function

See step 3 of the On-Demand Function for detailed steps. You’ll need to create the bin folder and upload the DLLs to the bin folder.

4) Time for Code

We’ve configured our function, so now it’s time to add the code. The code base is much simpler than the On-Demand code base, mainly because it’s doing fewer tasks. But the AMO section is exactly the same. Paste in the following code:

#r "Microsoft.AnalysisServices.Tabular.DLL"

#r "Microsoft.AnalysisServices.Core.DLL"

#r "System.Configuration"

using System;

using System.Configuration;

using Microsoft.AnalysisServices.Tabular;

public static void Run(TimerInfo myTimer, TraceWriter log)

{

log.Info($"C# Timer trigger function started at: {DateTime.Now}"); 

try

            {

Microsoft.AnalysisServices.Tabular.Server asSrv = new Microsoft.AnalysisServices.Tabular.Server();

var connStr = ConfigurationManager.ConnectionStrings["AASTabular"].ConnectionString;

asSrv.Connect(connStr);

Database db = asSrv.Databases["azureadventureworks"];

Model m = db.Model;

//db.Model.RequestRefresh(RefreshType.Full);     // Mark the model for refresh

m.RequestRefresh(RefreshType.Full);     // Mark the model for refresh

//m.Tables["Date"].RequestRefresh(RefreshType.Full);     // Mark only one table for refresh

db.Model.SaveChanges();     //commit  which will execute the refresh

asSrv.Disconnect();

            }

catch (Exception e)

            {

log.Info($"C# Timer trigger function exception: {e.ToString()}");

            }

log.Info($"C# Timer trigger function finished at: {DateTime.Now}");

}

Click the save button at the top.

5) Test, Test, Test

Click the Run button at the top to test the function

image


Now we have a fully functioning method of refreshing an Azure Analysis Services tabular model on-demand.

Conclusion

I have shown you how simple it is to invoke two methods of refreshing a tabular model using Azure Functions: an On-Demand refresh and a refresh by Schedule. I hope that you take inspiration and use these methods in your use of both Azure Analysis Services and Azure Functions.