Ust

Ust Oldfield's Blog

Process an Azure Analysis Services Tabular Model from an Azure Function

A couple of weeks ago I wrote a blog post detailing how to process a tabular model from a stored procedure. The challenge there was to have a way of processing a tabular model on demand from a web application. The challenge still exists, but how do you achieve the same in Azure using Platform as a Service (PaaS) objects which do not have the benefit of a full server and items, such as msdb and SQL Agent, to aid in the process?

In this post, I’m going to show you how to process a tabular model only using Azure PaaS offerings. Not only am I going to show you how to do process a tabular model on-demand, but also how to process a tabular model on a schedule. This post has taken inspiration and part of the code base from the a Microsoft blog: Automating Azure Analysis Services processing with Azure Functions.

Azure Functions

Before we begin properly, it’s worth spending some time introducing Azure Functions. According to Microsoft, Azure Functions are:

…a solution for easily running small pieces of code, or "functions," in the cloud. You can write just the code you need for the problem at hand, without worrying about a whole application or the infrastructure to run it. Functions can make development even more productive, and you can use your development language of choice, such as C#, F#, Node.js, Java, or PHP. Pay only for the time your code runs and trust Azure to scale as needed. Azure Functions lets you develop server less applications on Microsoft Azure.

They are super useful for extending the capabilities of any solution and not just limited to what we’re going to cover here.

On-Demand Refresh

The use of Azure Functions creates a trigger for the on-demand refresh of a tabular model from the web application or web hook, this is to make sure that selected elements of data in a tabular model, for example hot partitions, are always up to date. The following describes the process that Azure Functions will be involved in this scenario:

image

The steps that are needed to create an Azure Function for On-Demand Refresh are as follow:

1) Create an Azure Function App

Navigate to the Azure Portal and create a Function App (the name changes quite a bit, so don’t be concerned if it’s not exactly displayed as it is below)

image

image

2) Create a new Function

After you’ve created the Function App, we need to add a new Webhook + API function, which we’ll use as the basis for our on-demand refresh. Click on the + button next to Functions, select Webhook + API, choose C# as your language and click Create this function.

image

3) Configure the Function

Download the latest client libraries for Analysis Services. This needs to be done to your local machine so you can then copy these files to your Azure Function App. After you’ve downloaded the client libraries, the DLLs can be found in C:\Program Files (x86)\Microsoft SQL Server\140\SDK\Assemblies. The two files you need are:

C:\Program Files (x86)\Microsoft SQL Server\140\SDK\Assemblies\Microsoft.AnalysisServices.Core.DLL
C:\Program Files (x86)\Microsoft SQL Server\140\SDK\Assemblies\Microsoft.AnalysisServices.Tabular.DLL

The documentation references the 130 assemblies, which is not correct and will not work. You need the assemblies in 140.

In order to add these assemblies to your function, you’ll need to add a folder called “bin”. To do this, select your Function App, click Platform features, and under Development Tools, click Advanced tools (Kudu).

image

In Kudu, click Debug console and select Cmd. Navigate to the site\wwwroot\OnDemandHttpTrigger folder and add the “bin” folder here by clicking the + button.

image

Once you’ve added the “bin” folder, go back over to the Azure portal and select your function OnDemandHttpTrigger.

On the right under View files, navigate to the bin folder. Then click the Upload button to add the two previously mentioned DLLs to the bin folder.

image

You should see the two DLLs in your bin folder now.

image

4) Add Azure Analysis Services Connection String to the Function App

This step is going to add the connection string to the Azure Analysis Services (AAS) service to the entire Function App, not just individual functions.

Click the name of your Function App, then select Platform features. Select Application settings under General Settings.

image

Now we need to add our connection string under the Connection strings section. You’ll need your AAS server name and a user ID and password that has access to the AAS database.

You can find your AAS server name by navigating to your AAS database in the Azure portal and copying the value constructed after clicking Show server connection strings:

image

Your connection string should look like this:

Provider=MSOLAP;Data Source=<your aas server>; Initial Catalog=<aas database name>;User ID=<your username>;Password=<your password>

Back in the screen for the Function App, fill in the Name textbox with a name for your connection string and paste your connection string in the Value text box:

image

Click Save at the top to save these settings for your Functions.

5) Time for Code

Our Function App has been configured, now we need to add code to the function. The function comes with a working function, for which to test out the functionality, but we don’t need everything that is on offer.

image

We’re going to programmatically process the tabular model. In doing so, we’ll leverage Analysis Services Management Objects (AMO). If you’re new to AMO, start here.

Paste in the following code (all code referenced can also be download from my GitHub Repo):

#r "Microsoft.AnalysisServices.Tabular.DLL"

#r "Microsoft.AnalysisServices.Core.DLL"

#r "System.Configuration"

using System;

using System.Configuration;

using Microsoft.AnalysisServices.Tabular;

using System.Net;

public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log)

{

log.Info("C# HTTP trigger function processed a request.");

// parse query parameter

string status = req.GetQueryNameValuePairs()

.FirstOrDefault(q => string.Compare(q.Key, "status", true) == 0)

.Value;

if (status == null)

{

// Get request body

dynamic data = await req.Content.ReadAsAsync<object>();

status = data?.status;

}

if (status == "execute")

{log.Info($"C# trigger function started at: {DateTime.Now}"); 

try

            {

Microsoft.AnalysisServices.Tabular.Server asSrv = new Microsoft.AnalysisServices.Tabular.Server();

var connStr = ConfigurationManager.ConnectionStrings["AASTabular"].ConnectionString;

asSrv.Connect(connStr);

Database db = asSrv.Databases["azureadventureworks"];

Model m = db.Model;

//db.Model.RequestRefresh(RefreshType.Full);     // Mark the model for refresh

m.RequestRefresh(RefreshType.Full);     // Mark the model for refresh

//m.Tables["Date"].RequestRefresh(RefreshType.Full);     // Mark only one table for refresh

db.Model.SaveChanges();     //commit  which will execute the refresh

asSrv.Disconnect();

            }

catch (Exception e)

            {

log.Info($"C# trigger function exception: {e.ToString()}");

            }

log.Info($"C# trigger function finished at: {DateTime.Now}");

}

return status == "execute"

?req.CreateResponse(HttpStatusCode.OK, "Successfully Processed Tabular Model ")

:req.CreateResponse(HttpStatusCode.BadRequest, "Please pass a status on the query string or in the request body");

}

Click the Save button at the top.

6) Test, Test, Test

Click the Run button at the top to test the function

image

The function can also be tested in a web browser, and be called by a Web App using the POST HTTP method.

image

Now we have a fully functioning method of refreshing an Azure Analysis Services tabular model on-demand.

Scheduled Refresh

The use of Azure Functions creates a trigger for the scheduled refresh of a tabular model, this is to make sure that the entire tabular model has the latest data and is always up to date. The following describes the process that Azure Functions will be involved in this scenario:

image

The steps that are needed to create an Azure Function for Scheduled Refresh are as follow:

1) Create a Function

We’ve created our Function App, and now we need to add a new Timer Trigger function, which we’ll use as the basis for our scheduled refresh. Click on the + button next to Functions, select Timer, choose C# as your language and click Create this function.

2) Configure the Timer

What use is a timer without a schedule? To give the timer a schedule, click Integrate, set the schedule and click Save.

image

The Schedule text box expects a CRON expression to define the days and times that the function should execute. Click the little Documentation button on the screen above to read about CRON expressions. The schedule I’ve set is to run once everyday at 09:30AM.

3) Configure the Function

See step 3 of the On-Demand Function for detailed steps. You’ll need to create the bin folder and upload the DLLs to the bin folder.

4) Time for Code

We’ve configured our function, so now it’s time to add the code. The code base is much simpler than the On-Demand code base, mainly because it’s doing fewer tasks. But the AMO section is exactly the same. Paste in the following code:

#r "Microsoft.AnalysisServices.Tabular.DLL"

#r "Microsoft.AnalysisServices.Core.DLL"

#r "System.Configuration"

using System;

using System.Configuration;

using Microsoft.AnalysisServices.Tabular;

public static void Run(TimerInfo myTimer, TraceWriter log)

{

log.Info($"C# Timer trigger function started at: {DateTime.Now}"); 

try

            {

Microsoft.AnalysisServices.Tabular.Server asSrv = new Microsoft.AnalysisServices.Tabular.Server();

var connStr = ConfigurationManager.ConnectionStrings["AASTabular"].ConnectionString;

asSrv.Connect(connStr);

Database db = asSrv.Databases["azureadventureworks"];

Model m = db.Model;

//db.Model.RequestRefresh(RefreshType.Full);     // Mark the model for refresh

m.RequestRefresh(RefreshType.Full);     // Mark the model for refresh

//m.Tables["Date"].RequestRefresh(RefreshType.Full);     // Mark only one table for refresh

db.Model.SaveChanges();     //commit  which will execute the refresh

asSrv.Disconnect();

            }

catch (Exception e)

            {

log.Info($"C# Timer trigger function exception: {e.ToString()}");

            }

log.Info($"C# Timer trigger function finished at: {DateTime.Now}");

}

Click the save button at the top.

5) Test, Test, Test

Click the Run button at the top to test the function

image


Now we have a fully functioning method of refreshing an Azure Analysis Services tabular model on-demand.

Conclusion

I have shown you how simple it is to invoke two methods of refreshing a tabular model using Azure Functions: an On-Demand refresh and a refresh by Schedule. I hope that you take inspiration and use these methods in your use of both Azure Analysis Services and Azure Functions.

Process a Tabular model from a Stored Procedure

The Challenge

Recently, at a client, I was challenged to create a stored procedure that would process a tabular model. This stored procedure would then be executed from a web application. The process behind it being: a user enters data into a web application, which gets written to a database. That data then needs to be immediately surfaced up into reports, with additional calculations and measures along the way. Therefore the tabular model, which does all the additional calculation and measures, needs to be processed by a user from the web application.

image

That’s the challenge – process a tabular model quickly, but should be processed by users on-demand.

The Solution

Part one: Method of process

There’s quite a few methods to process a tabular model: use an SSIS package, PowerShell, SQL Agent Job and others. I opted for SQL Agent Job because it was the most simple method of execution without having to reconfigure the server or introduce technologies and languages that weren’t already in use.

First things first, create a SQL Agent Job, I called mine ProcessTabular. Then create a Step. The Type should be SQL Server Analysis Services Command, input the server address and input the refresh script. I’m using SQL Server 2016, so using Tabular Model Scripting Language (TMSL) for my command. XMLA commands also work for older versions. A full list of commands for processing a tabular database can be found here.

image

Part two: Start the Agent Job

Now that we have a SQL Agent job, we’ll need to start that job using SQL. Luckily, there’s a system stored procedure that can start agent jobs: msdb.dbo.sp_start_job

Method for calling it is

EXEC msdb.dbo.sp_start_job 'ProcessTabular'

producing the following successful message

image

Part three: The Stored Procedure

sp_start_job works, but it doesn’t accommodate for providing a completion message, or informing a user that a process is in progress.

Introducing the code for the stored procedure:

CREATE PROCEDURE [dbo].[ProcessTabularDatabase]
 
AS
 
DECLARE @JobName nvarchar(50) = 'ProcessTabular',
@ResultCheck INT
 
IF NOT EXISTS(    
         SELECT 1
         FROM msdb.dbo.sysjobs_view job 
         JOIN msdb.dbo.sysjobactivity activity ON job.job_id = activity.job_id
         WHERE 
             activity.run_Requested_date IS NOT NULL 
         AND activity.stop_execution_date IS NULL 
         AND job.name = @JobName
         )
BEGIN     
     PRINT 'Starting job ''' + @JobName + '''';
     EXEC msdb.dbo.sp_start_job @JobName;
 
  WHILE (1 = 1)
  BEGIN
   SELECT @ResultCheck =  IIF(stop_execution_date IS NULL,1,0)
      FROM msdb.dbo.sysjobactivity AS sja
      JOIN msdb.dbo.sysjobs AS sj ON sja.job_id = sj.job_id
      WHERE sj.name = @JobName
   IF @ResultCheck = 0 BREAK;
  END
 
  PRINT 'Successfully Processed Tabular Database'
END
ELSE
BEGIN
     PRINT 'Job ''' + @JobName + ''' is already started ';
END


Conclusion

This stored procedure can be executed by the web application, enabling users to process a tabular database on-demand and get feedback as to the success of the task.











Data Source Permissions and On-Premises Data Gateway: SQL Server and Analysis Services

In Microsoft’s documentation surrounding the On-Premises Data Gateway, the advice on permissions for the account used to authenticate the Data Source in the Power BI Service can be concerning for most, especially DBAs.

In the Analysis Services section of the documentation, the advice is:

The Windows account you enter must have Server Administrator permissions for the instance you are connecting to. If this account’s password is set to expire, users could get a connection error if the password isn’t updated for the data source.

Server Administrator permissions…? What happened to the principle of least-privilege?

In a practical sense, the On-Premises Data Gateway has to deal with two very different implementations of Analysis Services: Multidimensional and Tabular. Each are setup and configured differently from the other, and the nature of their security models are also different. As a one size fits all approach, it works. As we will soon see, the permissions do not have to be set as Server Admin

The SQL section of the documentation, on the other hand, doesn’t actually specify what permissions are required for the Data Source to be established in the Power BI Service.

Permissions

Exactly what permissions are required for these common data sources, I hear you ask. As data sources are established at a database level, so too are the permissions set.

Data Source

Minimum Permissions Level

SQL Server Database

db_datareader

SSAS Tabular Database

Process database and Read

SSAS Multidimensional Database

Full control (Administrator)

Principle of least-permissions is now restored.

Though there still are the curious incidents of Analysis Services data sources requiring permissions in addition to read. I am unsure, I have my suspicions, and have tried to find out. If you know, please leave a comment below!



Archiving the Data Lake

In a blog introducing the Data Lake Framework, keen readers will be aware that in the diagram there’s a box titled “ARCHIVE” but it has not been brought up since. The reason why the Archive layer in the data lake has not been discussed is because we’ve been waiting for the Archive Tier in Blob Storage.

To remind readers of the framework and where the archive layer sits in it, here it is again with the archive layer highlighted.

image

The Archive Blob

The Archive access tier in blob storage was made generally available today (13th December 2017) and with it comes the final piece in the puzzle to archiving data from the data lake.

Where Hot and Cool access tiers can be applied at a storage account level, the Archive access tier can only be applied to a blob storage container. To understand why the Archive access tier can only be applied to a container, you need to understand the features of the Archive access tier. It is intended for data that has no or low SLAs for availability within an organisation and the data is stored offline (Hot and Cool access tiers are online). Therefore, it can take up to 15 hours for data to be made online and available. Brining Archive data online is a process called rehydration (fitting for the data lake). If you have lots of blob containers in a storage account, you can archive them and rehydrate them as required, rather than having to rehydrate the entire storage account.

Archive Pattern

An intended use for the Archive access tier is to store raw data that must be preserved, even after it has been fully processed, and does not need to be accessed within 180 days.

Data gets loaded into the RAW area of the data lake, is fully processed through to CURATED, and a copy of the raw data is archived off to a blob container with a Cool access tier applied to it. When the archive cycle comes about, a new Cool access tiered blob container is created and the now old container has its access tier changed to Archive.

For example, our Archive cycle is monthly and we have a Cool access tiered blob container in our storage account called “December 2017”. When data has finished being processed in the Azure Data Lake, the Raw data is archived to this blob container. January comes around, we create a new blob container called “January 2018” with Cool access tier settings and change the access tier of “December 2017” from Cool to Archive.

This data has now been formally achieved and is only available for disaster recovery, auditing or compliance purposes. 




One-way Domain Trust and Power BI

I ran into a problem setting up on-premises data gateways on a client recently, whereby they had two domains but with a one-way trust. The result was that when authenticating within the Power BI Service to retrieve data from on-premises data sources in the untrusted domain it would throw an error.

At this point it is worth spending some time explaining the architecture.

The Architecture

The architecture might be familiar to many who use Power BI and the on-premises data gateway, with a little caveat. Domain 1 is the main domain. Domain 2 is the secondary domain and trusts Domain 1. Domain 1, on the other hand, doesn’t trust Domain 2.

image

A user in Domain 1 can access data sources in both Domain 1 and Domain 2. They can create their Power BI reports with a live connection or direct query and publish them to the Power BI Service. In order to use the reports in the service, on-premises data gateways need to be established to provide a messaging service between on-premises and the cloud. In this example, each domain has a domain controller, a tabular server and an on-premises data gateway for each tabular server.

The Problem

When a user logged-on to the Power BI Service tries to access data from Domain 2, their credentials are passed down to the on-premises data gateway, checked against the domain controller in Domain 2 and returns an error to the Power BI Service.

What I think happens is that the user (User.One@Domain1.com) will have their credentials passed down through the on-premises data gateway to the domain controller in Domain 2. Either the domain controller will not be able to find the user, it is the untrusted domain, and will not be able to pass the short name (DOMAIN1\USERONE) to the tabular server, or it tries to check with the domain controller in Domain 1 and encounters the dreaded Kerberos and cannot perform a double hop to return the short name. Either way, the result is the same in that the short name cannot be passed to the tabular server.

image

The Solution

As you can imagine, there are a few solutions to the problem.

  • If it is a Kerberos related issue, then Kerberos will have to be configured separately
  • Make Domain 2 a trusted domain
  • User mapping in Power BI Service

This latter approach is the one I opted for because it was guaranteed to work and would not change the current domain and network configuration.

In the gateways settings in the Power BI Service, I went to the Users tab under my data source and clicked on Map user names. In there I mapped users in Domain 1 to users in Domain 2.

image

If you have a large number of users, individual mapping might not be preferable or feasible, which is why you can replace the Domain names in part of the user string, as in example 3. This, however, does rely upon users in Domain 1 having an equivalent account in Domain 2. This is not always the case, for which the wildcard to service account would work, as shown in example 4.







LETS Process Data–Modern ETL for the Data Lake

At the PASS Summit this year, I attended a session by Michael Rys. In this session he introduced the concept of LETS as an approach to process data in the data lake. If you are familiar with data lake, then you will be familiar of having to apply a schema to the data held within. The LETS approach is purpose design for schematization.

Where ETL stands for Extract, Transform, Load or ELT stands for Extract, Load, Transform – LETS stands for Load, Extract, Transform, Store.

Data are Loaded into the data lake

Data are Extracted and schematized

Data are Transformed in rowsets

Data are Stored in a location, such as the Catalog in Azure Data Lake Analytics, Azure Data Warehouse, Azure Analysis Services, for analysis purposes.

image

I really like this approach as it makes sense for how data are handled in the data lake. It’s something that I will be advocating and using, and I hope you do too!



Geek in a Lift

This is probably my first proper non-technical post and I’m posting it because life at Adatis isn’t always glamourous and filled with exciting tech. Sometimes you get stuck in a lift in Coventry. That’s what happened to me.

The Context

There’s not an awful lot of context to this…. I’ve been up in Coventry all week on a client’s site, and staying at one of the Premier Inn’s in the city. For most of the week, it has been a pleasant stay, getting to know Coventry better (the city has some incredible restaurants for very little money) but otherwise uneventful. That is, until last night…

The Elephant In The Room

Well, said elephant is probably me as I caused a lift, which can hold up to 8 people, in a Premier Inn to breakdown. My first thought was “FFS”. After getting hold of the lift company, my first action was to tell my girlfriend (no sympathy there), so my second port of call were my work colleagues. This is where the fun begins…

“I’m stuck in the lift” – and the reply is not sympathy but elevator related puns, or encouragement to crack out some code, or general enjoyment at my plight.

WhatsApp Image 2017-10-05 at 18.33.59

Not impressed

After half an hour in the lift, hunger set in. And so the pictures of food flooded in to make the hunger worse. Elevator puns were still coming in from the late joiners.

Despair set in.

WhatsApp Image 2017-10-05 at 18.36.55

With the encouragement of the guys at Adatis, I was tweeting my ordeal using the #GeekInALift hashtag on both Twitter and Instagram.

Over an hour and a half after I got in the lift, I was rescued by the engineer.

WhatsApp Image 2017-10-05 at 19.32.26

FREEDOM

Lessons Learnt

The obvious lesson that I learnt was that I shouldn’t be so reliant upon machinery and technology. Sometimes they break.

The most rewarding lesson was that I can always count on everyone at Adatis to lift my spirits and keep me sane when the current situation is bleak. It’s something that I have taken for granted in the past, but no longer. It genuinely makes me proud to be an Adati.





Automating The Deployment of Azure Data Factory Custom Activities

Custom Activities in Azure Data Factory (ADF) are a great way to extend the capabilities of ADF by utilising C# functionality. Custom Activities are useful if you need to move data to/from a data store that ADF does not support, or to transform/process data in a way that isn't supported by Data Factory, as it can be used within an ADF pipeline.

Deploying Custom Activities to ADF is a manual process, which requires many steps. Microsoft’s documentation lists them as:

  • Compile the project. Click Build from the menu and click Build Solution.
  • Launch Windows Explorer, and navigate to bin\debug or bin\release folder depending on the type of build.
  • Create a zip file MyDotNetActivity.zip that contains all the binaries in the \bin\Debug folder. Include the MyDotNetActivity.pdb file so that you get additional details such as line number in the source code that caused the issue if there was a failure.
  • Create a blob container named customactivitycontainer if it does not already exist
  • Upload MyDotNetActivity.zip as a blob to the customactivitycontainer in a general purpose Azure blob storage that is referred to by AzureStorageLinkedService.

The number of steps means that it can take some time to deploy Custom Activities and, because it is a manual process, can contain errors such as missing files or uploading to the wrong storage account.

To avoid that errors and delays caused by a manual deployment, we want to automate as much as possible. Thanks to PowerShell, it’s possible to automate the entire deployment steps.

The script to do this is as follows:

Login-AzureRmAccount

# Parameters
$SourceCodePath = "C:\PathToCustomActivitiesProject\"
$ProjectFile ="CustomActivities.csproj"
$Configuration = "Debug"
#Azure parameters
$StorageAccountName = "storageaccountname"
$ResourceGroupName = "resourcegroupname"
$ContainerName = "blobcontainername"


# Local Variables
$MsBuild = "C:\Program Files (x86)\MSBuild\14.0\Bin\MSBuild.exe";           
$SlnFilePath = $SourceCodePath + $ProjectFile;           
                            
# Prepare the Args for the actual build           
$BuildArgs = @{           
     FilePath = $MsBuild           
     ArgumentList = $SlnFilePath, "/t:rebuild", ("/p:Configuration=" + $Configuration), "/v:minimal"           
     Wait = $true           
     }         

# Start the build           
Start-Process @BuildArgs
# initiate a sleep to avoid zipping up a half built project
Sleep 5

# create zip file

$zipfilename = ($ProjectFile -replace ".csproj", "") + ".zip"

$source = $SourceCodePath + "bin\" + $Configuration
$destination = $SourceCodePath + $zipfilename
if(Test-path $destination) {Remove-item $destination}
Add-Type -assembly "system.io.compression.filesystem"
[io.compression.zipfile]::CreateFromDirectory($Source, $destination)

#create storage account if not exists
$storageAccount = Get-AzureRmStorageAccount -ErrorAction Stop | where-object {$_.StorageAccountName -eq $StorageAccountName}      
if  ( !$storageAccount ) {
     $StorageLocation = (Get-AzureRmResourceGroup -ResourceGroupName $ResourceGroupName).Location
     $StorageType = "Standard_LRS"
     New-AzureRmStorageAccount -ResourceGroupName $ResourceGroupName  -Name $StorageAccountName -Location $StorageLocation -Type $StorageType
}

#create container if not exists
$ContainerObject = Get-AzureStorageContainer -ErrorAction Stop | where-object {$_.Name -eq $ContainerName}
if (!$ContainerObject){
$storagekey = Get-AzureRmStorageAccountKey -ResourceGroupName $ResourceGroupName -Name $StorageAccountName
$context = New-AzureStorageContext -StorageAccountName $StorageAccountName -StorageAccountKey $storagekey.Key1 -Protocol Http
New-AzureStorageContainer -Name $ContainerName -Permission Blob -Context $context
}

# upload to blob
#set default context
Set-AzureRmCurrentStorageAccount -StorageAccountName $StorageAccountName -ResourceGroupName  $ResourceGroupName
Get-AzureRmStorageAccount -ResourceGroupName $ResourceGroupName -Name $StorageAccountName
# Upload file
Set-AzureStorageBlobContent –Container $ContainerName -File $destination


By removing the manual steps in building, zipping and deploying ADF Custom Activities, you remove the risk of something going wrong and you add the reassurance that you have a consistent method of deployment which will hopefully speed up your overall development and deployments.

As always, if you have any questions or comments, do let me know.

Instant Bot: deploying a Bot in minutes with Azure Bot Service

I had been playing around with the Bot Framework for a while but hadn’t really got anywhere, largely due to having enough time to create something worthwhile, when I came across the Azure Bot Service whilst I was trawling through the documentation of the Bot Framework. The Azure Bot Service is currently in preview and allowed me to quickly author and deploy a basic bot for the purpose of this post and walkthrough.

Creating the Bot

Like most services in Azure, creating the bot is easy and requires the following input

image

Configuring the Bot

Setting up the Bot is a bit more involved. Once the bot has been created, you’ll be presented with the following screen

image

The App ID and Password are auto-generated by Microsoft, but you will need to make note of the password and store it securely as it is only displayed once in the app registration process.

Next you want to select the language in which the bot is developed and deployed. You have the choice of C# or NodeJS. I opted for C# as it’s a language I am most familiar with.

image

Choose your template, accept the T’s & C’s and your bot is ready to be deployed!

Deploying the Bot

The Bot has been created and configured, displaying its source code which can be further tweaked in the browser or Visual Studio. You can also embed your Bot in a number of existing apps, websites and services.

image


Chatting with the Bot

The basic bot isn’t the most stimulating of conversational partners but it is satisfying to see your creation talk back, even if it repeats what you have just told it.

image

The Bot Framework opens up many possibilities to make the services you offer engaging in a conversational way. The Azure Bot Service makes the Bot Framework that much more accessible to quickly deploy bots and have them out there, engaging with users.


Data Flow Pt 2: Vertexes In Azure Data Lake

Following on from my previous post on Job Execution in the Azure Data Lake, this post will explore the concept of Extents and how they are utilised by Vertexes in the actual Job processing in Azure.

The U-SQL script that has been authored, compiled and deployed is the logical plan of how the author intends to transform input data into output data. This creates a total amount of work – essentially the amount of data it has to process – which is decomposed into a set of vertexes. Each vertex will process a subset of data, or extents (see Azure Data Lake Storage for more information) and represent a fraction of the total.

Vertexes are displayed, or grouped, in a super vertex, also known as a stage. Vertexes in each stage are doing the same operation on a different part of the same data. The number of vertexes in a stage indicates the maximum theoretical parallelisation of that stage. The containers requested for the job will be allocated to complete each vertex. Say there is a 10GB file. This file will be split into 40 Extents and allocated to at least 10 Vertexes. If one wants to process all of the file in parallel then requesting 10 containers will allow for concurrent parallelism.

All this is visualised as the job graph.

image

If you have multiple transformations going on in your USQL script this will create multiple stages, and the output of one vertex becomes the input of another vertex in a dependent stage. As a result, dependent stages can begin processing even if preceding stages haven’t completed processing.

image

If dependency does exist, it will create a critical path – which is the dependency chain of vertexes which keep the job running to the very end because the vertex on the bottom depends on the output of the vertex on the top. This can be seen in the Vertex Execution View in the Visual Studio Azure Data Lake Tools view. It’s useful for optimising job, by re-positioning or re-writing elements of your script, by checking which vertex takes the longest.


It may not be possible to use all the reserved parallelism during a stage if there are fewer vertexes than Azure Data Lake Analytics Units (ADLAUs) available.

image

For example, if I have 10 ADLAUs – it’s great for an early stage as all ADLAUs will be allocated a vertex. But with later stages, more and more ADLAUs will be idle. So by the last stage only 2 of the 14 are utilised. Unfortunately it is not currently possible to dynamically de-allocate ADLAUs during a job.


In conclusion, understanding how vertexes interact with extents and can influence parallelism, you should be able to provision adequate resources for your jobs as well as know where to start the investigation for any long running queries you might have with the Vertex Execution View. In the next post of the series I will be looking into resource costs and optimising spend and time taken to process data.