
Zach Stagers

Deploying Multiple U-SQL Procedures with PowerShell


If you’d like to deploy multiple U-SQL procedures without having to open each one in Visual Studio and submit the job to Data Lake Analytics manually, here’s a PowerShell script you can point at a folder containing your .usql files; it loops through them and submits each one for you.

This method relies on the Login-AzureRmAccount command with a service principal, which you can learn more about here.

The Script

$azureAccountName = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
$azurePassword = ConvertTo-SecureString "xxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxx/xxxxxx=" -AsPlainText -Force
$psCred = New-Object System.Management.Automation.PSCredential($azureAccountName, $azurePassword)
$psTenantID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
$adla = "zachstagersadla"
$fileLocation = "C:\SourceControl\USQL.Project\*.usql"

Login-AzureRmAccount -ServicePrincipal -TenantID $psTenantID -Credential $psCred | Out-Null

ForEach ($file in Get-ChildItem -Path $fileLocation)
	{
		$scriptContents = [IO.File]::ReadAllText($file.FullName)
		Submit-AzureRmDataLakeAnalyticsJob `
			-AccountName $adla `
			-Name $file.Name `
			-Script $scriptContents | Out-Null;
		
		Write-Host "`n" $file.Name "submitted."
	} 

Parameter Configuration

Where these various GUIDs can be found within the Azure portal is liable to change, but at the time of writing the paths below will lead you to each of them.

$azureAccountName – This is the Application Id of your Enterprise Application, which can be found by navigating to Azure Active Directory > Enterprise Applications > All Applications > Selecting your application > Properties > Application Id.

$azurePassword – This is the secret key of your Enterprise Application, which would have been generated during application registration. If you’ve created your application but have not generated a secret key, you can do so by navigating to: Enterprise Applications > New Application > Application You’re Developing > OK, take me to App Registration > Change the drop-down from ‘My Apps’ to ‘All Apps’ (may not be required) > Select your application > Settings > Keys > Fill in the details and click Save. Note the important message: the key is only available to copy before you navigate away from the page!

$psTenantID – From within the Azure Portal, go to Azure Active Directory, open the Properties blade, and copy the ‘Directory ID’.

$adla – The name of the Data Lake Analytics resource you’re deploying to.

$fileLocation – The file location on your local machine which contains the U-SQL scripts you wish to deploy. Note the “\*.usql” on the end of the example path; this is a wildcard search for all files ending in .usql.

Permissions

The service principal needs to be given the following permissions to successfully execute against the lake:

· Owner of the Data Lake Analytics resource you’re deploying to. This is configured via the Access Control blade.

· Owner of the Data Lake Store resource associated to the Analytics resource you’re deploying to. This is configured via the Access Control blade.

· Read, Write, and Execute permissions against the sub-folders within the Data Lake Store. Ensure you select ‘This folder and all children’ and ‘An access permission entry and a default permission entry’. This is configured by entering the Data Lake Store, selecting Data Explorer, then Access.
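If you’d rather script these permissions than click through the portal, the AzureRM modules expose cmdlets for both the role assignments (New-AzureRmRoleAssignment) and the Data Lake Store ACLs (Set-AzureRmDataLakeStoreItemAclEntry). The sketch below is an illustration rather than part of the original deployment script: the subscription, resource group, account names, and object Id are placeholders, and the parameters are worth checking against the module versions you have installed. Note that the ACL cmdlet takes the service principal’s Object Id, not the Application Id used to log in.

# Placeholders - substitute your own values
$subscriptionId = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
$resourceGroup  = "MyResourceGroup"
$adlaName       = "zachstagersadla"
$adlsName       = "zachstagersadls"
$spObjectId     = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"   # Object Id of the service principal

# Owner of the Data Lake Analytics resource
New-AzureRmRoleAssignment -ObjectId $spObjectId -RoleDefinitionName "Owner" `
	-Scope "/subscriptions/$subscriptionId/resourceGroups/$resourceGroup/providers/Microsoft.DataLakeAnalytics/accounts/$adlaName"

# Owner of the associated Data Lake Store resource
New-AzureRmRoleAssignment -ObjectId $spObjectId -RoleDefinitionName "Owner" `
	-Scope "/subscriptions/$subscriptionId/resourceGroups/$resourceGroup/providers/Microsoft.DataLakeStore/accounts/$adlsName"

# Read, Write and Execute on the store: one access entry and one default entry.
# This applies to the folder itself; pushing the entries down to existing children
# still needs the portal's 'This folder and all children' option or a recursive script.
Set-AzureRmDataLakeStoreItemAclEntry -Account $adlsName -Path "/" -AceType User -Id $spObjectId -Permissions All
Set-AzureRmDataLakeStoreItemAclEntry -Account $adlsName -Path "/" -AceType User -Id $spObjectId -Permissions All -Default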

Bulk updating SSIS package protection level with DTUTIL

I was recently working with an SSIS project with well over 100 packages. It had been developed using the ‘Encrypt Sensitive with User Key’ package and project protection level, the project deployment model, and components from the Azure Feature Pack.

The project was to be deployed to a remote server within another domain, using a different user from the one who developed the packages, which caused an error with the ‘Encrypt Sensitive with User Key’ setting upon deployment.

As a side note, regarding package protection level and the Azure Feature Pack, ‘Don’t Save Sensitive’ cannot be used, as you’ll receive an error stating: “Error when copying file from Azure Data Lake Store to local drive using SSIS. [Azure Data Lake Store File System Task] Error: There is an internal error to refresh token. Value cannot be null. Parameter name: input”. There seems to be an issue with the components not accepting values from an environment variable if ‘Don’t Save Sensitive’ is used.

With the project deployment model, the project and all packages within it need to have the same protection level, but there’s no easy way from within Visual Studio to apply the update to all packages, and I didn’t want to open 100 packages individually to change the setting manually! SQL and DTUTIL to the rescue…

Step One

Check the entire project and all packages out of source control, making sure you have the latest version. Close Visual Studio.

Step Two

Change the protection level at the project level by right-clicking the project and selecting Properties. A dialogue box should open; on the Project tab you should see a Security section, under which you can change the protection level.

Step Three

Using SQL against the SSISDB (assuming the project is deployed locally), you can easily produce a DTUTIL command string for each package you need to update, changing the encryption setting:

USE SSISDB

DECLARE @FolderPath VARCHAR(1000) = 'C:\SourceControl\FolderX'
DECLARE @DtutilString VARCHAR(1000) = 
	'dtutil.exe /file "'+ @FolderPath +'\XXX" /encrypt file;"'+ @FolderPath +'\XXX";2;Password1! /q'

SELECT DISTINCT
       REPLACE(@DtutilString, 'XXX', [name])
FROM internal.packages
WHERE project_id = 2 -- Your project's id, which can be looked up in catalog.projects

This query will produce a string like the one below for each package in your project:

dtutil.exe /file "C:\SourceControl\FolderX\Package1.dtsx" /encrypt file;"C:\SourceControl\FolderX\Package1.dtsx";2;Password1! /q

This executes DTUTIL for the file specified, encrypting the package using ‘Encrypt with password’ (2) and the password Password1! in this case. The /q tells the command to execute “quietly” – meaning you won’t be prompted with an “Are you sure?” message each time. More information about DTUTIL can be found here.

If the project isn’t deployed, the same can be achieved through PowerShell against the folder the packages live in.
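As a minimal sketch of that PowerShell alternative (using the same folder path and password placeholders as the SQL above, with the batch file name just an example), the following builds the equivalent DTUTIL command for every .dtsx file in the folder and writes the commands out as a batch file:

$folderPath = "C:\SourceControl\FolderX"
$password = "Password1!"

# Build a dtutil command per package and write them all to a batch file
Get-ChildItem -Path $folderPath -Filter *.dtsx |
	ForEach-Object {
		'dtutil.exe /file "{0}" /encrypt file;"{0}";2;{1} /q' -f $_.FullName, $password
	} |
	Out-File -FilePath "$folderPath\EncryptPackages.bat" -Encoding ascii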

Step Four

Copy the list of packages generated by Step Three, open Notepad, and paste. Save the file as a batch file (.bat).

Run the batch file as an administrator (right-click the file and select ‘Run as administrator’).

A command prompt window should open, and you should see it streaming through your packages with a success message for each.

Step Five

The encryption setting for each package is also stored in the project file, and needs to be changed here too.

Find the project file for the project you’re working with (.dtproj), right-click it, select ‘Open with’, and choose Notepad or your preferred text editor.

Within the SSIS:PackageMetaData node for each package, there’s a nested <SSIS:Property SSIS:Name="ProtectionLevel"> element containing the integer value for its corresponding protection level.

Run a find and replace for this full string, swapping only the integer value to the desired protection level. In this example we’re going from ‘Don’t Save Sensitive’ (0) to ‘Encrypt with password’ (2):

[Screenshot: find and replace swapping the ProtectionLevel value from 0 to 2]
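If you’d rather script this step than run a manual find and replace, a small PowerShell sketch along these lines should do the same swap. The project file path is a placeholder, and it’s worth checking that the find string matches exactly what’s in your .dtproj:

$projFile = "C:\SourceControl\FolderX\MyProject.dtproj"   # Path to your .dtproj file
$find     = '<SSIS:Property SSIS:Name="ProtectionLevel">0</SSIS:Property>'
$replace  = '<SSIS:Property SSIS:Name="ProtectionLevel">2</SSIS:Property>'

# Literal string replacement across the whole file
(Get-Content -Path $projFile -Raw).Replace($find, $replace) | Set-Content -Path $projFile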

Save and close the project file.

Step Six

Re-open Visual Studio and your project, then spot-check a few packages to ensure the correct protection level is now selected. Build your project, and check back in to source control once satisfied.

SSIS Azure Data Lake Store Destination Mapping Problem

The Problem

My current project involves Azure’s Data Lake Store and Analytics. We’re using the SSIS Azure Feature Pack’s Azure Data Lake Store Destination to move data from our client’s on-premises system into the Lake, then using U-SQL to generate a delta file which goes on to be loaded into the warehouse. U-SQL is a “schema-on-read” language, which means you need a consistent and predictable format to be able to define the schema as you pull data out.

We ran into an issue with this schema-on-read approach, but once you understand the issue, it’s simple to rectify. The Data Lake Store Destination task does not use the same column ordering which is shown in the destination mapping. Instead, it appears to rely on an underlying column identifier. This means that if you apply any conversions to a column in the data flow, this column will automatically be placed at the end of the file, taking away the predictability of the file format, and potentially making your schema inconsistent if you have historic data in the Lake.

An Example

Create a simple package which pulls data from a flat file and moves it into the Lake.

[Screenshot: the simple data flow, with a flat file source loading the Azure Data Lake Store Destination]


Mappings of the Destination are as follows:

[Screenshot: the destination mappings]


Running the package, and viewing the file in the Lake gives us the following (as we’d expect, based on the mappings):

[Screenshot: the file contents in the Lake, with columns in the mapped order]


Now add a conversion task (the one in my package just converts Col2 to a DT_I4), update the mappings in the destination, and run the package.

[Screenshot: the data flow with a Data Conversion task added]


Open the file up in the Lake again, and you’ll find that Col2 is now at the end and contains the name of the input column, not the destination column:

[Screenshot: the file contents in the Lake, with Col2 now at the end]

The Fix

As mentioned in “The Problem” section, the fix is extremely simple: just handle it in your U-SQL by re-ordering the columns appropriately during extraction! This article is more about giving a heads-up and highlighting the problem than offering a mind-blowing solution.