Zach Stagers

Naming downloaded files with Azure Data Lake Store File System Task

I was recently working on a hybrid project where we download files from a lake and transform the data with SSIS. I was stunned to find that there’s no native ability to name the file you download from the lake! Even more frustrating, the downloaded file is given an unpredictable name of the form Data.<GUID>, which renders the SSIS File System Task useless in this case too. PowerShell to the rescue…

Using an Execute Process Task to call the following PowerShell script, we were able to overcome this challenge.

param([string] $NewFileName, [string] $LocalFolder, [string] $FileNameFilter)

# Find the most recently written file matching the filter, limited to the last 15 seconds.
$file = Get-ChildItem -Path $LocalFolder -Filter $FileNameFilter |
    Where-Object { $_.LastWriteTime -gt (Get-Date).AddSeconds(-15) } |
    Sort-Object LastWriteTime | Select-Object -Last 1

# Rename the file, overwriting any existing file with the same name.
Move-Item -Path $file.FullName -Destination (Join-Path $LocalFolder $NewFileName) -Force

The script accepts three parameters:

  • $NewFileName – The name you want to give the file, including the file extension.
  • $LocalFolder – The local folder in which the file resides.
  • $FileNameFilter – A mask to apply when searching for the downloaded file. In this case we used Data.*, where * is a wildcard for the GUID.
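
The Execute Process Task can point at powershell.exe, with the script path and parameter values passed in via the Arguments property. The resulting command line looks something like the below (the script path, folder, and file names here are illustrative placeholders, not the exact values from our project):

powershell.exe -ExecutionPolicy Bypass -File "C:\SSIS\Scripts\RenameDownloadedFile.ps1" -NewFileName "Customers.csv" -LocalFolder "C:\SSIS\Downloads" -FileNameFilter "Data.*"

In practice you’d build the Arguments value with an SSIS expression so the new file name and folder can come from package variables.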

Get-ChildItem is used to obtain the details of the latest file written to our $LocalFolder within the last 15 seconds. The time filter just adds an element of safety, minimizing the risk of the script being run outside of the SSIS process and renaming files it shouldn’t.

Move-Item is used instead of Rename-Item, as in our case we wanted to overwrite the file if it already existed.

If you have multiple packages using this script, which are called in parallel by a master package, I would highly recommend adding a completion constraint between all of the Execute Package Tasks to ensure no file is accidentally renamed by another package running at the same time. If removing parallelism isn’t an option for performance reasons, you could set up a different local folder per package.

Bulk updating SSIS package protection level with DTUTIL

I was recently working with an SSIS project with well over 100 packages. It had been developed using the ‘Encrypt Sensitive with User Key’ protection level at both package and project level, using the project deployment model, and made use of components from the Azure Feature Pack.

The project was to be deployed to a remote server within another domain using a different user from the one who developed the packages, which caused an error with the ‘Encrypt Sensitive with User Key’ setting upon deployment.

As a side note, regarding package protection level and the Azure Feature Pack, ‘Don’t Save Sensitive’ cannot be used as you’ll receive an error stating: “Error when copying file from Azure Data Lake Store to local drive using SSIS. [Azure Data Lake Store File System Task] Error: There is an internal error to refresh token. Value cannot be null. Parameter name: input”. There seems to be an issue with the components not accepting values from an environment variable if ‘Don’t Save Sensitive’ is used.

With the project deployment model, the project and all packages within it need to have the same protection level, but there’s no easy way from within Visual Studio to apply the update to all packages, and I didn’t want to have to open 100 packages individually to change the setting manually! SQL and DTUTIL to the rescue…

Step One

Check the entire project and all packages out of source control, making sure you have the latest version. Close Visual Studio.

Step Two

Change the protection level at the project level by right-clicking the project and selecting Properties. A dialogue box should open; on the Project tab you should see a Security section, under which you can change the protection level.

Step Three

Using SQL against the SSISDB (assuming the project is deployed locally), you can easily produce a list of all the packages you need to update within a DTUTIL string to change the encryption setting:


DECLARE @FolderPath VARCHAR(1000) = 'C:\SourceControl\FolderX'
DECLARE @DtutilString VARCHAR(1000) =
	'dtutil.exe /file "'+ @FolderPath +'\XXX" /encrypt file;"'+ @FolderPath +'\XXX";2;Password1! /q'

SELECT REPLACE(@DtutilString, 'XXX', [name])
FROM internal.packages
WHERE project_id = 2

This query will produce a string like the one below for each package in your project:

dtutil.exe /file "C:\SourceControl\FolderX\Package1.dtsx" /encrypt file;"C:\SourceControl\FolderX\Package1.dtsx";2;Password1! /q

This executes DTUTIL for the file specified, encrypting the package using ‘Encrypt with password’ (2) and the password Password1! in this case. The /q switch tells the command to execute “quietly”, meaning you won’t be prompted with an “Are you sure?” message each time. More information about DTUTIL can be found in Microsoft’s dtutil documentation.

If the project isn’t deployed, the same can be achieved through PowerShell against the folder the packages live in.

Step Four

Copy the list of packages generated by Step Three, open notepad, and paste. Save the file as a batch file (.bat).

Run the batch file as an administrator (right-click the file and select ‘Run as administrator’).

A command prompt window should open, and you should see it streaming through your packages with a success message for each.

Step Five

The encryption setting for each package is also stored in the project file, and needs to be changed there too.

Find the project file for the project you’re working with (.dtproj), right-click it and select ‘Open with’, then Notepad or your preferred text editor.

Within the SSIS:PackageMetaData node for each package, there’s a nested <SSIS:Property SSIS:Name="ProtectionLevel"> element containing the integer value for its corresponding protection level.

Run a find and replace for this full string, swapping only the integer value to the desired protection level. In this example we’re going from ‘Don’t Save Sensitive’ (0) to ‘Encrypt with password’ (2):
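
As an illustration, the find and replace pair would look something like the below (double-check the exact formatting in your own .dtproj file before replacing):

Find:     <SSIS:Property SSIS:Name="ProtectionLevel">0</SSIS:Property>
Replace:  <SSIS:Property SSIS:Name="ProtectionLevel">2</SSIS:Property>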


Save and close the project file.

Step Six

Re-open Visual Studio and your project, and spot-check a few packages to ensure the correct protection level is now selected. Build your project, and check back into source control once satisfied.

SSIS Azure Data Lake Store Destination Mapping Problem

The Problem

My current project involves Azure’s Data Lake Store and Analytics. We’re using the SSIS Azure Feature Pack’s Azure Data Lake Store Destination to move data from our client’s on-premises system into the Lake, then using U-SQL to generate a delta file which goes on to be loaded into the warehouse. U-SQL is a “schema-on-read” language, which means you need a consistent and predictable format to be able to define the schema as you pull data out.

We ran into an issue with this schema-on-read approach, but once you understand the issue, it’s simple to rectify. The Data Lake Store Destination task does not use the same column ordering which is shown in the destination mapping. Instead, it appears to rely on an underlying column identifier. This means that if you apply any conversions to a column in the data flow, this column will automatically be placed at the end of the file, taking away the predictability of the file format and potentially making your schema inconsistent if you have historic data in the Lake.

An Example

Create a simple package which pulls data from a flat file and moves it into the Lake.


Mappings of the Destination are as follows:


Running the package, and viewing the file in the Lake gives us the following (as we’d expect, based on the mappings):


Now add a Data Conversion transformation (the one in my package just converts Col2 to DT_I4), update the mappings in the destination, and run the package.


Open the file up in the Lake again, and you’ll find that Col2 is now at the end and contains the name of the input column, not the destination column:


The Fix

As mentioned in “The Problem” section, the fix is extremely simple – just handle it in your U-SQL by re-ordering the columns appropriately during extraction! This article is more about giving a heads-up and highlighting the problem than offering a mind-blowing solution.
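
As a rough illustration, assuming the file was meant to contain Col1, Col2 and Col3 but the destination pushed the converted Col2 to the end, the U-SQL would extract the columns in the order they physically appear in the file and then re-order them in a following SELECT (the file path below is a placeholder):

// Extract in the physical column order of the file written by SSIS (Col2 last).
@raw =
    EXTRACT Col1 string,
            Col3 string,
            Col2 int
    FROM "/raw/Data.csv"
    USING Extractors.Csv();

// Re-order the columns back to the expected schema for downstream processing.
@ordered =
    SELECT Col1,
           Col2,
           Col3
    FROM @raw;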