Zach Stagers

Naming downloaded files with Azure Data Lake Store File System Task

I was recently working on a hybrid project where we download files from Azure Data Lake Store and transform the data with SSIS. I was stunned to find that there’s no native ability to name the file you download from the lake! Even more frustrating, the downloaded file was named Data.<GUID>, with a GUID that changed from one download to the next, rendering the SSIS File System Task useless in this case too. PowerShell to the rescue…

Using an Execute Process Task to call the following PowerShell script, we were able to overcome this challenge.

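param([string] $NewFileName, [string] $LocalFolder, [string] $FileNameFilter)

# Pick the most recently written match from the last 15 seconds.
$file = Get-ChildItem -Path $LocalFolder -Filter $FileNameFilter | Where-Object { $_.LastWriteTime -gt (Get-Date).AddSeconds(-15) } | Sort-Object LastWriteTime | Select-Object -Last 1

# Stop with an error if nothing matched, rather than passing a null path to Move-Item.
if (-not $file) { throw "No file matching '$FileNameFilter' was written to '$LocalFolder' in the last 15 seconds." }

# -Force overwrites an existing file with the same name.
Move-Item -Path $file.FullName -Destination (Join-Path $LocalFolder $NewFileName) -Force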

The script accepts three parameters:

  • $NewFileName – The new name for the file, including the file extension.
  • $LocalFolder – The local folder in which the file resides.
  • $FileNameFilter – A mask used to search for the downloaded file. In this case, we used Data.* where * is a wildcard for the GUID.
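
For illustration, a call that supplies all three parameters might look something like the sketch below. The script path, target file name, and folder are hypothetical placeholders, not values from our project. In an Execute Process Task, powershell.exe would go in the Executable property and everything after it, on a single line without the backticks, in the Arguments property.

```powershell
# Hypothetical paths and names; substitute your own.
# The backticks are only line continuations for readability at a PowerShell prompt.
powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\Scripts\Rename-LakeDownload.ps1" `
    -NewFileName "Sales.csv" `
    -LocalFolder "C:\Temp\LakeDownloads" `
    -FileNameFilter "Data.*"
```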

Get-ChildItem is used to obtain the details of the latest file written to our $LocalFolder within the last 15 seconds. The time window acts as a safety check, minimizing the risk of the script being run outside of the SSIS process and renaming files it shouldn’t.
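
To see the effect of the 15-second window in isolation, here is a minimal sketch that builds a throwaway folder with one stale file and one fresh file, then applies the same kind of filter. The folder name and the file names Data.old and Data.new are made up for the demo.

```powershell
# Sketch: show which file a 15-second LastWriteTime window selects.
# The temp folder and file names are hypothetical and exist only for this demo.
$folder = Join-Path ([System.IO.Path]::GetTempPath()) ("LakeDemo_" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $folder | Out-Null

# One stale file, back-dated by an hour, and one freshly written file.
$old = New-Item -ItemType File -Path (Join-Path $folder "Data.old")
$old.LastWriteTime = (Get-Date).AddHours(-1)
New-Item -ItemType File -Path (Join-Path $folder "Data.new") | Out-Null

# Keep only files written in the last 15 seconds, then take the newest one.
$cutoff = (Get-Date).AddSeconds(-15)
$recent = Get-ChildItem -Path $folder -Filter "Data.*" |
    Where-Object { $_.LastWriteTime -gt $cutoff } |
    Sort-Object LastWriteTime |
    Select-Object -Last 1

$recent.Name   # the stale file is filtered out, so this is the fresh file

# Clean up the demo folder.
Remove-Item -Path $folder -Recurse -Force
```

Sorting by LastWriteTime before taking the last item matters when more than one file falls inside the window, since Get-ChildItem does not return files in write order.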

Move-Item with -Force is used instead of Rename-Item because, in our case, we wanted to overwrite the file if it already existed.
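
The difference between the two cmdlets can be checked with a small sketch like the one below, which runs against a throwaway folder. The file names Data.1234 and Sales.csv are hypothetical.

```powershell
# Sketch: Rename-Item fails when the target name exists; Move-Item -Force overwrites it.
# The temp folder and file names are hypothetical and exist only for this demo.
$folder = Join-Path ([System.IO.Path]::GetTempPath()) ("RenameDemo_" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $folder | Out-Null
$source = Join-Path $folder "Data.1234"
$target = Join-Path $folder "Sales.csv"
Set-Content -Path $source -Value "new download"
Set-Content -Path $target -Value "previous download"

# Rename-Item raises an error because Sales.csv is already there.
$renameFailed = $false
try {
    Rename-Item -Path $source -NewName "Sales.csv" -ErrorAction Stop
} catch {
    $renameFailed = $true
}

# Move-Item with -Force replaces the existing Sales.csv with the new download.
Move-Item -Path $source -Destination $target -Force
$content = Get-Content -Path $target

# Clean up the demo folder.
Remove-Item -Path $folder -Recurse -Force
```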

If you have multiple packages using this script, which are called in parallel by a master package, I would highly recommend adding a completion constraint between all of the Execute Package Tasks to ensure that no file is accidentally renamed by another package running at the same time. If removing parallelism isn’t an option for performance reasons, you could set up a different local folder per package.
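
One way the folder-per-package idea could be set up is sketched below. Get-PackageFolder is a hypothetical helper, and the base folder and package names are made up; in SSIS the package name could be passed in from a variable such as System::PackageName.

```powershell
# Sketch: give each package its own download folder so parallel runs cannot collide.
# Get-PackageFolder, the base folder, and the package names are hypothetical.
function Get-PackageFolder {
    param([string] $BaseFolder, [string] $PackageName)

    # Create the per-package subfolder on first use.
    $folder = Join-Path $BaseFolder $PackageName
    if (-not (Test-Path -Path $folder)) {
        New-Item -ItemType Directory -Path $folder | Out-Null
    }
    return $folder
}

# Demo against a throwaway base folder.
$base = Join-Path ([System.IO.Path]::GetTempPath()) ("PackageDemo_" + [guid]::NewGuid())
$salesFolder = Get-PackageFolder -BaseFolder $base -PackageName "LoadSales"
$stockFolder = Get-PackageFolder -BaseFolder $base -PackageName "LoadStock"
$bothExist = (Test-Path -Path $salesFolder) -and (Test-Path -Path $stockFolder)

# Clean up the demo folders.
Remove-Item -Path $base -Recurse -Force
```

Each package would then download into its own folder and pass that folder to the rename script as $LocalFolder.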