Nigel Meakins

Nigel Meakins' Blog

Azure Data Factory Custom Activity Development–Part 2: Encapsulating Common Functionality

This is the second post in a series on Azure Data Factory Custom Activity Development.

The Theory

In accordance with the DRY principle, or “Don’t Repeat Yourself”, we should avoid writing the same piece of code twice. The reasons behind this are pretty obvious from a maintenance, testing and general productivity perspective, so I won’t try to hammer this point home. In C# we tend to apply this idea using class inheritance.

A Simple Custom Activity Class Hierarchy

For reference within our discussion, we have a class hierarchy defined as below.

[Image: custom activity class hierarchy]

You’ll notice that our ActivityBase itself derives from a parent class, which is the subject of another post. We could have simply derived from IDotNetActivity for the purposes of this exercise.
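The hierarchy can be expressed as a bare class skeleton. This is a minimal sketch that assumes AuthenticateActivity derives directly from ActivityBase. The KeyVaultActivity and HiveDataValidation relationships are the ones described later in this post. So that it compiles on its own, Execute takes only a dictionary of extended properties. That simplified signature is hypothetical and stands in for the real IDotNetActivity contract.

```csharp
using System;
using System.Collections.Generic;

// Simplified skeleton of the custom activity hierarchy.
// NOTE: Execute's signature is hypothetical; the real classes implement the ADF activity contract.
public abstract class ActivityBase
{
    // common helpers shared by every custom activity (e.g. extended property validation) live here
    public abstract IDictionary<string, string> Execute(IDictionary<string, string> extendedProperties);
}

public abstract class AuthenticateActivity : ActivityBase
{
    // authentication helpers (e.g. GetServiceClientCredentials) live here
}

public abstract class KeyVaultActivity : AuthenticateActivity
{
    // Key Vault placeholder resolution (e.g. UpdateExtendedPropertySecrets) lives here
}

public class HiveDataValidation : KeyVaultActivity
{
    // only the activity-specific logic remains in the concrete class
    public override IDictionary<string, string> Execute(IDictionary<string, string> extendedProperties)
    {
        return new Dictionary<string, string>();
    }
}
```

Each level adds one concern, so a concrete activity picks up validation, authentication and Key Vault support simply by choosing its base class.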

Basic Custom Activity Functionality Requirements

When developing Custom Activities within Azure Data Factory (ADF) there are a lot of common tasks we find ourselves needing to do. We can start by putting the most common requirements in our ActivityBase class.

ActivityBase

One set of functionality we will be using time and again is validating our Activity’s Extended Properties. Checking these up front catches any Extended Properties we’ve neglected to add to the pipeline JSON. Otherwise the bug only surfaces later, when our code tries to reference the missing property. To encapsulate this, we create an ActivityBase class from which our other activity classes will inherit, and in it we add a method to validate a given set of Extended Properties:

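		/// <summary>
		/// Validates that the activity defines all of the required extended properties.
		/// </summary>
		/// <param name="activity">The activity whose extended properties are to be checked.</param>
		/// <param name="requiredProperties">The names of the extended properties the activity requires.</param>
		/// <param name="logger">The activity logger.</param>
		/// <exception cref="Exception">Thrown when one or more required extended properties are missing.</exception>
		public void ValidateExtendedProperties(Activity activity, List<string> requiredProperties, IActivityLogger logger)
		{
			logger.Write("ValidateExtendedProperties");
			//treat a missing extendedProperties section as empty, so every required property is reported as missing
			IDictionary<string, string> extendedProperties = ((DotNetActivity)activity.TypeProperties).ExtendedProperties ?? new Dictionary<string, string>();
			List<string> missingProps = new List<string>();
			//collect the name of every required property that is not defined on the activity
			requiredProperties.ForEach(propertyName =>
			{
				if (!extendedProperties.ContainsKey(propertyName))
					missingProps.Add(propertyName);
			});
			//report all of the missing properties at once, rather than failing on the first
			if (missingProps.Count > 0)
				throw new Exception(string.Format("The following required extended properties were not found on the activity {0}: {1}", activity.Name, string.Join(", ", missingProps)));
		}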
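Because ValidateExtendedProperties needs an ADF Activity to work on, it can be handy to pull the core check out into a pure helper that can be unit tested without a pipeline. The sketch below does that. The ExtendedPropertyValidation class and its FindMissing and Validate methods are hypothetical names. It also throws InvalidOperationException rather than the plain Exception used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: the same required-property check, decoupled from the ADF Activity type.
public static class ExtendedPropertyValidation
{
    // Returns the required property names absent from extendedProperties, in the order they were required.
    public static List<string> FindMissing(IDictionary<string, string> extendedProperties, IEnumerable<string> requiredProperties)
    {
        // treat a null dictionary (no extendedProperties section) as empty
        IDictionary<string, string> props = extendedProperties ?? new Dictionary<string, string>();
        return requiredProperties.Where(name => !props.ContainsKey(name)).ToList();
    }

    // Throws one exception listing every missing property, mirroring ValidateExtendedProperties.
    public static void Validate(string activityName, IDictionary<string, string> extendedProperties, IEnumerable<string> requiredProperties)
    {
        List<string> missing = FindMissing(extendedProperties, requiredProperties);
        if (missing.Count > 0)
            throw new InvalidOperationException(string.Format(
                "The following required extended properties were not found on the activity {0}: {1}",
                activityName, string.Join(", ", missing)));
    }
}
```

Returning the full list of missing names, rather than stopping at the first, keeps the error message as informative as the original method’s.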

It takes as input a list of strings containing the names of the properties we require. Each derived Activity class can then define its own list of required properties. We use that list to confirm that the Activity JSON defined within our pipeline does indeed contain those Extended Properties. For example:

	public class HiveDataValidation : KeyVaultActivity
	{
		#region Private Fields

		//these must be static: a field initializer cannot reference an instance field (compiler error CS0236),
		//so instance fields could not be used to build the requiredProps list below
		private static readonly string ADLSACCOUNTNAMEPROPERTYNAME = Properties.Settings.Default.ADLSAccountNamePropertyName;
		private static readonly string ADLSROOTDIRPATHPROPERTYNAME = Properties.Settings.Default.ADLSRootDirPathPropertyName;
		private static readonly string DATAVALIDATIONRULESETPROPERTYNAME = Properties.Settings.Default.DataValidationRuleSetIdPropertyName;
		private static readonly string DOCUMENTDBAUTHKEYPROPERTYNAME = Properties.Settings.Default.DocumentDbAuthKeyPropertyName;
		private static readonly string DOCUMENTDBCOLLECTIONPROPERTYNAME = Properties.Settings.Default.DocumentDbCollectionPropertyName;
		private static readonly string DOCUMENTDBPROPERTYNAME = Properties.Settings.Default.DocumentDbNamePropertyName;
		private static readonly string DOCUMENTDBURIPROPERTYNAME = Properties.Settings.Default.DocumentDbUriPropertyName;
		private static readonly string HDICLUSTERJOBOUTPUTSUBDIRPROPERTYNAME = Properties.Settings.Default.HDIClusterJobOutputSubDirPropertyName;
		private static readonly string HDICLUSTERNAMEPROPERTYNAME = Properties.Settings.Default.HDIClusterNamePropertyName;
		private static readonly string HDICLUSTERPASSWORDPROPERTYNAME = Properties.Settings.Default.HDIClusterPasswordPropertyName;
		private static readonly string HDICLUSTERUSERNAMEPROPERTYNAME = Properties.Settings.Default.HDIClusterUserNamePropertyName;

		//Extended Properties required for the activity
		private readonly List<string> requiredProps = new List<string>()
		{
			ADLSACCOUNTNAMEPROPERTYNAME, ADLSROOTDIRPATHPROPERTYNAME,
			DOCUMENTDBURIPROPERTYNAME, DOCUMENTDBCOLLECTIONPROPERTYNAME, DOCUMENTDBAUTHKEYPROPERTYNAME, DATAVALIDATIONRULESETPROPERTYNAME, DOCUMENTDBPROPERTYNAME,
			HDICLUSTERNAMEPROPERTYNAME, HDICLUSTERUSERNAMEPROPERTYNAME, HDICLUSTERPASSWORDPROPERTYNAME, HDICLUSTERJOBOUTPUTSUBDIRPROPERTYNAME
		};

		#endregion

		// ... Execute and other members omitted
	}

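Within the Execute method for the Custom Activity, we can then call the above method with this list before going on to use the extended properties:

		public override IDictionary<string, string> Execute(ActivityContext context, IActivityLogger logger)
		{
			//'activity' refers to the Activity being executed (how it is obtained from the context is not shown here)
			//fail fast if the pipeline JSON is missing any of the required extended properties
			ValidateExtendedProperties(activity, requiredProps, logger);
			//then resolve any Key Vault placeholders in the property values (see KeyVaultActivity below)
			IDictionary<string, string> extendedProperties = UpdateExtendedPropertySecrets(activity);
			// ... activity-specific logic follows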

We can add further methods to the ActivityBase class as required in order to extend our common functionality.
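One possible extension applies the Template Method pattern, so that the base class always runs the validation before handing over to the derived activity. This is our own assumption, not a change to the classes above. ValidatingActivityBase, RequiredProperties, ExecuteCore, SampleActivity and the documentDbUri property name are all hypothetical. Execute also uses a simplified signature rather than the ADF one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Template Method variant: the base class validates, then calls the derived class.
public abstract class ValidatingActivityBase
{
    // each derived activity lists the extended properties it needs
    protected abstract IEnumerable<string> RequiredProperties { get; }

    // the activity-specific work, called only once validation has passed
    protected abstract IDictionary<string, string> ExecuteCore(IDictionary<string, string> extendedProperties);

    // simplified entry point: validation cannot be skipped by a derived class
    public IDictionary<string, string> Execute(string activityName, IDictionary<string, string> extendedProperties)
    {
        IDictionary<string, string> props = extendedProperties ?? new Dictionary<string, string>();
        List<string> missing = RequiredProperties.Where(p => !props.ContainsKey(p)).ToList();
        if (missing.Count > 0)
            throw new InvalidOperationException(string.Format(
                "The following required extended properties were not found on the activity {0}: {1}",
                activityName, string.Join(", ", missing)));
        return ExecuteCore(props);
    }
}

// Hypothetical derived activity
public class SampleActivity : ValidatingActivityBase
{
    // a property getter runs per instance, so it could also use instance members
    protected override IEnumerable<string> RequiredProperties
    {
        get { return new[] { "documentDbUri", "documentDbAuthKey" }; }
    }

    protected override IDictionary<string, string> ExecuteCore(IDictionary<string, string> extendedProperties)
    {
        // echo back the URI so the caller can see ExecuteCore ran
        return new Dictionary<string, string> { { "uri", extendedProperties["documentDbUri"] } };
    }
}
```

Because RequiredProperties is evaluated per instance, a derived class can build its list from instance members as well. That sidesteps the need for the static fields used in HiveDataValidation.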

Activity Authentication

To avoid writing a lot of boilerplate code for authenticating our activities against various services within Azure, we can derive them from a class that contains this logic for us, which we’ll call AuthenticateActivity.

AuthenticateActivity

This base class calls a simple library that carries out the actual authentication for us. We often need the credentials for the executing ADF Application in order to carry out some privileged action that requires an object of type Microsoft.Rest.ServiceClientCredentials (or one derived from it). A simple method within our AuthenticateActivity makes these available with a single call:

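		/// <summary>
		/// Gets the service client credentials for the executing ADF application.
		/// </summary>
		/// <param name="activity">The activity.</param>
		/// <returns>The credentials, for use with any client that accepts ServiceClientCredentials.</returns>
		public ServiceClientCredentials GetServiceClientCredentials(Activity activity)
		{
			//adfApplicationId, adfApplicationSecret and domain are AuthenticateActivity fields populated elsewhere
			ClientCredentialProvider ccp = new ClientCredentialProvider(adfApplicationId, adfApplicationSecret);
			return ccp.GetServiceClientCredentials(domain);
		}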

We can then use this functionality within any activity that derives from our AuthenticateActivity class whenever it needs ServiceClientCredentials.
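The snippet above leaves open where adfApplicationId, adfApplicationSecret and domain come from. One option is to read them from the activity’s extended properties and fail clearly if any are missing or blank. This is purely an assumption, not necessarily how AuthenticateActivity does it. The AdfApplicationSettings class and the property names it reads are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical holder for the values GetServiceClientCredentials relies on.
public sealed class AdfApplicationSettings
{
    public string Domain { get; private set; }
    public string ApplicationId { get; private set; }
    public string ApplicationSecret { get; private set; }

    // Reads the settings from extended properties; the property names are assumptions, not ADF conventions.
    public static AdfApplicationSettings FromExtendedProperties(IDictionary<string, string> props)
    {
        if (props == null)
            throw new ArgumentNullException("props");
        return new AdfApplicationSettings
        {
            Domain = Require(props, "domain"),
            ApplicationId = Require(props, "adfApplicationId"),
            ApplicationSecret = Require(props, "adfApplicationSecret")
        };
    }

    // Returns the named value, throwing if it is absent or blank.
    private static string Require(IDictionary<string, string> props, string name)
    {
        string value;
        if (!props.TryGetValue(name, out value) || string.IsNullOrWhiteSpace(value))
            throw new InvalidOperationException("Missing or empty extended property: " + name);
        return value;
    }
}
```

The resulting values would then be handed to ClientCredentialProvider, or to an equivalent such as ApplicationTokenProvider.LoginSilentAsync(domain, clientId, secret) from the Microsoft.Rest.ClientRuntime.Azure.Authentication package.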

Retrieving Key Vault Secrets

Azure Key Vault can be used for storing secret values, such as passwords and other keys that must be kept secure. Using an idea from the ADFSecurePublish project, we can embed placeholders within our custom activity extended properties for values that we would like to populate from Key Vault secrets. For example, if we need to reference a Key Vault secret called “docDBPrimaryKey”, we could add the following extended property with a placeholder:

"documentDbAuthKey": "<KeyVault:docDBPrimaryKey>"

We can then replace this placeholder within our activity’s code with the respective secret value at runtime, thereby keeping secret values out of our ADF pipeline code base. The ADFSecurePublish project includes code to do this in the form of a KeyVaultResolver class. That class can authenticate against the Key Vault using various means and then fetch the secret string value for a given identifier. Again, this is a very common scenario, so we create another derived class, KeyVaultActivity, this time with AuthenticateActivity as the base class, so as to make the parent code available.
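The placeholder substitution can be sketched with a regular expression and a MatchEvaluator. The pattern is an assumption, since the post doesn’t show KEYVAULTPLACEHOLDERREGEX. The getSecret delegate is a hypothetical stand-in for the Key Vault lookup that KeyVaultResolver performs, so the sketch runs without a vault. PlaceholderResolver and Resolve are likewise hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlaceholderResolver
{
    // Assumed placeholder syntax: <KeyVault:secretName>; the "name" group captures the secret name.
    private static readonly Regex KeyVaultPlaceholder =
        new Regex(@"<KeyVault:(?<name>[^>]+)>", RegexOptions.Compiled);

    // Replaces every placeholder in target with the value returned by getSecret for that secret name.
    public static string Resolve(string target, Func<string, string> getSecret)
    {
        if (string.IsNullOrEmpty(target))
            return target;
        return KeyVaultPlaceholder.Replace(target, match => getSecret(match.Groups["name"].Value));
    }
}
```

Only the text matched by the placeholder is replaced, so a placeholder embedded in a longer value (a connection string, say) keeps its surrounding text.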

KeyVaultActivity

The code within our KeyVaultActivity is relatively straightforward: a simple iteration over our activity’s extended properties, replacing the Key Vault placeholder values where required.

		/// <summary>
		/// Updates the property secret placeholders within the extended properties of the activity with their values from Key Vault.
		/// </summary>
		/// <param name="activity">The activity.</param>
		/// <returns>The activity's extended properties, with any Key Vault placeholders replaced by their secret values.</returns>
		protected IDictionary<string, string> UpdateExtendedPropertySecrets(Activity activity)
		{
			IDictionary<string, string> extendedProperties = ((DotNetActivity)activity.TypeProperties).ExtendedProperties;
			//copy the dictionary to a list so that we can iterate over the list while modifying the dictionary
			//(modifying the dictionary while iterating over it raises an exception)
			List<KeyValuePair<string, string>> extPropList = new List<KeyValuePair<string, string>>(extendedProperties);

			foreach (KeyValuePair<string, string> item in extPropList)
			{
				//update the dictionary entry for the corresponding list key
				extendedProperties[item.Key] = ReplacePlaceholderWithSecret(item.Value);
			}
			return extendedProperties;
		}

		/// <summary>
		/// Replaces the Key Vault placeholder in the target string with the respective secret value from Key Vault.
		/// </summary>
		/// <param name="target">The target string containing the Key Vault placeholder.</param>
		/// <returns>The target string with the placeholder replaced by its secret value.</returns>
		protected string ReplacePlaceholderWithSecret(string target)
		{
			//keyVaultResolver and KEYVAULTPLACEHOLDERREGEX are KeyVaultActivity members not shown here
			return keyVaultResolver.ReplacePlaceholderWithSecret(target, KEYVAULTPLACEHOLDERREGEX);
		}
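The same copy-then-update idea can be shown without the ADF types. In this sketch only the keys are snapshotted, which is enough to update the values safely. ExtendedPropertySecrets and UpdateInPlace are hypothetical names, and the resolve delegate stands in for ReplacePlaceholderWithSecret.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExtendedPropertySecrets
{
    // Applies resolve to every value in place and returns the same dictionary.
    // The keys are copied first (ToList) so the dictionary is never modified while being enumerated;
    // on .NET Framework, overwriting a value mid-enumeration throws InvalidOperationException.
    public static IDictionary<string, string> UpdateInPlace(IDictionary<string, string> extendedProperties, Func<string, string> resolve)
    {
        foreach (string key in extendedProperties.Keys.ToList())
        {
            extendedProperties[key] = resolve(extendedProperties[key]);
        }
        return extendedProperties;
    }
}
```

Updating in place, as UpdateExtendedPropertySecrets does, means anything else holding a reference to the activity’s dictionary also sees the resolved values.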

We can now derive our custom activity from this KeyVaultActivity class and use the encapsulated functionality as desired. So for our HiveDataValidation activity, we simply declare:

public class HiveDataValidation : KeyVaultActivity

The amended dictionary of extended property values can then be easily referenced within our activity.

IDictionary<string, string> extendedProperties = UpdateExtendedPropertySecrets(activity);

string documentDbAuthKey = extendedProperties["documentDbAuthKey"];

In Summary

As you can see, this simple inheritance exercise makes developing ADF Custom Activities a whole lot easier. We can soon build up a library of classes based on these to assist with extending Azure Data Factory through Custom Activities.

Coming Soon…

Up shortly, the next instalment in the series, Part 3: Debugging Custom Activities.