Jeremy Kashel

Jeremy Kashel's Blog

Data Quality Services Leading Values

The Use Leading Values checkbox is one of the options available when you set up a domain within DQS Domain Management. I’ve run a couple of courses and demos on DQS recently, and one of the questions that cropped up was: what does the Use Leading Values checkbox do? This blog post explains the option by showing a few examples.

Domain Management

In this case we’re going to set up a domain called Country, which will be fairly simple for the purposes of this blog post. All of the defaults have been left as they are, aside from Use Leading Values, which is unchecked.

image 

Knowledge Discovery

The next step is to add some knowledge to the Knowledge Base (KB), which we’ll do via knowledge discovery. To assist in this process I have a simple Excel file containing a list of countries, which needs to be fed into the KB:

image

The first step within knowledge discovery is to pick a data source. In this case the data source is Excel and the Source Column of Name has been mapped to the domain of Country:

image

When we reach the step for managing the domain values, we can look at the data in order to spot valid and invalid values. Towards the bottom of the list, we can see that there are values for both USA and United States of America:

image

We can set these as synonyms, as shown above. If we go back into Domain Management afterwards, then we can see that USA is a synonym of United States of America:

image
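To make the synonym relationship concrete, here is a minimal Python sketch of how I think about it. This is purely an illustration and not a DQS API: the `synonym_groups` dictionary and the `build_leading_lookup` function are hypothetical names of my own. The idea is simply that every value in a synonym group, including the leading value itself, points back to one leading value.

```python
# Hypothetical model of a domain's synonym groups (not a DQS API).
# Key: the leading value. Value: the list of its synonyms.
synonym_groups = {
    "United States of America": ["USA"],
}


def build_leading_lookup(groups):
    """Map every known value (leading or synonym) to its leading value."""
    lookup = {}
    for leading, synonyms in groups.items():
        lookup[leading] = leading  # a leading value maps to itself
        for synonym in synonyms:
            lookup[synonym] = leading
    return lookup


leading_lookup = build_leading_lookup(synonym_groups)
print(leading_lookup["USA"])
```

Both USA and United States of America are valid members of the domain; the lookup only records which of the two is the leading one.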

Data Quality Projects

If we move on and create a data quality project to clean some data, then we’ll see these values in action. The new source file has some new countries, but also has both USA and United States of America. If we run the new source file through the Data Quality Project then we’ll see the following output:

image

As we’ve turned off “Use Leading Values”, USA has been deemed a correct “Domain Value”, as it’s a synonym of United States of America.

To contrast this, let’s take a look at what would have happened if we had left the default setting, which is “Use Leading Values” checked. With the exact same setup, but with “Use Leading Values” checked, we get the following output:

image

This time USA is on the Corrected tab. Although its value is “correct” as far as the domain is concerned, we’ve chosen to Use Leading Values, so DQS has altered USA to its leading value, United States of America. The reason for the correction is very clear: “Corrected to leading value”.
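The difference between the two runs can be sketched in a few lines of Python. Again, this is only my own simulation of the behaviour seen in the screenshots, not how DQS is implemented: `cleanse`, `CleansedValue` and `LEADING_VALUE_OF` are hypothetical names, and the handling of values that are not in the KB at all (returned here with a status of "New") is an assumption added to keep the sketch complete.

```python
from typing import NamedTuple


class CleansedValue(NamedTuple):
    source: str   # value as it appeared in the source file
    output: str   # value written to the output
    status: str   # e.g. "Correct" or "Corrected"
    reason: str


# Hypothetical lookup: every known domain value mapped to its leading value.
LEADING_VALUE_OF = {
    "United States of America": "United States of America",
    "USA": "United States of America",
}


def cleanse(value, use_leading_values=True):
    """Simulate the output for one value, with the option on or off."""
    leading = LEADING_VALUE_OF.get(value)
    if leading is None:
        # Assumption: values unknown to the KB are passed through as new.
        return CleansedValue(value, value, "New", "")
    if use_leading_values and leading != value:
        # Option checked: a synonym is replaced by its leading value.
        return CleansedValue(value, leading, "Corrected",
                             "Corrected to leading value")
    # Option unchecked, or the value is already the leading value.
    return CleansedValue(value, value, "Correct", "Domain value")


print(cleanse("USA", use_leading_values=False))
print(cleanse("USA", use_leading_values=True))
```

With the option off, USA comes through untouched as a correct domain value; with it on, the same input is output as United States of America with the “Corrected to leading value” reason.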

I’ve personally not had a need to alter the setting, as I’ve always wanted the leading value to be the primary output. But it’s good that DQS gives an element of flexibility in its setup to suit different requirements.
