As I’m sure you’ve heard, CTP3 of SQL Server Denali was released yesterday, and can be downloaded here. Denali includes Data Quality Services (DQS), Microsoft’s new data cleansing and matching component that’s based on the Zoomix acquisition that occurred a couple of years back. Data Quality Services didn’t make it into the first CTP, but is now available, so I though it would be worth a quick blog post.
Installing Data Quality Services
Data Quality Services is an option in the main install, but when you go to run the Data Quality Client, you’ll get a message stating that DQS is not installed. As far as I can tell, DQS needs to be installed manually, by running the DQSInstaller.exe, which you can find in the SQL Server Binn directory. This will create two SQL Server databases:
Data Quality Client
Once DQS is configured, you’ll be in a position to use the Data Quality Client, which is a windows application, available in 32 or 64 bit. Once you connect, you’ll get the following screen:
The knowledge base is the key to how DQS works, being utilised to determine the data quality of your source data. You can create your own knowledge base, using your own data, or even cloud-based data. For this example, I’m going to use the built in knowledge base called DQS Data.
As an example, I’ve created some data that I want to cleanse. It’s adventure works country data that I’ve put into a separate table that I’ve called dbo.Country.
My task now is is to clean this data, which I can do by creating a data quality project:
I’ve called my project Countries, and I’ve picked the built-in DQS Data knowledge base, which I noticed contains reference data for countries. The activity that I’ve selected is Cleansing, and then I’ve clicked Create.
DQS will then prompt for mapping the source data to one of the domains in the Knowledge Base. Here I’ve mapping my country name to the Country domain from the knowledge base:
After clicking the Start button on the next screen, the cleaning process starts, which gives the following results:
I put a couple of typos into my source data, such as changing Algeria to ‘Algerian’ and Albania to ‘Albana’. These were picked up by DQS, along with a few others, but a user has the opportunity to approve or reject, via the radio buttons shown above. I chose to approve the first two, and then clicked next. In the final screen, DQS allows you to output the cleaned data to a separate table, as shown below:
If you go into the table that DQS has created, you will see that there is a Name_Status column, which holds the cleaning status on a per record basis:
This can be used to update the source data, if required, and therefore address the data quality issues.
This has been a quick introduction to cleaning data with Data Quality Services. There’s plenty more to look at, and I hope to find time to do so at some point - In particular I’m keen to take a look at how DQS works with Master Data Services. I’ve noticed that there’s also a new MS DQS blog (http://blogs.msdn.com/b/dqs/) – I’m sure that will be a great resource for DQS learning.
One question that I get asked from time to time is how to add a calculated field to an Excel Pivot Table that’s connected to an Analysis Services cube. I’ve been aware of a workaround to do this for a while, but a quick Bing :-) revealed that the common answer given to this question is that it’s not possible.
It definitely is possible, it’s just not that obvious and is perhaps a little limited. Here’s how it can be done in Excel 2010:
First of all, connect to a cube and produce a pivot table report. I’ve connected to the 2008 R2 version of the Adventure Works DW cube and have produced the following report, using the Internet Sales Amount measure:
Adding a Calculated Field
If you now open the Field List, you will see that Internet Sales Amount is selected of course. Here’s the trick – you can select the measure twice by dragging it into the Values area, which will give the following:
Now click the drop down next to Internet Sales Amount2, which will display the following window:
You can change the name of the field – I’ve called mine Percent Share. In the Show Values as drop-down, I’ve picked % of Grand Total, so that I can see the share of the sales in each country across all years. So now the report is as follows:
A better approach is to put as many calculations in the cube as possible, but as some calculations requested by users are report-specific, this feature of Pivot Tables allows users to produce the reports that they want.
The calculations that you can carry out are only those in the ‘Show values as’ drop-down. There’s quite a few in there, especially in Excel 2010, as mentioned here. If that’s not enough, then an alternative is to use the OLAP Pivot Table Extensions, which is available to download on Codeplex.