As I’m sure you’ve heard, CTP3 of SQL Server Denali was released yesterday, and can be downloaded here. Denali includes Data Quality Services (DQS), Microsoft’s new data cleansing and matching component that’s based on the Zoomix acquisition from a couple of years back. Data Quality Services didn’t make it into the first CTP, but is now available, so I thought it would be worth a quick blog post.
Installing Data Quality Services
Data Quality Services is an option in the main install, but when you go to run the Data Quality Client, you’ll get a message stating that DQS is not installed. As far as I can tell, DQS needs to be installed manually, by running the DQSInstaller.exe, which you can find in the SQL Server Binn directory. This will create two SQL Server databases:
Data Quality Client
Once DQS is configured, you’ll be in a position to use the Data Quality Client, which is a Windows application, available in 32-bit or 64-bit versions. Once you connect, you’ll get the following screen:
The knowledge base is the key to how DQS works, being utilised to determine the data quality of your source data. You can create your own knowledge base, using your own data, or even cloud-based data. For this example, I’m going to use the built in knowledge base called DQS Data.
As an example, I’ve created some data that I want to cleanse. It’s Adventure Works country data that I’ve put into a separate table called dbo.Country.
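If you want to set up something similar, the table can be knocked up with a quick query against the AdventureWorks sample data. This is just a sketch – the database, table and column names below (AdventureWorksDW2008R2, DimGeography, CountryName) are assumptions, so point it at whatever sample data you have to hand:

--Build a simple dbo.Country table containing distinct country names,
--taken from the AdventureWorksDW geography dimension
SELECT DISTINCT dg.EnglishCountryRegionName AS CountryName
INTO dbo.Country
FROM AdventureWorksDW2008R2.dbo.DimGeography dg;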
My task now is to clean this data, which I can do by creating a data quality project:
I’ve called my project Countries, and I’ve picked the built-in DQS Data knowledge base, which I noticed contains reference data for countries. The activity that I’ve selected is Cleansing, and then I’ve clicked Create.
DQS will then prompt you to map the source data to one of the domains in the Knowledge Base. Here I’ve mapped my country name to the Country domain from the knowledge base:
After clicking the Start button on the next screen, the cleaning process starts, which gives the following results:
I put a couple of typos into my source data, such as changing Algeria to ‘Algerian’ and Albania to ‘Albana’. These were picked up by DQS, along with a few others, but a user has the opportunity to approve or reject, via the radio buttons shown above. I chose to approve the first two, and then clicked next. In the final screen, DQS allows you to output the cleaned data to a separate table, as shown below:
If you go into the table that DQS has created, you will see that there is a Name_Status column, which holds the cleaning status on a per record basis:
This can be used to update the source data, if required, and therefore address the data quality issues.
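As a rough sketch of what that update might look like, something along the following lines would do the job. Note that the output table and column names (dbo.Country_Cleansed, Name_Source, Name_Output) and the ‘Corrected’ status value are illustrative – use whatever DQS actually generated in your export step:

--Push the DQS corrections back into the source table, but only for the
--records that DQS has marked as corrected
UPDATE c
SET c.CountryName = o.Name_Output
FROM dbo.Country c
INNER JOIN dbo.Country_Cleansed o
    ON o.Name_Source = c.CountryName
WHERE o.Name_Status = 'Corrected';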
This has been a quick introduction to cleaning data with Data Quality Services. There’s plenty more to look at, and I hope to find time to do so at some point. In particular, I’m keen to take a look at how DQS works with Master Data Services. I’ve noticed that there’s also a new MS DQS blog (http://blogs.msdn.com/b/dqs/) – I’m sure that will be a great resource for DQS learning.
One question that I get asked from time to time is how to add a calculated field to an Excel Pivot Table that’s connected to an Analysis Services cube. I’ve been aware of a workaround to do this for a while, but a quick Bing :-) revealed that the common answer given to this question is that it’s not possible.
It definitely is possible, it’s just not that obvious and is perhaps a little limited. Here’s how it can be done in Excel 2010:
First of all, connect to a cube and produce a pivot table report. I’ve connected to the 2008 R2 version of the Adventure Works DW cube and have produced the following report, using the Internet Sales Amount measure:
Adding a Calculated Field
If you now open the Field List, you will see that Internet Sales Amount is selected of course. Here’s the trick – you can select the measure twice by dragging it into the Values area, which will give the following:
Now click the drop down next to Internet Sales Amount2, which will display the following window:
You can change the name of the field – I’ve called mine Percent Share. In the Show Values as drop-down, I’ve picked % of Grand Total, so that I can see the share of the sales in each country across all years. So now the report is as follows:
A better approach is to put as many calculations in the cube as possible, but as some calculations requested by users are report-specific, this feature of Pivot Tables allows users to produce the reports that they want.
The calculations that you can carry out are only those in the ‘Show values as’ drop-down. There are quite a few in there, especially in Excel 2010, as mentioned here. If that’s not enough, then an alternative is to use the OLAP PivotTable Extensions add-in, which is available to download on Codeplex.
If you didn’t make it to Tech Ed in North America last week, then you can download session recordings and take part in the labs at the Tech Ed site here. The session recordings are also available for streaming or download on Channel 9.
Today I wanted to highlight in particular John McAllister’s session titled Managing Master Data with MDS and Microsoft Excel. We’ve heard for a while about the use of Excel with the new version of MDS, so it’s great to finally see it. The key points from the session for me are:
- The new Excel add-in:
- Allows the editing of Master Data (even lets you use Excel formulas) and allows you to publish back to MDS. It seems to provide the same functionality as the UI that we’re used to – it’s aware of the domain attributes, giving you drop downs in the relevant cells, and is aware of the business rules for an entity.
- Also lets you actually build an MDS model from Excel.
- Improvements to the current web-based UI (Master Data Manager). The design has changed in several places, for example it seems that the grid now has a side bar to display the member that you’ve clicked on. This “side bar” looks to have better validation capability than the current R2 member editing experience.
- Integration with Data Quality Services. It’s a little unclear exactly what this will be, as it’s not shown in the video, so I think we’ll have to wait a while to find out.
- The database now contains a staging table per entity, as opposed to a single shared staging table.
- Improved Performance – this is in several areas of the product apparently, but it’s certainly visible in the video that the Master Data Manager UI is quicker.
I’ll be doing a series of blog posts on the new features once the new CTP/beta of Denali is released.
For the past seven months, Adatis have been working on a new book for Master Data Services. The book, titled SQL Server 2008 R2 Master Data Services, will be published by Packt Publishing and will be released on 30th June.
For those of you who don’t know (I certainly didn’t), Packt operate something called the RAW program. This allows you to download an electronic copy of the book in draft format, as the chapters are made available. If you want, you can also give feedback on any changes that you’d like to see in the book, which means you get a chance to shape the book with the content that you want included. What’s great is that when the book is finished, you get to download a copy of the e-book for free.
We’re almost done, just another month or so of late nights and early mornings before the June release date! Until then, the book is available on Packt’s site here, and is also available for pre-order on Amazon here.
There is now a second edition of the Kimball Microsoft Data Warehouse Toolkit, which contains updated material for SQL Server 2008 R2. The book is out now in the US, and will be released in the UK towards the end of March. I’ve got hold of a copy, so I thought it would be worth giving it a review.
As with the previous edition, the new version of the book starts off by covering the Business Requirements, before explaining the core concepts associated with the Kimball approach, such as surrogate keys and slowly changing dimensions, for example.
An overview of the Microsoft BI stack is given, before explaining how to build each part of the data warehouse in turn, covering the database engine, SSIS, SSAS, SSRS and SharePoint. There are also new sections on both Master Data Services and PowerPivot – personally I found the PowerPivot chapter to be really interesting – there’s a good explanation of how PowerPivot can complement the data warehouse, as well as explaining how to use it of course.
The latter part of the book explains management of the data warehouse, covering security, backups, metadata and maintenance. The final chapter, “Present Imperatives and Future Outlook”, includes a narrative on common problems in DW projects, as well as an honest appraisal of the MS BI stack which you can’t help but agree with.
So the burning question is: should you buy this book? That needs to be answered from two perspectives – 1) if you own the original and 2) if you don’t. If you do own the original, then you’ll find that there’s a huge amount of content that’s the same across the two editions. There is new content – particularly around MDS, PowerPivot, Change Data Capture and Compression, for example – as well as updates to some of the old content. £30 is quite a lot to pay for a book that’s similar to one you already own, but overall I think it’s worth going for.
If you don’t own the original then you will find this book to be an extremely useful asset in implementing MS data warehousing solutions, and I wouldn’t hesitate to buy it. One point I will make is that this doesn’t represent an exhaustive how-to for each of the components (e.g. SSAS), which I personally think is fine. I know a handful of people made this gripe about the first edition, but realistically there’s only so much you can cram into 648 pages on the huge subject of data warehousing AND still cover each element of the MS BI stack.
I’m pleased to say that I’m delivering a couple of Master Data Services sessions at Microsoft events over the next few months.
Firstly, as I’m sure you’ve heard, SQL Bits 8 is running in Brighton from 7th – 9th April. I’m delivering a session on the Saturday that will show how Master Data Services business rules work, which will include looking at how to create and use a SharePoint workflow.
Although last I heard it was getting pretty full, you can hopefully still register for SQL Bits here.
Tech Days 2011
Tech Days is running again this year, from the 23rd – 25th May. There are a number of sessions running for Tech Days in London over the 3 days, covering areas such as Windows Azure, Windows Phone and of course SQL Server. The session that I’m doing will show how we’ve implemented Master Data Services at one of our clients, EMI, and how they benefit from using MDS.
You can register for Tech Days 2011 here.
Derived Hierarchies in Master Data Services are hierarchies created from the domain-based attribute relationships between entities, as explained in the following documentation, resulting in a fixed-depth hierarchy. A less obvious feature is that they can also be recursive, which by its very nature implies a ragged, variable-depth hierarchy. This is a characteristic that’s shared by Explicit Hierarchies, so it got me thinking that it would be worth a blog post to contrast Explicit Hierarchies and Recursive Hierarchies.
Explicit Hierarchies use Consolidated Members to group other Consolidated and Leaf Members in order to make a hierarchy. The structure can be completely ragged, with each Consolidated Member having an unlimited number of Leaf or Consolidated Members beneath it.
An example of using Explicit Hierarchies is in the sample Product model that comes with MDS, which has an Explicit Hierarchy called Product Management. This groups members as shown in the (slightly edited) image below:
Note that the hierarchy is ragged, and that the Consolidated Members (PL, SL, CL, AC, ACC and CPT) can all have different attributes to the leaf members.
Recursive Hierarchies allow an Entity Member to contain an attribute that points to another Member within the same Entity. Essentially this is a classical parent-child structure, which can of course also be ragged.
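Outside of MDS, this is just the familiar self-referencing table pattern. As a minimal sketch (the table and column names here are purely illustrative), the structure and a query to flatten it look something like this:

--A self-referencing Employee table: ManagerCode points at another row in the
--same table, which gives a ragged, variable-depth hierarchy
CREATE TABLE dbo.Employee
(
    Code nvarchar(10) PRIMARY KEY,
    Name nvarchar(100) NOT NULL,
    ManagerCode nvarchar(10) NULL REFERENCES dbo.Employee (Code)
);

--Walk the hierarchy with a recursive CTE, starting from the members that
--have no manager
WITH EmployeeTree AS
(
    SELECT Code, Name, ManagerCode, 0 AS Depth
    FROM dbo.Employee
    WHERE ManagerCode IS NULL
    UNION ALL
    SELECT e.Code, e.Name, e.ManagerCode, t.Depth + 1
    FROM dbo.Employee e
    INNER JOIN EmployeeTree t ON t.Code = e.ManagerCode
)
SELECT Code, Name, ManagerCode, Depth
FROM EmployeeTree;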
An example of using recursive hierarchies is in the Employee_Hierarchies sample model which can be downloaded from here. The image below shows one of the recursive hierarchies in the model, for employee managers:
So, as we can see, both can be ragged and have a variable depth, so which one should we use? The following key points may help decide:
- Explicit Hierarchies must use Consolidated Members for the grouping, so a Leaf Member can’t suddenly become a parent. In an employee/manager type hierarchy, for example, where it’s common for an employee to become a manager once they have the relevant experience, an Explicit Hierarchy would force you to convert the Leaf Member to a Consolidated Member, which would be a pain.
- It’s slightly easier to work with Recursive Hierarchies, as you don’t need to worry about Consolidated Members. It can be a bit awkward to continually switch between Leaf and Consolidated Members in the Explorer grid, so it’s good to avoid that if possible.
- It’s not possible to set individual member permissions on a Recursive Hierarchy, whereas it is with an Explicit Hierarchy.
- Explicit Hierarchies can be added to Collections for more flexibility, whereas Recursive Hierarchies cannot.
- You won’t be able to have multiple hierarchies by using a Recursive Hierarchy (unless you go down the route of having multiple self-pointing attributes), whereas you can create multiple Explicit Hierarchies quite easily.
Considering all the above points, in my opinion anyway, a Recursive Hierarchy is more suitable when you have a genuine parent child requirement, e.g. for an employee hierarchy, or a chart of accounts. This is just down to the ease of use. An Explicit Hierarchy is more suitable when you just want to create groupings for your leaf members, but have the requirement to potentially have multiple ragged reporting hierarchies.
For any given entity in Master Data Services, it’s only the Code attribute that cannot be left blank when entering data in the front end or via the MDS staging tables. For any of the other attributes, you can then use the MDS business rules to enforce that the attribute must have a value, using the Is Required action, for example.
All this works well in my opinion, but I have found a difference in how blanks and NULLs are handled in the front end versus the MDS staging process, which has a knock-on impact on the business rules.
To explain I’ve created an example model called Supplier, with an entity called Supplier and a test attribute called Attribute1. There is also a simple business rule that will set the value of Attribute1 to the value ‘Name blank’ when the built in Name attribute is left blank:
Adding Data via Master Data Manager
My first step is to add a new member to the Supplier entity, which I will just give a code of 1 and intentionally leave the name blank:
Clicking the save button runs the single business rule that exists and correctly sets Attribute1 to the value of ‘Name blank’
If we determine the entity table for the Supplier entity (which can be done by looking at the [mdm].[viw_SYSTEM_SCHEMA_ENTITY] view) then in my case the entity table is called tbl_24_97_EN. Running a select statement on this table shows the following result:
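In case it’s useful, queries along the following lines will do the job (the entity table name is generated by MDS, so it will be different in your environment):

--Find the physical table behind each entity; look for the row belonging to
--the Supplier entity
SELECT *
FROM mdm.viw_SYSTEM_SCHEMA_ENTITY;

--In my environment the Supplier entity table turned out to be tbl_24_97_EN
SELECT *
FROM mdm.tbl_24_97_EN;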
The name has been entered as NULL, which is fine, as we’ve got the behaviour that we want.
Adding Data via the MDS Staging Tables
Now let’s look at what happens when we do the same operation via the MDS staging tables. First of all, we need to load a supplier member into the mdm.tblStgMember table, which we can carry out with the following SQL:
--Insert member with no name and a Code of 2
--(the column list below follows the standard R2 member staging pattern)
INSERT INTO mdm.tblStgMember (ModelName, EntityName, MemberType_ID, MemberName, MemberCode)
VALUES ('Supplier', 'Supplier', 1, NULL, '2')
The null has been left in the statement as we’re simulating, for example, not being given a name for the member from our data source. You could then have an MDS business rule set up to handle missing names, perhaps setting them to ‘Unknown Supplier’.
To load from the staging tables into the entity table, we run the staging sweep stored procedure:
DECLARE @ModelName nVarchar(50) = 'Supplier'
DECLARE @Version_ID int
--Pick up the latest version of the model from the system schema view
SET @Version_ID = (SELECT MAX(ID)
                   FROM mdm.viw_SYSTEM_SCHEMA_VERSION
                   WHERE Model_Name = @ModelName)
--This will cause the members in the staging tables to be loaded into MDS
EXECUTE mdm.udpStagingSweep 1, @Version_ID, 1
To finish up we will also validate the model, which will cause the business rule to fire:
DECLARE @ModelName nVarchar(50) = 'Supplier'
DECLARE @Model_ID int
DECLARE @Version_ID int
--Pick up the latest version and the model ID from the system schema view
SET @Version_ID = (SELECT MAX(ID)
                   FROM mdm.viw_SYSTEM_SCHEMA_VERSION
                   WHERE Model_Name = @ModelName)
SET @Model_ID = (SELECT MAX(Model_ID)
                 FROM mdm.viw_SYSTEM_SCHEMA_VERSION
                 WHERE Model_Name = @ModelName)
--Cause validation to occur
EXECUTE mdm.udpValidateModel 1, @Model_ID, @Version_ID, 1
If we now do a select from the entity table, we will see that the member with the code of 2 doesn’t have a NULL for the name column, but instead has a blank string:
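A query along these lines shows the difference between the two members (again, the entity table name is specific to my environment):

--The member added via Master Data Manager has a NULL name, whereas the member
--loaded via the staging tables ends up with an empty string
SELECT Code, Name,
    CASE
        WHEN Name IS NULL THEN 'NULL'
        WHEN Name = '' THEN 'Empty string'
        ELSE 'Has a value'
    END AS NameState
FROM mdm.tbl_24_97_EN;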
If we now go and take a look at the Master Data Manager front end, we will see that the business rule condition hasn’t been met for the member that we added, as its ‘Attribute1’ attribute is still blank:
This has happened because the stored procedure that implements the business rules carries out the ‘Name is equal to Blank’ check that we’re relying on by checking if the name column is NULL.
How this affects you will obviously depend on how you’re using the MDS business rules. It doesn’t affect the other member attributes, only the Name attribute. If you want to handle null Names on entities then a good workaround is to use the business rules in the front end, and then use SSIS to replace null names with ‘Unknown’ or something similar.
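As a sketch of that workaround, the replacement could be done with a derived column in the SSIS data flow, or with T-SQL along these lines before the data reaches the staging table (the source table and column names here are illustrative):

--Replace NULL or empty supplier names before loading into mdm.tblStgMember,
--so that the difference between NULL and blank never reaches the business rules
SELECT SupplierCode,
    ISNULL(NULLIF(LTRIM(RTRIM(SupplierName)), ''), 'Unknown Supplier') AS SupplierName
FROM staging.SourceSupplier;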
For Master Data Services business rules that have a validation action, a notification (if you’ve set one up) will be sent when the rule condition is satisfied.
I’ve noticed a quirk to the way that the notifications are sent – I was getting unexpected email notifications when creating members – which as it happens is not a bug, but is worth exploring to understand how the MDS rules work.
The above was essentially due to the two step save process that occurs in the MDS front end. When adding a member, you first of all enter the name and the code, then you click the save button. At this point, the business rules are run, and validation issues are created in a table called [mdm].[tblValidationLog], and then displayed to the user at the bottom of the screen, before the user has entered the attributes. This is shown below on the sample Product model, which I have changed to enable notifications on a few of the rules:
Then click the save button, which will display the attributes pane below. As the sample Product model contains a rule called “Required Fields”, with a condition of “None”, then the validation issues will also be displayed:
At this point we’ve got no way to avoid getting the validation issues, as the rule has no condition, and we’ve not had any opportunity yet to alter the attributes that the rule checks. The email notifications behaviour to be aware of at this point is:
- The notifications are sent on an interval, defined by the system setting called “Notification e-mail interval”. Its default is 120 seconds.
- If you rectify the issues that you’ve been warned about in the gap between the last email notification cycle and the next one, then [mdm].[tblValidationLog] will be updated to say that the validation issue doesn’t need a notification (you can see this by querying the table directly, as sketched after this list), and you will get no emails.
- On the other hand, if you’re in the middle of a few things and essentially don’t act quickly enough, then you will get a notification with the above issues displayed, as the email interval has kicked in during the middle of your save.
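If you want to see what has been logged at any point, a simple query against the table will show the outstanding issues (just a sketch – querying the MDS database directly like this is for investigation only):

--Inspect the validation issues that MDS has logged; the rows still flagged for
--notification will be picked up on the next e-mail interval
SELECT *
FROM mdm.tblValidationLog;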
Hence I was sometimes getting email notifications as shown below:
But sometimes I was getting no notification for carrying out what I perceived to be exactly the same behaviour. So by no means a bug, but worth explaining to your users how the notifications work.
I’ve just found out that the first CTP for SQL Server codename Denali is now available for download. The link to get both the 32 and 64 bit versions is:
On a related note, Simon Sabin has recently posted here about the Tech-Ed Europe keynote that shows a glimpse of a new Silverlight reporting tool for SQL that’s part of Denali. Well worth watching the keynote video…
Subscription Views are one of the ways to extract data from Master Data Services. Created via the ‘Data Integration’ section of the MDS Master Data Manager front end, they result in the creation of a physical SQL Server view in the underlying database.
All this you will probably know from the MSDN article. The purpose of this post, however, is to explain a few of the more subtle differences between the different types of subscription views that get created, and to look at how they can be used to feed a data warehouse.
You can create a subscription view from a derived hierarchy, which will return all the levels from your hierarchy in a de-normalised form. If you choose the sample Product model, then you will see that one of the Derived Hierarchies that you can pick from is called Category. As your Product dimension in a data warehouse would include attributes such as Product Category and Sub Category, then on the surface this view looks to be ideal.
Unfortunately there is a slight issue with subscription views based on derived hierarchies. Users could have altered one of the members that exist within the hierarchy before the data warehouse nightly load has taken place. If the MDS validation of that member hasn’t occurred yet (which it doesn’t in some situations), then the member could potentially contain some incorrect attribute values that we don’t want to expose to the data warehouse.
For example, a user has edited a product, as shown below, but chosen a different Sub Category to that of its adjacent members:
As indicated by the yellow question mark above, the member is awaiting validation. However, when I run a select statement against my subscription view that’s based on the Derived Hierarchy format, the member marked above still appears in the view results. We would not want to load this change into the data warehouse, as until the business rules have been run, we don’t know if the new Sub Category is valid in the context of the other attributes. For reference, the same behaviour exists for members that have explicitly failed validation.
You can also create a subscription view by picking an entity, and then choosing the format of Leaf Attributes, as shown below:
In my case, I’ve set up a subscription view called ProductLeafAttributes, and when I run a select statement against this view in SQL Server, I get the following results:
As with the subscription view that’s based on a Derived Hierarchy format, I still get all the members returned, but there’s also a column called Validation Status, that shows that one of the members is awaiting revalidation. Therefore, this column can be filtered on in the data warehouse load, just to take through the changes that have definitely passed validation.
Creating subscription views based on entities, rather than Derived Hierarchies, exposes the validation status of a member, and is therefore the preferred subscription view format to use in this case. The one catch is that the Leaf Attributes format will not return attributes of any related entities, e.g. Product Category. It will return the Sub Category code, so it’s therefore necessary to join to separate subscription views for both Sub Category and Category in order to pick up the missing attributes.
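As a sketch of how that filter might look in the data warehouse load (the view and schema names match my environment above, and the exact ValidationStatus text is worth checking in your own subscription view before relying on it):

--Only take through members that have definitely passed validation; members
--awaiting revalidation, or that have failed, are left until a later load
SELECT *
FROM mdm.ProductLeafAttributes
WHERE ValidationStatus = 'Validation Succeeded';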
The SQL Bits 7 agenda was published recently, and I’m pleased to say that I will be delivering a session titled ‘End to End Master Data Management with SQL Server Master Data Services’.
The heart of this session will be an end-to-end technical demo that will show the role of MDS in extracting master data from source systems, before demonstrating how data is fed via MDS to subscribing downstream systems. I’ll cover areas such as the MDS staging process, business rules and notifications amongst others.
I’ve also seen from Simon Sabin that the SQLCAT team will be showing some of the features of SQL 11, code name Denali. Well worth going for that alone surely?!
You can register for SQL Bits here.
Back to a bit of Master Data Services this week – there’s been a few learning resources that have appeared here and there for MDS over the past couple of months. I thought that it would be worth rounding up the ones that I’ve noticed:
There was a comment on my last blog post by Peter Eb asking about whether Reporting Services 2008 R2 knows how to render SSAS KPIs automatically. The short answer is that it doesn't unfortunately, but there are only a few steps needed to get it working.
For this blog post I've built a simple report that uses the Adventure Works DW 2008 R2 Analysis Services database as a data source. The first step is to build a data set that contains a query that references the Month attribute and the Channel Revenue KPI:
I'm interested in reporting on the status and trend part of the KPI, so I've added the month and value parts of the KPI, plus blank columns for the status and the trend:
For the status and the trend columns, the next step is then to drag on (or right click and insert) an indicator in each of the cells. This is when the manual steps come in:
- Right click on the indicator to open the properties window;
- Set the Value of the indicator to be the Status part of the Analysis Services KPI;
- Change the States Measurement Unit to Numeric;
- Configure the indicator Start and End properties to be -1, 0 and 1. SSAS KPIs can have an unlimited number of status values, so it may be necessary to add more states here if SSAS returns more than 3 values for the Status.
The changes that are needed in the indicator properties window are shown below:
Applying the same method to the Trend part of the KPI produces the following simple report:
This post wasn't really supposed to compare SSRS vs PerformancePoint Services, but I've found out that PerformancePoint Services does pick up the SSAS KPI metadata (slight mismatch on the colours only), saving you from having to set up the indicator yourself:
It's a shame that the KPIs aren't picked up automatically by SSRS, but then again the above process isn't too difficult to master.
Although you've always been able to deliver KPIs in Reporting Services by using images and expressions, Reporting Services in SQL Server 2008 R2 now includes a feature called Indicators, which assists with producing KPIs out of the box.
This got me thinking how producing KPIs and/or scorecards in Reporting Services compares to doing the same in PerformancePoint Services. This blog post gives a short overview of creating KPIs in each technology, before comparing the pros and cons of each.
Reporting Services 2008 R2 Indicators
The indicators in Reporting Services come as an extra item in your toolbox, and can be dragged onto a Tablix, for example. Once you add the indicator to your report, you'll be prompted to select the shape that you want for your indicator:
So no more messing about with images! Once you click OK your indicator will be added to your report. If you then right click on the properties of the indicator, you can choose what to base the indicator's value on, and also set its state, both of which essentially control the points at which your indicator changes its colour and shape. What I quite like here is that it's easy to change the colour for each state, and it's also easy to add new states, if you need four colours rather than three for example:
You seem to be able to add a huge number of indicator states here, certainly more than enough to deal with KPIs. Taking some data from AdventureWorks, based around employee sales targets, I've created the following very basic report using the Indicators:
PerformancePoint Services KPIs
The KPI and Scorecard have had a few decent changes in SharePoint 2010 (e.g. drill up/down), but Dashboard Designer is still the tool that's used to create KPIs in PerformancePoint Services, and it's a different experience to developing in Visual Studio.
Assuming that you get to the point where you have a data source set up, the first thing to do in order to create a KPI is to pick an indicator. The choice here is very similar to the choice in SSRS:
The alternative here is to pick a blank indicator, where you'll have to decide yourself how many levels you want for your indicator, and what the colours and shapes should be. You can pick an image for an indicator state (as you can with SSRS), although you are limited to a maximum of 10 states. Again, that should be more than enough for KPIs.
The objects within PerformancePoint Services are very modularised, meaning that you create indicators, KPIs and scorecards separately. If you need to re-use an indicator in a new KPI, then you can just point to an existing indicator, or alternatively you can create a new one.
Again, taking some sample data from AdventureWorks, I've produced the following basic scorecard, based on similar data:
The following table lists some KPI/Scorecard features and compares how SSRS and PPS are able to deal with them:
| Feature | PerformancePoint Services | Reporting Services 2008 R2 |
| --- | --- | --- |
| Parent KPIs | Out-of-the-box parent KPIs | No built-in parent KPIs |
| Scoring | Two built-in scoring methods; weighting concept to assist with scoring | No built-in scoring capability |
| Data sources | Analysis Services; SQL Server; SharePoint List | SQL Server; Analysis Services; OLE DB; ODBC; SharePoint List; SAP NetWeaver BI |
| Layout | KPIs can only exist within Scorecards; a scorecard can only contain KPIs and their associated Indicator and data items | Indicators can be embedded in other objects or exist by themselves; other objects (e.g. spark lines) can co-exist alongside Indicators in the same Tablix |
| Formatting | Colours and number formats of the Scorecard grid can be changed; colours/fonts in data cells cannot be changed | If the Indicator is embedded in a Tablix, each cell can be altered in terms of colours and number formats |
| Ease of use | As long as your data is in good shape, it's possible for power users to build KPIs and Scorecards with Dashboard Designer relatively easily | The indicator windows are very intuitive, but building an SSRS report is harder for a power user than building a PPS dashboard |
| Other | Scorecards allow users to enter comments against KPI values, which is a great feature | The standard SSRS expressions allow you to fine-tune indicators, states, values and the formatting of your KPIs |
In summary, PerformancePoint Services in SharePoint 2010 will allow you to get some great looking KPIs and Scorecards up and running much quicker, as that's what it's designed to do. It is, however, more restrictive than Reporting Services, so SSRS is the better choice if your KPI requirements are complex in terms of layout or formatting.