As I’m sure you’ve seen, registration for SQLBits 9 – Query Across the Mersey is now open.
This time around I’ll be running a deep-dive training day with Tim Kent on Master Data Services. This will be a full day of training, showing you how MDS can be used to manage the master data in your organisation. We’re going to start by giving an overview of Master Data Management, before moving on to covering the following MDS topics in detail:
- Installing, configuring, and maintaining Master Data Services
- Creating and using models
- Version management
- Business rules and workflow
- Importing data into Master Data Services
- Integrating with other systems
You can register for the MDS Training day, or one of the other 10 training days, by using the following registration page.
Remember, the early bird discount expires at midnight on Friday this week!
Update – We are running another MDS course in February 2012. Click here for the details.
As I’m sure you’ve heard, CTP3 of SQL Server Denali was released yesterday, and can be downloaded here. Denali includes Data Quality Services (DQS), Microsoft’s new data cleansing and matching component that’s based on the Zoomix acquisition that occurred a couple of years back. Data Quality Services didn’t make it into the first CTP, but is now available, so I though it would be worth a quick blog post.
Installing Data Quality Services
Data Quality Services is an option in the main install, but when you go to run the Data Quality Client, you’ll get a message stating that DQS is not installed. As far as I can tell, DQS needs to be installed manually, by running the DQSInstaller.exe, which you can find in the SQL Server Binn directory. This will create two SQL Server databases:
Data Quality Client
Once DQS is configured, you’ll be in a position to use the Data Quality Client, which is a windows application, available in 32 or 64 bit. Once you connect, you’ll get the following screen:
The knowledge base is the key to how DQS works, being utilised to determine the data quality of your source data. You can create your own knowledge base, using your own data, or even cloud-based data. For this example, I’m going to use the built in knowledge base called DQS Data.
As an example, I’ve created some data that I want to cleanse. It’s adventure works country data that I’ve put into a separate table that I’ve called dbo.Country.
My task now is is to clean this data, which I can do by creating a data quality project:
I’ve called my project Countries, and I’ve picked the built-in DQS Data knowledge base, which I noticed contains reference data for countries. The activity that I’ve selected is Cleansing, and then I’ve clicked Create.
DQS will then prompt for mapping the source data to one of the domains in the Knowledge Base. Here I’ve mapping my country name to the Country domain from the knowledge base:
After clicking the Start button on the next screen, the cleaning process starts, which gives the following results:
I put a couple of typos into my source data, such as changing Algeria to ‘Algerian’ and Albania to ‘Albana’. These were picked up by DQS, along with a few others, but a user has the opportunity to approve or reject, via the radio buttons shown above. I chose to approve the first two, and then clicked next. In the final screen, DQS allows you to output the cleaned data to a separate table, as shown below:
If you go into the table that DQS has created, you will see that there is a Name_Status column, which holds the cleaning status on a per record basis:
This can be used to update the source data, if required, and therefore address the data quality issues.
This has been a quick introduction to cleaning data with Data Quality Services. There’s plenty more to look at, and I hope to find time to do so at some point - In particular I’m keen to take a look at how DQS works with Master Data Services. I’ve noticed that there’s also a new MS DQS blog (http://blogs.msdn.com/b/dqs/) – I’m sure that will be a great resource for DQS learning.
One question that I get asked from time to time is how to add a calculated field to an Excel Pivot Table that’s connected to an Analysis Services cube. I’ve been aware of a workaround to do this for a while, but a quick Bing :-) revealed that the common answer given to this question is that it’s not possible.
It definitely is possible, it’s just not that obvious and is perhaps a little limited. Here’s how it can be done in Excel 2010:
First of all, connect to a cube and produce a pivot table report. I’ve connected to the 2008 R2 version of the Adventure Works DW cube and have produced the following report, using the Internet Sales Amount measure:
Adding a Calculated Field
If you now open the Field List, you will see that Internet Sales Amount is selected of course. Here’s the trick – you can select the measure twice by dragging it into the Values area, which will give the following:
Now click the drop down next to Internet Sales Amount2, which will display the following window:
You can change the name of the field – I’ve called mine Percent Share. In the Show Values as drop-down, I’ve picked % of Grand Total, so that I can see the share of the sales in each country across all years. So now the report is as follows:
A better approach is to put as many calculations in the cube as possible, but as some calculations requested by users are report-specific, this feature of Pivot Tables allows users to produce the reports that they want.
The calculations that you can carry out are only those in the ‘Show values as’ drop-down. There’s quite a few in there, especially in Excel 2010, as mentioned here. If that’s not enough, then an alternative is to use the OLAP Pivot Table Extensions, which is available to download on Codeplex.
If you didn’t make it to Tech Ed in North America last week, then you can download session recordings and take part in the labs at the Tech Ed site here. The session recordings are also available for streaming or download on Channel 9.
Today I wanted to highlight in particular John McAllister’s session titled Managing Master Data with MDS and Microsoft Excel. We’ve heard for a while about the use of Excel with the new version of MDS, so it’s great to finally see it. The key points from the session for me are:
- The new Excel add-in:
- Allows the editing of Master Data (even lets you use Excel formulas) and allows you to publish back to MDS. It seems to provide the same functionality as the UI that we’re used to – it’s aware of the domain attributes, giving you drop downs in the relevant cells, and is aware of the business rules for an entity.
- Also lets you actually build an MDS model from Excel.
- Improvements to the current web-based UI (Master Data Manager). The design has changed in several places, for example it seems that the grid now has a side bar to display the member that you’ve clicked on. This “side bar” looks to have better validation capability than the current R2 member editing experience.
- Integration with Data Quality Services. It’s a little unclear exactly what this will be, as it’s not shown in the video, so I think we’ll have to wait a while to find out.
- The database now contains a staging table per entity, as opposed to a single shared staging table.
- Improved Performance – this is in several areas of the product apparently, but it’s certainly visible in the video that the Master Data Manager UI is quicker.
I’ll be doing a series of blog posts on the new features once the new CTP/beta of Denali is released.
For the past seven months, Adatis have been working on new book for Master Data Services. The book, titled SQL Server 2008 R2 Master Data Services, will be published by Packt Publishing and will be released on 30th June.
For those of you who don’t know (I certainly didn’t), Packt operate something called the RAW program. This allows you to download an electronic copy of the book in draft format, as the chapters are made available. If you want, you can also feedback any changes that you’d like to see in the book, which means you get a chance to shape the book with the content that you want to include. What’s great is that when the book is finished, you get to download a copy of the e-book for free.
We’re almost done, just another month or so of late nights and early mornings before the June release date! Until then, the book is available on Packt’s site here, and is also available for pre-order on Amazon here.
|There is now a second edition of the Kimball Microsoft Data Warehouse Toolkit, which contains updated material for SQL Server 2008 R2. The book is out now in the US, and will be released in the UK in towards the end of March. I’ve got hold of a copy, so I thought it would be worth giving it a review. |
As with the previous edition, the new version of the book starts off by covering the Business Requirements, before explaining the core concepts associated with the Kimball approach, such as surrogate keys and slowly changing dimensions, for example.
An overview of the Microsoft BI stack is given, before explaining how to build each part of the data warehouse in turn, covering the database engine, SSIS, SSAS, SSRS and SharePoint. There are also new sections on both Master Data Services and PowerPivot – personally I found the PowerPivot chapter to be really interesting – there’s a good explanation of how PowerPivot can compliment the data warehouse, as well as explaining how to use it of course.
The latter part of the book explains management of the data warehouse, covering security, backups, metadata, maintenance. The final chapter, “Present Imperatives and Future Outlook” includes a narrative on common problems in DW projects, as well as an honest appraisal of the MS BI stack which you can’t help but agree with.
So the burning question is should you buy this book? That needs to be answered from two perspectives – 1)If you’re an owner of the original and 2)If you don’t own the original. If you do own the original, then you’ll find that there’s a huge amount of content that’s the same across the two editions. There is new content – particularly around MDS, PowerPivot, Change Data Capture and Compression, for example, as well as updates to some of the old content. £30 is quite a lot to pay for a book that’s similar to a book that you already own, but overall I think its worth going for.
If you don’t own the original then you will find this book to be an extremely useful asset in implementing MS data warehousing solutions, and I wouldn’t hesitate to buy it. One point I will make is that this doesn’t represent an exhaustive how to for each of the components (E.g. SSAS), which I personally think is fine. I know a handful of people made this gripe at the first edition, but realistically there’s only so much you can cram into 648 pages on the huge subject of data warehousing AND then cover each element of the MS BI stack.
I’m pleased to say that I’m delivering a couple of Master Data Services sessions at Microsoft events over the next few months.
Firstly, as I’m sure you’ve heard, SQL Bits 8 is running in Brighton from 7th – 9th April. I’m delivering a session on the Saturday that will show how Master Data Services business rules work, which will include looking at how to create and use a SharePoint workflow.
Although last I heard it was getting pretty full, you can hopefully still register for SQL Bits here.
Tech Days 2011
Tech Days is running again this year, from the 23rd – 25th May. There are a number of sessions running for Tech Days in London over the 3 days, covering areas such as Windows Azure, Windows Phone and of course SQL Server. The session that I’m doing is to show how we’ve implemented Master Data Services at one of clients, EMI, and how they benefit from using MDS.
You can register for Tech Days 2011 here.
Derived Hierarchies in Master Data Services are hierarchies created from the domain-based attribute relationships between entities, as explained in the following documentation, resulting in a fixed-depth hierarchy. A less obvious feature is that they can also be recursive, which by its very nature implies a ragged, variable-depth hierarchy. This is a characteristic that’s shared by Explicit Hierarchies, so it got me thinking that it would be worth a blog post to contrast Explicit Hierarchies and Recursive Hierarchies.
Explicit Hierarchies use Consolidated Members to group other Consolidated and Leaf Members in order to make a hierarchy. The structure can be completely ragged, with each Consolidated Member having an unlimited number of Leaf or Consolidated Members beneath it.
An example of using Explicit Hierarchies is in the sample Product model that comes with MDS, which has an Explicit Hierarchy called Product Management. This groups members as shown in the (slightly edited) image below:
Note that the hierarchy is ragged, and that the Consolidated Members (PL, SL, CL, AC, ACC and CPT) can all have different attributes to the leaf members.
Recursive Hierarchies allow an Entity Member to contain an attribute that points to another Member within the same Entity. Essentially this is a classical parent-child structure, which can of course also be ragged.
An example of using recursive hierarchies is in the Employee_Hierarchies sample model which can be downloaded from here. The image below shows one of the recursive hierarchies in the model, for employee managers:
So, as we can see, both can be ragged and have a variable depth, so which one should we use? The following key points may help decide:
- Explicit Hierarchies must use Consolidated Members for the grouping. So a Leaf Member can’t suddenly become a parent. E.g. in an employee manager type hierarchy, where it would be common for an employee to become a manager once they have the relevant experience, with Explicit Hierarchies it would involve converting the Leaf Member to a Consolidated Member, which would be a pain.
- Its slightly easier to work with Recursive Hierarchies as you don’t need to worry about Consolidated Members. It can be a bit awkward to continually switch between Leaf and Consolidated Members in the explorer grid, so good to avoid it if possible.
- It’s not possible to set individual member permissions on a Recursive Hierarchy, whereas it is with an Explicit Hierarchy.
- Explicit Hierarchies can be added to Collections for more flexibility, whereas Recursive Hierarchies cannot.
- You won’t be able to have multiple hierarchies by using a Recursive Hierarchy (unless you go down the root of having multiple self-pointing attributes), whereas you can create multiple Explicit Hierarchies quite easily.
Considering all the above points, in my opinion anyway, a Recursive Hierarchy is more suitable when you have a genuine parent child requirement, e.g. for an employee hierarchy, or a chart of accounts. This is just down to the ease of use. An Explicit Hierarchy is more suitable when you just want to create groupings for your leaf members, but have the requirement to potentially have multiple ragged reporting hierarchies.
For any given entity in Master Data Services, its only the Code attribute that cannot be left blank when entering data in the front end or via the MDS staging tables. For any of the other attributes, you can then use the MDS business rules to enforce that the attribute must have a value, using the Is Required action, for example.
All this works well in my opinion, but I have found a difference in how blanks and NULLs are handled in the front end Vs in the MDS staging process, which has a knock on impact on the business rules.
To explain I’ve created an example model called Supplier, with an entity called Supplier and a test attribute called Attribute1. There is also a simple business rule that will set the value of Attribute1 to the value ‘Name blank’ when the built in Name attribute is left blank:
Adding Data via Master Data Manager
My first step is to add a new member to the Supplier entity, which I will just give a code of 1 and intentionally leave the name blank:
Clicking the save button runs the single business rule that exists and correctly sets Attribute1 to the value of ‘Name blank’
If we determine the entity table for the Supplier entity (which can be done by looking at the [mdm].[viw_SYSTEM_SCHEMA_ENTITY] view) then in my case the entity table is called tbl_24_97_EN. Running a select statement on this table shows the following result:
The name has been entered as NULL, which is fine, as we’ve got the behaviour that we want.
Adding Data via the MDS Staging Tables
Now lets look what happens when we do the same operation via the MDS staging tables. First of all, we need to load a supplier member into the mdm.tblStgMember table, which we can carry out with the following SQL:
INSERT INTO mdm.tblStgMember
--Insert member with no name and a Code of 2
The null has been left in the statement as we’re simulating, for example, not being given a name for the member from our data source. You could then have an MDS business rule set up to handle missing names, perhaps setting them to ‘Unknown Supplier’.
To load from the staging tables into the entity table, we run the staging sweep stored procedure:
DECLARE @ModelName nVarchar(50) = 'Supplier'
DECLARE @Version_ID int
SET @Version_ID = (SELECT MAX(ID)
WHERE Model_Name = @ModelName)
--This will cause the members in the staging tables to be loaded into MDS
EXECUTE mdm.udpStagingSweep 1, @Version_ID, 1
To finish up we will also validate the model, which will cause the business rule to fire:
DECLARE @ModelName nVarchar(50) = 'Supplier'
DECLARE @Model_id int
DECLARE @Version_ID int
SET @Version_ID = (SELECT MAX(ID)
WHERE Model_Name = @ModelName)
SET @Model_ID = (SELECT Model_ID
WHERE Model_Name = @ModelName)
--Cause validation to occur
EXECUTE mdm.udpValidateModel 1, @Model_ID, @Version_ID, 1
If we now do a select from the entity table, we will see that the member with the code of 2 doesn’t have a NULL for the name column, but instead has a blank string:
If we now go and take a look at the Master Data Manager front end, we will see that the business rule condition hasn’t been met for the the member that we added, as its ‘Attribute1’ attribute is still blank:
This has happened because the stored procedure that implements the business rules carries out the ‘Name is equal to Blank’ check that we’re relying on by checking if the name column is NULL.
How this affects you will obviously depend on how you’re using the MDS business rules. It doesn’t affect the other member attributes, only the Name attribute. If you want to handle null Names on entities then a good work around is to use the business rules in the front end, and then to use SSIS to replace null names with ‘Unknown’ or something similar.
For your Master Data services business rules that have a validation action, a notification, if you’ve set one up, will be sent if the rule condition gets satisfied.
I’ve noticed a quirk to the way that the notifications are sent – I was getting unexpected email notifications when creating members – which as it happens is not a bug, but is worth exploring to understand how the MDS rules work.
The above was essentially due to the two step save process that occurs in the MDS front end. When adding a member, you first of all enter the name and the code, then you click the save button. At this point, the business rules are run, and validation issues are created in a table called [mdm].[tblValidationLog], and then displayed to the user at the bottom of the screen, before the user has entered the attributes. This is shown below on the sample Product model, which I have changed to enable notifications on a few of the rules:
Then click the save button, which will display the attributes pane below. As the sample Product model contains a rule called “Required Fields”, with a condition of “None”, then the validation issues will also be displayed:
At this point we’ve got no way to avoid getting the validation issues, as the rule has no condition, and we’ve not had any opportunity yet to alter the attributes that the rule checks. The email notifications behaviour to be aware of at this point is:
- The notifications are sent on an interval, defined by the system setting called “Notification e-mail interval”. Its default is 120 seconds.
- If you rectify the issues that you’ve been warned about in the gap between the last email notification cycle, and the next one, then [mdm].[tblValidationLog] will be updated to say that the validation issue doesn’t need a notification, and you will get no emails.
- On the other hand, if you’re in the middle of a few things and essentially don’t act quick enough, then you will get a notification with the above issues displayed, as the email interval has kicked in during the middle of your save.
Hence I was sometimes getting email notifications as shown below:
But sometimes I was getting no notification for carrying out what I perceived to be exactly the same behaviour. So by no means a bug, but worth explaining to your users how the notifications work.
I’ve just found out that the first CTP for SQL Server codename Denali is now available for download. The link to get both the 32 and 64 bit versions is:
On a related note, Simon Sabin has recently posted here about the Tech-Ed Europe keynote that shows a glimpse of a new Silverlight reporting tool for SQL that’s part of Denali. Well worth watching the keynote video…
Subscription Views are one of the ways to extract data from Master Data Services. Created via the ‘Data Integration’ section of the MDS Master Data Manager front end, they result in the creation of a physical SQL Server view in the underlying database.
All this you will probably know from the MSDN article. The purpose of this post, however, is to explain a few of the more subtle differences between the different types of subscription views that get created, and looking how they can be used to feed a data warehouse.
You can create a subscription view from a derived hierarchy, which will return all the levels from your hierarchy in a de-normalised form. If you choose the sample Product model, then you will see that one of the Derived Hierarchies that you can pick from is called Category. As your Product dimension in a data warehouse would include attributes such as Product Category and Sub Category, then on the surface this view looks to be ideal.
Unfortunately there is a slight issue with subscription views based on derived hierarchies. Users could have altered one of the members that exist within the hierarchy before the data warehouse nightly load has taken place. If the MDS validation of that member hasn’t occurred yet (which it doesn’t in some situations), then the member could potentially contain some incorrect attribute values that we don’t want to expose to the data warehouse.
For example, a user has edited a product, as shown below, but chosen a different Sub Category to that of its adjacent members:
As indicated by the yellow question mark above, the member is awaiting validation. However, when I run a select statement against my view that’s based on Derived Hierarchy Subscription view, then the member that is marked above will still appear in the view results. We would not want to load this change into the data warehouse, as until the business rules have been run, we dont know if the new Sub Category is valid in the context of the other attributes. For reference, the same behaviour exists for members that have explicitly failed validation.
You can also create a subscription view by picking an entity, and then choosing the format of Leaf Attributes, as shown below:
In my case, I’ve set up a subscription view called ProductLeafAtributes, and when I run a select statement against this view in SQL Server, then I get the following results:
As with the subscription view that’s based on a Derived Hierarchy format, I still get all the members returned, but there’s also a column called Validation Status, that shows that one of the members is awaiting revalidation. Therefore, this column can be filtered on in the data warehouse load, just to take through the changes that have definitely passed validation.
Creating the subscription views based on entities, rather than Derived Hierarchies, exposes the validation status of a member, and is therefore the preferred subscription view format to use in this case. The one catch is that the Leaf Attributes format will not return attributes of any related entities, e.g. Product Category. It will return the Sub Category code, so therefore its necessary to join to separate Subscription Views for both Sub Category and Category in order to pick up the missing attributes.
The SQL Bits 7 agenda was published recently, and I’m pleased to say that I will be delivering a session titled ‘End to End Master Data Management with SQL Server Master Data Services’.
The heart of this session will be an end-to-end technical demo that will show the role of MDS in extracting master data from source systems, before demonstrating how data is fed via MDS to subscribing downstream systems. I’ll cover areas such as the MDS staging process, business rules and notifications amongst others.
I’ve also seen from Simon Sabin that the SQLCAT team will be showing some of the features of SQL 11, code name Denali. Well worth going for that alone surely?!
You can register for SQL Bits here.
Back to a bit of Master Data Services this week – there’s been a few learning resources that have appeared here and there for MDS over the past couple of months. I thought that it would be worth rounding up the ones that I’ve noticed:
There was a comment on my last blog post by Peter Eb asking about whether Reporting Services 2008 R2 knows how to render SSAS KPIs automatically. The short answer is that it doesn't unfortunately, but there are only a few steps needed to get it working.
For this blog post I've built a simple report that uses the Adventure Works DW 2008 R2 Analysis Services database as a data source. The first step is to build a data set that contains a query that references the Month attribute and the Channel Revenue KPI:
I'm interested in reporting on the status and trend part of the KPI, so I've added the month and value parts of the KPI, plus blank columns for the status and the trend:
For the status and the trend columns, the next step is then to drag on (or right click and insert) an indicator in each of the cells. This is when the manual steps come in:
- Right click on the indicator to open the properties window;
- Set the Value of the indicator to be the Status part of the Analysis Services KPI;
- Change the States Measurement Unit to Numeric;
- Configure the indicator Start and End properties to be -1, 0 and 1. SSAS KPIs can have an unlimited number of status values, so it may be necessary to add more states here if SSAS returns more than 3 values for the Status.
This changes that are needed to the indicator properties window is shown below:
Applying the same method to the Trend part of the KPI produces the following simple report:
This post wasn't really supposed to compare SSRS Vs PerformancePoint Services, but I've found out that PerformancePoint Services does pick up the SSAS KPI meta data (slight mismatch on the colours only), saving you having to set up the indicator yourself:
It's a shame that the KPIs aren't picked up automatically by SSRS, but then again the above process isn't too difficult to master.
More Posts « Previous page
- Next page »