Sacha Tomey

Sacha Tomey's Blog

Basket Analysis using Analysis Services 2005

My job is pretty varied these days :o)

Today I've been looking at satisfying some basket analysis type requirements where, for example, we need to determine what other products customers have bought in addition to a known product.

We could have used data mining for this but decided not to as, in this case, the result was literally a few mouse clicks away from our existing AS2005 cube.

The implementation was surprisingly straightforward and the query results (admittedly on a subset of the full dataset) were very impressive.

In an attempt to outline the implementation steps, I'll add some basket analysis to the Adventure Works sample database to demonstrate how easy it was!

Requirement

List all products (and associated profit margin) that have also been purchased by customers who have bought a road bike from the 550-W range.

Approach

Rather than using data mining to handle queries of this type I want to extend my current cube.  In order for the performance to be acceptable we will adopt a variation on the many-to-many distinct count solution discussed in numerous blogs, white papers and articles.

The result of this approach is a new measure group, which we'll call 'Customer Cross Purchase', and a new reference dimension based on product for our 'known product', which we'll call 'Product Filter'.

From a relational perspective, the Product Filter dimension provides a lookup for customers (in Customer Cross Purchase) who have purchased that product.

This in turn provides the ability to locate all orders (in the Internet Sales fact) for that subset of customers.  Once we know all the orders for that subset of customers, we can simply list the distinct products (from the Product dimension) that make up those orders.

Implementation

The measure group contains a distinct set of customers and products that will act as the junction table of our many-to-many relationship.  It is implemented as a view over the main fact table and the customer dimension.

-- Distinct customer/product pairs: the junction table for the many-to-many
CREATE VIEW [dbo].[vCustomerCrossPurchase]
AS
SELECT DISTINCT
    f.CustomerKey, f.ProductKey
FROM dbo.FactInternetSales AS f
INNER JOIN dbo.DimCustomer AS c ON f.CustomerKey = c.CustomerKey

Next, the fact table [view] is added to the Data Source View, ensuring the relationships to the customer and product dimensions are set up.

[Image: updated Data Source View]

With the DSV updated, the cube itself can be extended.  A new measure group is created together with a Product Filter reference dimension.  The dimension usage looks like the diagram below.  This ensures the appropriate relationships exist as outlined above.

[Image: dimension usage diagram]

The new measure group is mapped to the Product Filter and customer dimensions, as per our DSV.  Note that this is not done automatically, as the real (non-referenced) product dimension is selected instead.

To complete the picture, the Customer Cross Purchase measure group is used to create a many-to-many relationship between the Product Filter and the main Internet Sales measure group.
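Although this step is just a few clicks in BIDS, the same change can be scripted.  Purely as an illustration, here's a minimal AMO (Analysis Management Objects) sketch; it assumes the bridge measure group and its regular relationships already exist, reuses the sample names from above, and uses a placeholder server name.

using Microsoft.AnalysisServices;

// Connect to the server and locate the cube ("localhost" is a placeholder)
Server server = new Server();
server.Connect("localhost");
Database db = server.Databases.GetByName("Adventure Works DW");
Cube cube = db.Cubes.GetByName("Adventure Works");

// The bridge (junction) measure group built over vCustomerCrossPurchase
MeasureGroup bridge = cube.MeasureGroups.GetByName("Customer Cross Purchase");
MeasureGroup internetSales = cube.MeasureGroups.GetByName("Internet Sales");

// Relate the Product Filter cube dimension to Internet Sales through the bridge
ManyToManyMeasureGroupDimension m2m = new ManyToManyMeasureGroupDimension();
m2m.CubeDimensionID = cube.Dimensions.GetByName("Product Filter").ID;
m2m.MeasureGroupID = bridge.ID;
internetSales.Dimensions.Add(m2m);

// Push the definition change to the server
cube.Update(UpdateOptions.ExpandFull);
server.Disconnect();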

Testing

Once deployed and processed we can test out our modifications to check for some reasonable results.

The following MDX query returns a list of all products that have been bought by customers in Reading, England who have also bought a road bike from the 550-W range.

select
    non empty [Product].[Product].members on rows,
    [Measures].[Internet Gross Profit] on columns
from
    [Adventure Works]
where
(
    [Product Filter].[Product Model Categories].[Model].&[Road-550-W],
    [Customer].[City].&[Reading]&[ENG]
)

The query is simple: it lists products on the rows and profit on the columns, while the 'where' clause slices by Reading, England and employs the new Product Filter dimension.  The Product Filter dimension has the effect of slicing the main fact table to customers who have bought a bike from the Road-550-W range.

[Image: query results]

So, we can see that apart from the road bikes, a few other accessories have been purchased too.  A quick couple of queries confirm the results.

[Image: customers who have bought a Road-550-W road bike]

Three customers (above) have bought a road bike from the 550-W range and the other products these customers have bought (below) match our results!

[Image: other products bought by those customers]
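The confirming queries themselves aren't shown above, but against the AdventureWorksDW relational source they would look something along these lines (table and column names are from the sample schema):

-- Customers in Reading who have bought a bike from the Road-550-W range
SELECT DISTINCT f.CustomerKey
FROM dbo.FactInternetSales AS f
INNER JOIN dbo.DimProduct AS p ON f.ProductKey = p.ProductKey
INNER JOIN dbo.DimCustomer AS c ON f.CustomerKey = c.CustomerKey
INNER JOIN dbo.DimGeography AS g ON c.GeographyKey = g.GeographyKey
WHERE p.ModelName = 'Road-550-W'
  AND g.City = 'Reading'

-- All distinct products bought by that set of customers
SELECT DISTINCT p.EnglishProductName
FROM dbo.FactInternetSales AS f
INNER JOIN dbo.DimProduct AS p ON f.ProductKey = p.ProductKey
WHERE f.CustomerKey IN
(
    SELECT f2.CustomerKey
    FROM dbo.FactInternetSales AS f2
    INNER JOIN dbo.DimProduct AS p2 ON f2.ProductKey = p2.ProductKey
    INNER JOIN dbo.DimCustomer AS c2 ON f2.CustomerKey = c2.CustomerKey
    INNER JOIN dbo.DimGeography AS g2 ON c2.GeographyKey = g2.GeographyKey
    WHERE p2.ModelName = 'Road-550-W'
      AND g2.City = 'Reading'
)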

MDX Stored Procedures (Part 1)

UPDATE:  You can find part 2 here, along with a link to download a C# quick start. 

A long time ago I realised that inevitably, however well specified the requirements are or however tight the deadlines, clients always throw a curve ball.  To that end, like many, I adopt an agile approach to project delivery with a keen eye on building frameworks and generic components that improve productivity and flexibility for current and future projects.

My current project is no different from any other: we knew the requirements would change as, even from the outset, they were high level and vague.  With little budget to flesh them out, a controlled, iterative approach was essential.  It's a typical BI project: various data sources require consolidating and processing to allow analytics and reporting to assist in strategy and revenue generation.

The reporting element was the vaguest of them all, although what did exist immediately eliminated Reporting Services as the reporting/delivery mechanism.  The required reports were highly interactive and aesthetically challenging enough to stretch Reporting Services functionality to the limit, even without the vagueness-busting curve balls that were out of sight but already primed for deployment.

With all this in mind, we decided to use ASP.NET 2.0 web forms for the reporting as this gave us a little more freedom with regard to layout, a high degree of flexibility surrounding interactivity and the ability to quickly react to the ever changing requirements.  Now, with an eye on productivity we decided to build an MDX Helper class (mentioned previously here and here) that would streamline the development process and enable us to react quickly to those inevitable changes.

Our own requirements for such a component were as follows:

  • Allow text files containing parameterised MDX to be executed
  • Must support the execution of multiple queries over a single connection
  • Return a conventional ADOMD cell set object containing the query results
  • Return a typed/untyped ADO.NET DataTable object for binding to native and third party .NET controls
  • Allow the inspection of various server, database and cube properties
  • Lightweight, re-usable and simple to use

The public interface of the resultant component looks like this:

[Image: MDXHelper class diagram]
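In case the class diagram doesn't render, the sketch below shows roughly what that public surface implies.  Apart from ExecuteCellSet and the constructor signature used in the snippet further down, the member names and the skeleton bodies are my illustrative assumptions rather than the actual implementation.

using System;
using System.Collections.Generic;
using System.Data;
using Microsoft.AnalysisServices.AdomdClient;

public class MDXHelper
{
    public enum MDXResourceFormat { RelativeURL, Text }

    private Dictionary<string, string> parameters = new Dictionary<string, string>();

    // Server, database and cube information plus the MDX query resource
    public MDXHelper(string server, string database, string cube,
                     MDXResourceFormat format, string query)
    {
    }

    // Name/value pairs substituted into the parameterised MDX text
    public Dictionary<string, string> Parameters
    {
        get { return parameters; }
    }

    // Returns a conventional ADOMD cell set (body omitted in this sketch)
    public CellSet ExecuteCellSet()
    {
        throw new NotImplementedException();
    }

    // Returns an ADO.NET DataTable for control binding (hypothetical name)
    public DataTable ExecuteDataTable()
    {
        throw new NotImplementedException();
    }
}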

In its simplest form, usage is straightforward.

Take the following parameterised query stored in file UKSales.txt

select
    [Product].[Product Categories].[Category].members on rows,
    [Measures].[Internet Sales Amount] on columns
from
    [Adventure Works]
where
    @Country

You return an ADOMD cell set object using the following C# code snippet:

// Create an instance of the helper class, setting cube and query information
MDXHelper helper = new MDXHelper("ASTestServer", "Adventure Works DW", "Adventure Works", MDXHelper.MDXResourceFormat.RelativeURL, "UKSales.txt");

// Add in the country parameter 
helper.Parameters.Add("@Country", "[Geography].[Geography].[Country].&[United Kingdom]");

// Execute the query
CellSet result = helper.ExecuteCellSet();

// Tidy up
helper = null;

There are several overloads of the constructor and 'Execute' methods to allow for flexible usage and it's been used in anger now for a good couple of weeks, supporting every type of MDX query we've managed to throw at it so far.  It still needs a little refinement and some full-on testing but it has achieved its goal and has helped us easily digest changes to the original requirements.  It has also given us some great ideas for version 2, where the query definitions are stored in the Analysis Services cube definition itself.
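As a taste of the DataTable route (using the hypothetical ExecuteDataTable method from the sketch above, and an assumed GridView called salesGrid), binding the same query to an ASP.NET 2.0 page is just as brief:

// Same query, returned as a DataTable and bound to an ASP.NET GridView
MDXHelper helper = new MDXHelper("ASTestServer", "Adventure Works DW", "Adventure Works", MDXHelper.MDXResourceFormat.RelativeURL, "UKSales.txt");
helper.Parameters.Add("@Country", "[Geography].[Geography].[Country].&[United Kingdom]");

// Execute and bind (salesGrid is assumed to be declared on the page)
DataTable result = helper.ExecuteDataTable();
salesGrid.DataSource = result;
salesGrid.DataBind();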

In a future post I'll delve into the approach and implementation of the helper class to see how it works under the hood.

XML Namespaces and the ADOMD XML Reader Result Set

I'm building a generic helper class to flatten an ADOMD data set to a data table for easier consumption by default and third party .NET controls.  I can't rely on the Load method of the data table, nor on the CellSet object, as in some cases not all the information is persisted; data types and top-level hierarchies, to name two.

To persist all of the information I need to traverse the actual XML data set returned by the ExecuteXmlReader method of the AdomdCommand object.

I don't use XML enough these days to remember all the nuances of namespaces, and it took me a little while to work out what to set them to in order for the 'SelectNodes' method to work.

Below is the code snippet I used to ensure the output from the ExecuteXmlReader method is fully traversable.

// Execute the command, returning the XML data set
XmlReader reader = command.ExecuteXmlReader();

// Load the output into an XML document
XmlDocument xml = new XmlDocument();
xml.Load(reader);

// Create an XML namespace manager
XmlNamespaceManager nsManager = new XmlNamespaceManager(xml.NameTable);

// Add the namespaces to the manager
nsManager.AddNamespace("defaultNS", "urn:schemas-microsoft-com:xml-analysis:mddataset");
nsManager.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
nsManager.AddNamespace("xsd", "http://www.w3.org/2001/XMLSchema");

// Return a list of nodes that represent the row information
XmlNodeList rows = xml.SelectNodes("defaultNS:root/defaultNS:Axes/defaultNS:Axis[@name='Axis1']", nsManager);
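From there, walking the tuples on the axis follows the same pattern; the namespace prefix is needed at every step of the XPath.  A rough sketch, based on the standard mddataset schema:

// Each tuple on the axis is one row; each member carries a caption
XmlNodeList tuples = xml.SelectNodes(
    "defaultNS:root/defaultNS:Axes/defaultNS:Axis[@name='Axis1']/defaultNS:Tuples/defaultNS:Tuple",
    nsManager);

foreach (XmlNode tuple in tuples)
{
    foreach (XmlNode member in tuple.SelectNodes("defaultNS:Member", nsManager))
    {
        XmlNode caption = member.SelectSingleNode("defaultNS:Caption", nsManager);
        Console.WriteLine(caption == null ? "(no caption)" : caption.InnerText);
    }
}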

This KB article helped me out a bit:
http://support.microsoft.com/default.aspx?scid=kb;en-gb;q318545

There are some more goodies included in my helper class that I will blog about in the future, once it's stable!

WPF, the answer to those 'Problem' users?

A number of .NET 3.x application samples have been published over the last few months and one in particular has caught my eye.

Family.Show is a feature-rich, ClickOnce, desktop-based genealogy application developed by Vertigo using WPF. It's pretty impressive and, even if you are not into that sort of thing, well worth a look.

What really stood out for me is how close the family tree viewer is to the decomposition tree found in ProClarity and the PerformancePoint Server add-in for Excel.

[Image: Family.Show family tree viewer]

The clean, crisp, vector-based UI allows slick and fast navigation of the entire data set and could, with a little imagination, lend itself to a rather effective data analysis tool.

This kind of UI really appeals to me and I think will also really appeal to those occasional 'problem' users who do not need to understand the full data model, are not interested in overcoming the learning curve of ProClarity or Excel but, on occasion, need to analyse data.

SQL Server 2008 (Katmai) Launch - Feb 27th 2008

According to a recent Microsoft Press Release the launch date for SQL Server 2008 will be 27th February 2008 in LA.  In addition, Visual Studio 2008 (Orcas) and Windows Server 2008 (Longhorn) will be launched at the same time and will kick off a whole series of marketing focused launch events around the world.

No news on the RTM release yet, although word on the street suggests that Visual Studio 2008 will be released before the end of this year!!  Now that we are in July, the start of Microsoft's fiscal year 2008, can we expect the RTM release of SQL Server and Windows Server 2008 before July next year?  If not, surely they should be called SQL Server and Windows Server 2009!?

PerformancePoint Planning Admin Console

The first thing I noticed about the PerformancePoint Planning console was the 'in your face' look and feel.  The menu items are big and clunky, headings oversized and the font is generally big and wiry.

So I made a point of finding the theme and style sheet files responsible, pulled out my metaphorical scalpel and got to work.

I've not made a huge design impact.  In fact, some would probably not notice as I've gone for the subtle-yet-effective cosmetic surgery.  However, it makes a difference to me and stops me from rolling my eyes every time I see a 28pt sub title!

For those interested, the .skin and .css files can be found in the following location on a default installation:

c:\Program Files\Microsoft Office PerformancePoint Server\2007\AdminConsole\App_Themes\BizAdmin

The two files in question are:

- bizadmin.skin, and
- BizAdmin.css

The updated files for my custom paint job can be downloaded from here.

PerformancePoint Planning Connection Issues

Despite what the name suggests, adding yourself to the Global Administrator Role is not enough to access PerformancePoint Server through the PerformancePoint Planning Business Modeler.

If you only belong to the Global Administrator Role and attempt to connect to the server through the Business Modeler you will receive a permissions error:

[Image: permissions error dialog]

You need to belong to one of the other Roles (User Administrator, Data Administrator or Modeler) in order to gain access to the server content through the Planning Business Modeler.

I find this a little odd, as the Global Administrator Role is required to perform certain actions, like deleting an application, that are unavailable (grayed out) if, for example, you connect as a member of the Modeler role only.

I guess the reason is that users who are only members of the Global Administrator Role can perform these actions through the web-based Administration Console and do not necessarily need access to the business-user-targeted Business Modeler application.

TechEd Announcement: Microsoft purchase Dundas..

..Well, not quite.  Microsoft have not actually bought Dundas but they have bought the code base for the charting, gauge, calendar, map and barcode controls.  This is excellent news - I've used Dundas charting in various guises for a number of years and have been well impressed with the depth and breadth of the suite. 

The plan is to integrate the charting functionality into Reporting Services 2008 (Katmai).  Adding this functionality to Reporting Services is going to be a great step forward and I can't wait to start developing some presentation-ready reports.  Should make the sales pitch easier too!

PerformancePoint Server Planning

Adrian Downes has put together a great little background article on the reasons for and purpose of PerformancePoint Server Planning.  An overview of the types of plans, budgets and approaches is given, together with a brief teaser on how PerformancePoint Server can overcome some of the associated challenges.

Even better, it looks like this is the start of a whole new series relating to the Planning aspect of PerformancePoint.