Simon

Simon Whiteley's Blog

Slowly-Changing Parent-Child Dimensions – Part 1: The Theory

A common structure in OLTP systems is a parent-child relationship with Object and ObjectParent tables creating a recursive structure. This is easily represented as a Warehouse dimension table, usually flattened out but occasionally left as native Parent-Child if required.

image

I recently encountered an issue where the client not only required a flexible, dynamic parent-child dimension but also required it to be slowly changing. Each fact record joining to the dimension, at any granularity, had to be aware of its hierarchal context at that point in time, despite there only being one source record for dimension object. We end up with something like this:

image

Throughout the course of the engagement I implemented two different models for this as requirements changed, I’ll detail the solutions in part 2 of this post.

If you’re not familiar with Slowly Changing Dimensions, namely Type 2 SCD, I put together a quick introduction in a blog post here to bring you up to speed.

So how do you slowly change a Parent-Child relationship?

Since we have source records for each node in our hierarchy, it makes sense to keep this structure. However, over time that same node can fit into the hierarchy in different places, changing parent records, gaining/losing children or even moving levels.

We therefore need to differentiate between the different hierarchal contexts of node. As with other SCD implementations, we give each historical version a surrogate key, so we can accurately identify the node in the relevant context.

Parent-Child makes this tricky however – if we just created historical versions when the individual nodes changed, records joining in at lower granularities would not know which historical version to use. We therefore need to amend child nodes to point to the new surrogate keys. This, in turn, means we have to create new references to those nodes, and so on down the hierarchy.

Essentially, anytime a node in the hierarchy changes, we need to create new historical versions for all descendants of that node. Our original structure, using this method, would now look like this:

image

You can see that when Parent N’s own parent changed, we had to propagate that change to Object A to ensure our lowest granularity object has a key for each temporal version of the structure.

We can now use this structure with a fact table – we know that a fact record occurring on 10/02/14, for example, would aggregate up the orange-marked path through A2 > N1 > Y.

That’s the key point to implementing SCD for Parent-Child structures. If any changes occur, anywhere in the hierarchy, all descendants will need a new type 2 record created. By using Type 2 SCD, each object is referencing the surrogate key of its parent, not the business key, this way every join in the structure is based upon a specific historical version of that record and thus historical context is implied by the foreign key relationships.

Whilst complex in theory, once implemented your fact > dimension relationship is very simple. Your fact record has a single foreign key which holds the full historical context of that record.

In the next post, I’ll discuss a couple of techniques for implementing the above transformation inside a standard ETL structure.

Comments (7) -

  • David Rice

    1/21/2016 3:09:43 PM | Reply

    Hi Simon,

    A really useful blog.  Did you ever get around to posting about how you'd achieve this in ETL.  I'm facing this challenge at the moment and its proving to be a complex one to crack!

    Thanks,
    Dave

    • Simon Whiteley

      4/1/2016 4:41:26 PM | Reply

      Very true, I think it's been sitting in my drafts folder for a year or so! I'll pop something together detailing the ETL method over the next week or two - essentially you can write a recursive CTE to scroll down the hierarchy finding all children of any nodes that have changed. Then create new type 2 objects for anything in that tree.

      Cheers for reading!
      Simon

  • Jon

    4/11/2016 5:26:15 PM | Reply

    I've got the same problem, needing to add SCD functionality to a Parent Child Hierarchy. Hoping you post your solution up soon.
    Cheers
    Jon

    • Simon Whiteley

      5/12/2016 10:37:01 AM | Reply

      Hi Jon, apologies for the delay - solution can be found here: http://bit.ly/27g7sd3

  • Di

    4/28/2016 12:44:32 AM | Reply

    Hi Simon,
    I do face same problem and really trouble to solve by considering we have 6 levels of employee/manager hierarchy. People can change in any level any time.
    Can you share your ETL? Really appreciate.
    Thanks, Di

    • Simon Whiteley

      5/12/2016 10:37:47 AM | Reply

      Hi Di, apologies to you too! 2 years is a little long for a followup post Smile - Solution is here: http://bit.ly/27g7sd3

Loading