Tristan Robinson

Tristan Robinson's Blog

Power BI Composite Models and Aggregations

So something which feels like its gone under the radar a bit is the addition of composite models and aggregations in Power BI. These have been around a couple of months now in Preview but I’ve not seen much buzz around the features. After recently attending PASS Summit and seeing Christian Wade’s 1 Trillion row demo, I thought it was worth blogging about – especially as I expect parts of this functionality to be expanded to Azure AS in the near future.

If you think about it, the majority of BI query’s are done at some form of aggregation level but users will still want the detail, and so these features essentially unlock the ability to report against giant datasets at both an aggregated and granular level at the same time – something that was not physically possible beforehand. Power BI is now able to work with multiple petabytes of data with ease over trillions of rows with pretty much instant response times – so this is a bit of a game changer.


Composite Models

Previously with Power BI, you were restricted to either DirectQuery or Import for a single data source. With Composite models that has all changed. A report can now include data connections from more than one connection in any combination, so you are able to connect to both DirectQuery AND Import in the same model and combine the data. This then opens up a world of possibilities that were not possible before.  With this functionality, we also have the ability to create many-to-many relationships, but I won’t be doing into detail in this for this blog. As part of this also comes some functionality called Storage Mode which unlocks table-level storage features. This is the next part of the jigsaw.


Storage Mode

This allows you to specify whether tables are either imported or queried on the fly. There is now a third option “Dual” which acts as either a cached or not cached table depending on the context of the query that's submitted at runtime.

Setting the correct storage mode has many advantages such as:

  • Better query performance (no need to push real time queries to the dataset for relatively static tables such as those used for filters).
  • Ability to use larger datasets (interactive analysis is better for datasets you don't want to cache into memory such as tables with significant data volume).
  • Data refresh optimisation (only cache data that's necessary, meaning quicker refresh times).
  • Reducing latency in real-time data (no need to re-read those static tables each time).

To access the storage mode, you either select the table in the report pane and click Properties, or navigate to the table in the new Modelling View (requires exposing this in Options as its in preview).

Changing a table storage mode will then prompt us to change the related tables to Dual. This propagation logic is designed to help with models that contain many tables!



Modelling View

Before I get to aggregations, I also need to introduce the new modelling view. This will be changing to fall in line with what you may be familiar with on SSAS / Azure AS. It feels like this is one of the first steps to integrate SSAS further into the Power BI spectrum and facilitate enterprise datasets. By updating the modelling view, you now have the ability to create additional diagrams, thus allowing you to break out particularly complex models by subject area rather than trying to join it all up within one model. For instance, if you are working with 5 tables in your model, this isn't a particularly big deal – now multiply that by 10 and all of a sudden it becomes a pain to manage, and this new feature will help alleviate that. The feature also allows you to multi select objects in one go and update properties through a new side-pane. An example of this new feature can be seen below.

Modeling View



Finally, the most exciting feature of the lot – aggregations. Without the new modelling view, without the composite models, without the new storage modes – this would not be possible. Aggregations allow us to create a newly defined table within the model but with all fields of the original table it is created from to be aggregated/grouped in some or another. These aggregations/groups include count, group by, max, min, sum, and count rows. This can be set to either set to Import mode with/without incremental refresh, or via DirectQuery and optimised by using columnstore indexes. This then unlocks faster query performance over huge datasets via the cache at aggregated level using fractions of resource compared to detailed levels.

You then have the flexibility to set the aggregate table to import, while leaving the granular table to DirectQuery, which will speed up performance when navigating the report. The aggregate table can also be hidden so that user will not even be aware of the implementation.

You can also have more than one aggregation table, potentially one for a particular set of reports and then a lower grain aggregate table for analysts, and this can be done through the precedence option. The tool then will then query the aggregation table with highest precedence level first to see if it can resolve the runtime query before moving down the levels.

Aggregations don’t just work for measures such as SUM or COUNT. It also works for more complex measures - all of the components of a measure are always folded down to the sum, min, max, count, level and then those sub query’s work out whether they can hit the cache or not.  Its also worth mentioning that you can check if its hit the cache via the DAX editor.




For me, all these features mentioned above will be a bit of a game changer in the right scenario. I’ve ran into scaling problems before and re-working the logic at the Warehouse level for both a granular and aggregated datasets certainly added some overhead. While the functionality obviously doesn't work with Live Connection into SSAS cubes, this new functionality opens up options for using Power BI to do of the activities which historically would have only been done in SSAS. As Christian Wade also alluded to at PASS, Power BI will soon become a superset of SSAS and so we may well be doing less and less with SSAS and more with Power BI in this area as time goes by.

Generating Usage Statistics from a SSAS Tabular Cube

Once you have users accessing your cube it’s almost inevitable at some point that someone will ask you to generate usage statistics from it, and there are a number of methods to achieve this. In this quick blog post, I’ll detail them and my experiences with each, and then use this data to create a PBI report.


Native Tabular Properties

The first method is natively through the tabular cube properties. This also has the added bonus (read impact) that it will optimise future aggregations based on usage – in fact that’s its main purpose.

This can be done by setting the CreateQueryLogTable to true, setting up the QueryLogConnectionString (to point to the DB where the usage table requires hosting), setting the QueryLogSamping rate (10 means every 10th query will be logged), and finally the name of the QueryLog table.

SQL Server Analysis Services Query Log Related Properties

Advantages of this method is that its very easy to setup with limited knowledge required and it could potentially improve performance if you have an environment where users submit repetitive queries. Unfortunately there are also a number of disadvantages which led me to find other methods. Firstly, it creates a degree of overhead on the cube if its sampling too often; we actually had visible performance related complaints once we turned it on – either through the sampling or change to the “optimised” aggregations. Depending on the sampling rate, you could also find that users who rarely use the cube are not picked up as part of the stats.  As well as this any changes to the cube structure will cause the logging table to be reset. The table is also limited in terms of what it actually logs (as you can see below) – useful if you just want just the user and timestamp info but not much else, and no real ability to configure.

Query the OlapQueryLog table for Cube activity


AS Trace

To that extent, I looked for other tools to do the same task but better and I found AS Trace.

Originally built for SQL Server 2012, it works fine on 2014 – and provides you the ability to run a trace against the cube activities (and log to a table) exactly like the SQL profiler but without the overhead of the GUI which adds unnecessary memory/processor power. It also runs as a windows service allowing it to restart automatically when the server reboots. If this is the case, the tool also logs the existing data to a History table and truncates the logging table. Exactly what I was after.

The tool collects information based on a preconfigured Analysis Services Profiler template, which can be optimised depending on which events you are interested in. I initially ran it using most events selected, and with a limited user set it was generating in the region of 25,000 rows a day. This was clearly not maintainable for a long period of time. I then used the following blog post to understand what each event of the profiler was giving me and then just created a lightweight trace definition file to give me what I wanted. I limited it to Query Begin, Query End (for DAX/MDX statements) and Audit Logon/Logout (for session data).

The setup is very straight forward, just run the install.bat as an escalated privileged account, and check it installs the service correctly. Next, add your SSAS service account to the Logon of the service, make sure the account has “Log on as Service” and membership to the database you are writing to in the form of DDL and DML access, i.e. able to create tables, write to tables – and lastly admin rights to the instance of SSAS you intend to use.

Next, configure the ASTrace.exe.config file with the parameters you want the tool to use. This includes the location of the cube (can handle multiple cubes), the location of the trace definition file, the location of the DB instance and table you want to log to and lastly whether you want to preserve history on restart. The only thing I couldn’t do here, is set the schema of the table it was using to log to, which defaults to dbo.

All that’s left is to start the service, and check the log file to see if it has created any errors on start-up. If not, the table should be created correctly and awaiting input.

I also saw another method while researching using Extended Events (XEvents) but did not implement this once AS Trace provided me with the information I needed.


View / Power BI Report

I initially used the data to run a limited set of queries to extract total users, and total queries for a given time period. This was useful to a degree but from the data collected I realised I could be doing so much more.

This lead me to do some analysis across the type of metrics being logged, and allowed me to create a view on top of the tables of what I thought might be useful on a report. I removed all the redundant columns it was tracking, and created some friendly names for the EventSubclass, and other columns. I used the PATINDEX function to check the query statement for existence of some important values – while not an exact science, it would give me a good picture of the split between certain user groups and KPIs being run. I’ve included the view definition below.

I ended up limiting the data to EventClass 10 as this seemed to capture all the necessary data. The only downside I have seen so far is that users querying through the Power BI web service are anonymised under the service account name. I’m currently looking into options to resolve this which I’ve seen as configuration options on Power BI – to allow through the username as long as it can be matched at the other end.

RowNumber AS ID, 
SPID AS SessionID,
CurrentTime AS DateQueried, 
NTUserName AS Username,
CASE EventSubClass 
WHEN 0 THEN 'MDX Query (Excel)' 
WHEN 3 THEN 'DAX Query (Power BI)' 
END AS QueryType, 
CASE Success WHEN 1 THEN 'Successful Query' ELSE 'Query Error' END AS SuccessfulQuery,
CONVERT(DECIMAL(10,2),CONVERT(DECIMAL(18,3),Duration)/1000) AS DurationSec, 
TextData AS Query, 
CASE PATINDEX('%Mexico%',TextData) WHEN 0 THEN 0 ELSE 1 END AS MexicoMarket,
CASE PATINDEX('%Colombia%',TextData) WHEN 0 THEN 0 ELSE 1 END AS ColombiaMarket,
EventClass = 10

Once I had the view, creating the report was relatively straight forward, and can be seen below.

I included metrics for number of queries by user (blurred out) which also doubled as a filter, the % split of queries for things such as Excel/Power BI, a measure of queries by timeframe, a logarithmic scaled display for queries by query duration, and lastly a split of queries by KPI. I intend to tweak these once I receive more data from the trace, but was relatively happy with the information that they were providing.

Untitled (002)

Please let me know if you have any comments.

Extracting Users from AD using SSIS

I've recently been working on a project which required KPI level security alongside the traditional row level security secured at a geography level. This would limit what financial data a user could see within a cube, without having to create multiple cubes or use perspectives (which would not actually secure the data).

To achieve this, I needed to populate a set of 'KPI User/Role' tables stored in Master Data Services (MDS) with a list of users who were stored in a particular AD group. I would need these tables updated on a regularly basis to grant/revoke access. We could then use these names along with the USERNAME() function in DAX to filter.


The Solution

One method to solve my problem would be by using SSIS. The package could  be setup to run as part of a SQL Agent Job, either by a schedule or on demand. My list of users were stored in an AD group called LH_FIN.

image image

To start with you will need to truncate and clear your MDS staging tables that you are about to populate.  You can then use the data flow to process the majority of the logic, by creating a script component task. The purpose of this is to loop through Active Directory and pick up the user details that belong to the specified AD Group or set of AD groups if dealing with multiple roles.

A number of variables are defined which the task uses to complete the lookup.

  • strLDAP – the LDAP directory on which to perform the lookup
  • strDomain – the domain on which the AD group(s) belong
  • strADPrefix – the AD group prefix from which to return user information about
  • strADParent – the parent group which contains the AD groups which you are looking up (may not need to be used if only looking up a single AD group)

To extract users from multiple groups, make sure the prefix stored in the variable strADPrefix covers both groups. Once the rows are extracted it would then be a case of using SSIS to split the data accordingly on the AD Group Name.

The following code can be used in the script:

The first section sets up the objects required to interrogate the directory, and the fields we expect to return from the accounts – the most important of which is memberof which is used to check versus our AD Prefix. It also filters out items such as service accounts and disabled accounts to speed up the interrogation process.

    Public Overrides Sub CreateNewOutputRows()

        Dim domain As String = Variables.strDomain
        Dim searchRoot As New DirectoryEntry(Variables.strLDAP, Nothing, 
                                             Nothing, AuthenticationTypes.Secure)
        Dim dirSearch As New DirectorySearcher(searchRoot)
        dirSearch.SearchScope = SearchScope.Subtree
        'LogonName, GroupsUserBelongsTo, Department, JobTitle, MailAddress, DisplayName
        'filter to user objects
        dirSearch.Filter = "(objectCategory=person)" 
        'filter to user objects
        dirSearch.Filter = "(objectClass=user)" 
        'filter out disabled accounts
        dirSearch.Filter = "(!userAccountControl:1.2.840.113556.1.4.803:=2)" 
        'filter out password never expires accounts, i.e. service accounts
        dirSearch.Filter = "(!userAccountControl:1.2.840.113556.1.4.803:=65536)" 
        'sets chunk size for retrieving items
        dirSearch.PageSize = 1000

The next section of code performs the search, and for any LDAP objects it finds within the filter set, returns the properties requested. These properties are then stored in key/value pairs.

        Dim props As ResultPropertyCollection
        Dim values As ResultPropertyValueCollection
        Dim key As String
        Dim userAccountName As String
        Dim departmentHome As String
        Dim jobtitle As String
        Dim GroupName As String
        Dim email As String
        Dim displayName As String
        Dim groups As New ArrayList

        Using searchRoot
        'Return all LDAP objects, LDAP://acl/CN=Tristan Robinson,OU=Employees,DC=ACL,DC=local
        'CN = Common Name, OU = Organisational Unit, DC = Domain Component
            Using results As SearchResultCollection = dirSearch.FindAll()
                For Each result As SearchResult In results
                    'For each object return properties, i.e. displayname, memberof, etc
                    props = result.Properties
                    For Each entry As DictionaryEntry In props
                        key = CType(entry.Key, String)
                        'For each property, inspect the property and record its value
                        'Logon Name
                        If key = "samaccountname" Then
                            values = CType(entry.Value, ResultPropertyValueCollection)
                            userAccountName = CType(values.Item(0), String)
                        End If
                        If key = "department" Then
                            values = CType(entry.Value, ResultPropertyValueCollection)
                            departmentHome = CType(values.Item(0), String)
                        End If
                        'Job Title
                        If key = "title" Then
                            values = CType(entry.Value, ResultPropertyValueCollection)
                            jobtitle = CType(values.Item(0), String)
                        End If
                        If key = "mail" Then
                            values = CType(entry.Value, ResultPropertyValueCollection)
                            email = CType(values.Item(0), String)
                        End If
                        'Display Name
                        If key = "displayname" Then
                            values = CType(entry.Value, ResultPropertyValueCollection)
                            displayName = CType(values.Item(0), String)
                        End If
                        'Groups User Belongs To (array/collection)
                        If key = "memberof" Then
                            values = CType(entry.Value, ResultPropertyValueCollection)
                            groups = GetGroups(values)
                        End If

The final section filters the data into the output buffer if from the array list we’ve extracted above, we have matching strings from our original AD Prefix variable. It will then reset, and loop round for the next account.

                    'Export user details to buffer if it passes the logical test
                    For Each item As String In groups
                        'Avoids computer accounts, i.e. ending with $
                        If userAccountName.EndsWith("$") = False 
                        And item.ToString.StartsWith(Variables.strADPrefix) 
                        Then 'And item.ToString <> (Variables.strADParent) 
                            If String.IsNullOrEmpty(userAccountName) Then
                                Output0Buffer.UserAccountName_IsNull = True
                                Output0Buffer.UserAccountName = userAccountName
                            End If
                            If String.IsNullOrEmpty(domain) Then
                                Output0Buffer.Domain_IsNull = True
                                Output0Buffer.Domain = domain
                            End If
                            If String.IsNullOrEmpty(item.ToString) Then
                                Output0Buffer.GroupName_IsNull = True
                                Output0Buffer.GroupName = item.ToString
                            End If
                            If String.IsNullOrEmpty(jobtitle) Then
                                Output0Buffer.JobTitle_IsNull = True
                                Output0Buffer.JobTitle = jobtitle
                            End If
                            If String.IsNullOrEmpty(email) Then
                                Output0Buffer.Email_IsNull = True
                                Output0Buffer.Email = email
                            End If
                            If String.IsNullOrEmpty(displayName) Then
                                Output0Buffer.DisplayName_IsNull = True
                                Output0Buffer.DisplayName = displayName
                            End If
                        End If
                    userAccountName = ""
                    departmentHome = ""
                    jobtitle = ""
                    GroupName = ""
                    email = ""
                    displayName = ""                    
            End Using
        End Using
    End Sub

I also required a function to split the list of groups a user belonged to and store them in another array list.

    Private Function GetGroups(ByVal values As ResultPropertyValueCollection) As ArrayList
        Dim valueList As ArrayList = New ArrayList()
        For Each Item As Object In values
            Dim memberof As String = Item.ToString()
            Dim pairs As String() = memberof.Split(",".ToCharArray)
            Dim group As String() = pairs(0).Split("=".ToCharArray)
        Return valueList
    End Function

End Class

Once a list of users has been extracted, you will need to do a lookup against the existing list and only stage those that are new. This can be achieved through a simple lookup component . You can then move the rows into the MDS staging table ready for the load into MDS.

After the data flow has processed successfully, the next stage is to sweep the records into MDS using one of the built in stored procedures in the product. The results of which can be seen in the MDS Import View on the MDS site.

The following entities can then be used to power the DAX:

  • User (Username, Team, AD Login – populated from the script above)  
  • Role (Role – populated manually)
  • User Role (User, Role – joins users to roles, populated manually)
  • KPI (KPI – populated manually)
  • KPI Role (Role, KPI – joins roles to KPIs, populated manually)

These are then processed to the DW by ETL, and a UserKPI view is written across the tables to provide an AD Login to KPI pair.

For filtering the measures in the cube, you can then apply the following piece of DAX to the original measure – Gross Sales in the example below:

Gross Sales:=
        CALCULATETABLE( 'UserKPI', UserKPI[KPIName] = "Gross Sales"),
        UserKPI[ADLogin], USERNAME()
    [Gross Sales (ACTUAL)]
    BLANK ()

This concludes this blog post – hopefully this will be useful for anyone that wants to extract users from AD to control access / security within an application. Data could also be loaded direct into DB tables rather than MDS if required. 

Please feel free to comment!