Adatis BI Blogs

Understanding DAX through the Power of Evaluation Context

If you really want to understand the behaviour of your DAX and be able to write more complex DAX, you need to understand Evaluation Context. The two really do go hand in hand. Many people have asked me to explain what Evaluation Context is, which is why I put this blog post together. To get the most out of this read, here is a list of questions that you should be able to answer after reading this blog post:

- What is Evaluation Context?
- What is the difference between Filter Context and Row Context?
- What elements are considered in the Filter Context?
- What is Initial Filter Context (IFC) and how does it differ from Filter Context?
- How do we read the Initial Filter Context?
- Does Evaluation Context apply before the DAX formula evaluates?
- Are Grand Totals/Sub Totals treated differently by the Initial Filter Context?
- What is Context Transition?
- How is Context Transition applied?

What is Evaluation Context?

Evaluation Context is the environment in which a DAX formula is evaluated. So, when we write a DAX formula it gets evaluated within an environment, and the result is directly dependent on that environment. This means the result returned can be different depending on the context, even if the DAX formula used is the same.

Same DAX formula, different result... huh?

If this is your first read on Evaluation Context, you may be wondering how it is possible to get different results when using the same DAX formula. Let's make sure this is clear before moving on. Take a look at the below measure:

Total Sales = SUM(Sales[ExtendedAmount])

When reading this we interpret it as: "Show me the Total Sales Amount". But when we add this measure to either of the below matrices we get a breakdown by Country and by Category. How is this possible? We never specified anything in the DAX formula in relation to Country or Category, right? All we asked for was the "Total Sales Amount".
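To picture how one fixed formula can return different values, here is a rough sketch of the idea outside Power BI (illustrative Python with made-up figures, not DAX):

```python
# Illustrative sketch (Python, not DAX): the same formula evaluated
# under different filter contexts returns different results.
sales = [
    {"Country": "United Kingdom", "Category": "Bikes", "ExtendedAmount": 100.0},
    {"Country": "United Kingdom", "Category": "Clothing", "ExtendedAmount": 40.0},
    {"Country": "France", "Category": "Bikes", "ExtendedAmount": 75.0},
]

def total_sales(rows):
    """One fixed formula: SUM(Sales[ExtendedAmount])."""
    return sum(r["ExtendedAmount"] for r in rows)

# No filters: the grand total.
print(total_sales(sales))  # 215.0

# Filter context coming from a matrix row: Country = United Kingdom.
uk = [r for r in sales if r["Country"] == "United Kingdom"]
print(total_sales(uk))     # 140.0
```

The formula never mentions Country; the rows it sees change, and so does the result.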
So, how do we get different values on each row? It is important to understand why this is happening in order to really understand DAX. The reason is the Evaluation Context: the environment in which a DAX formula is evaluated.

What types of Evaluation Context exist?

We have two types of Evaluation Context:

- Filter Context
- Row Context

Filter Context

Filter Context refers to the filtering that is applied to the tables within the data model. If the tables are joined, the filters propagate from the "one" side of the relationship to the "many" side of the relationship. The Filter Context is applied by:

- Rows in a visual
- Columns in a visual
- Slicers in a report
- Filters in a report
- Other on-screen visuals acting as filters in a report
- The CALCULATE function

A DAX formula evaluated within the Filter Context is only evaluated once all filters have been applied to the data model. The deeper you go into Power BI, the more you will hear the term Filter Context. An additional term to be aware of is Initial Filter Context (IFC). This is the same as Filter Context, with the only difference being that it does not include the CALCULATE function.

Filter Context in action

Okay, so we "know" what Filter Context is from what we have read so far! But the best way to do better is by looking at Filter Context in action. Below you will find four cases of Filter Context being applied which, once understood, are sure to boost your confidence in understanding this type of Evaluation Context. One thing worth noting: for each case below we will identify the Initial Filter Context, meaning we do not consider the CALCULATE function. The objective is to interpret all the elements on the report that make up the Initial Filter Context.
1. Filter Context in Action: Row

Looking at the cell highlighted in green within the matrix below, to understand how this value is evaluated we must ask ourselves: what is the Initial Filter Context for this particular cell?

At the start of my Power BI journey, I used the below matrix to assist me in identifying the Initial Filter Context for a particular cell. I quickly grew out of this but found it useful in triggering the right thinking. So, the Initial Filter Context for the highlighted cell is coming only from the Rows in the matrix, Territory[Country] = United Kingdom, as the below matrix displays.

We now know how to read the Initial Filter Context of a particular cell. Slowly you will start to mentally visualise how the Initial Filter Context is applied to the underlying model. To make more sense of this now, let's walk through the steps taken to propagate the Initial Filter Context:

1) All rows in the Territory table (dimension) are filtered to the rows for 'United Kingdom'.
2) Those filtered records are propagated down the relationship from the 'one' side to the 'many' side.
3) The Sales table (fact) is left with only those filtered records.

It is very important to understand that these three steps always take place for each individual cell, and only once they do does the DAX formula evaluate.

2. Filter Context in Action: Row & Slicer

We must ask ourselves once again: "What is the Initial Filter Context for this particular cell?" To read the Initial Filter Context for the highlighted cell above in green, this time it goes beyond the Rows in the matrix. Use the below matrix to identify the Initial Filter Context. The Initial Filter Context is propagated through the below steps.
Once again, these steps always take place for each cell individually, and only once they do does the DAX formula evaluate:

1) All rows in the Territory table (dimension) are filtered to the rows for 'United Kingdom', and the Calendar table (dimension) is filtered to the rows for 'February'.
2) Those filtered records are propagated down the relationship from the 'one' side to the 'many' side.
3) The Sales table (fact) is left with only those filtered records.

3. Filter Context in Action: Slicer

Let's start off with the most important question: "What is the Initial Filter Context for this particular cell?" The purpose of this example is to understand how Grand Totals and Sub Totals work with the Initial Filter Context. In the above example, the IFC is not being filtered by an individual row of Territory[Country]. In fact, the Grand Total is unfiltered by Territory[Country], but it does get filtered by a slicer in the report. It's important to understand that the IFC for the highlighted cell is only the slicer containing Calendar[MonthName], and not the Rows in the matrix with Territory[Country]. As always, the Initial Filter Context propagates as the below steps explain:

1) All rows in the Calendar table (dimension) are filtered to the rows for 'February'.
2) Those filtered records are propagated down the relationship from the 'one' side to the 'many' side.
3) The Sales table (fact) is left with only those filtered records.
4. Filter Context in Action: Rows, Column, Slicer & Filter

For the last time we need to ask the all-famous question: "What is the Initial Filter Context for this particular cell?" The Initial Filter Context for the above highlighted cell is coming from multiple elements, which are defined in the matrix below. Even though much more is happening in this example compared to the previous ones, the propagation of the Initial Filter Context still works through the same three steps, resulting in the below.

Tip: Filter Icon

A recent feature in Power BI called the 'Filter Icon' is a great way to identify the Initial Filter Context of a visual. Through a simple click, you can identify:

- Slicers affecting the visual
- Filters affecting the visual
- Other on-screen visuals affecting the visual

But not the Rows and Columns of the matrix itself. Regardless, it is a very useful feature which you can use to identify the IFC.

Summary of Filter Context

What is the difference between Evaluation Context and Filter Context?
Answer: Filter Context is a type of Evaluation Context.

What is applied as part of the Filter Context?
Answer: Rows, Columns, Slicers, Filters, on-screen visuals and the CALCULATE function.

What is Initial Filter Context (IFC) and how does it differ from Filter Context?
Answer: Initial Filter Context (IFC) does not consider the CALCULATE function.

How do we read the Initial Filter Context?
Answer: "The IFC for this cell is Table[Column] = Value".

Does Evaluation Context apply before the DAX formula evaluates?
Answer: Yes.

Are Grand Totals/Sub Totals treated differently by the Initial Filter Context?
Answer: No, they are not an exception. Remember, always 'read' the cell.

Does the Filter Icon display the IFC?
Answer: Partially; it displays everything apart from filters coming from Rows/Columns.

Row Context

Row Context is a special iterating mechanism which takes into consideration the row it is currently on at the time the DAX formula is evaluated. When thinking of Row Context, imagine each row in a table being a house.
The Row Context will be the postman. The evaluation of the DAX formula is the delivery being made. The postman (Row Context) must visit each house (row) in order to complete a delivery (evaluation). This is how I initially started thinking of Row Context. Row Context is applied when we use Calculated Columns, X-functions and the FILTER() function.

Row Context: Calculated Column

To make more sense of Row Context, let's start off with Calculated Columns, which apply Row Context. Take a look at the below Calculated Column:

Margin £ = Sales[SalesExcVAT] - Sales[SalesCost]

From our current understanding of Row Context, the first question that might come to mind when looking at the above DAX formula is: "how does it know which row to start on?" Well, the answer is simple. Row Context within a Calculated Column always starts with the first row and finishes on the last row. It iterates through the entire table the Calculated Column has been placed in, starting with the first row and evaluating the expression, then moving on to the second row and evaluating the expression, until it has repeated the iteration for all records in the table. Looking at the below screenshot, we can see that the Calculated Column 'Margin £' has produced a value for each row. As this Calculated Column has been placed in the Sales table, the Row Context is applied against this table. The Sales table has a total of 55,327 rows, therefore the Row Context (postman) will visit each of the 55,327 records (houses) and evaluate the expression (make the delivery).

Row Context: X-Functions

We have spoken about Row Context and how this type of Evaluation Context is applied within Calculated Columns. Now let's move on to another way Row Context is applied: X-functions such as SUMX, RANKX, COUNTX, etc. Before we jump into an example, let me highlight this now.
The analogy used above for Calculated Columns, with the postman (Row Context) having to visit each and every house (record) in order to complete a delivery (evaluation), applies to X-functions too. I am aware of two differences. Firstly, an X-function requires us to specify the table it will iterate through; therefore, we need to map the houses for the postman. Secondly, a Calculated Column gets stored in your actual model, whilst X-functions are measures that are not physically stored and are only calculated at usage. If you are deciding between a Calculated Column and a measure and don't know which to choose, always remember that memory is your biggest asset when working with Power BI. For a more in-depth understanding, please check out the sources supplied below.

Margin £ SUMX = SUMX(Sales, Sales[SalesExcVAT] - Sales[SalesCost])

Take a look at the DAX formula above. The first thing worth noting is that we declared the 'Sales' table; therefore, we pinpoint each row (house) that needs to be visited. Looking at the table above, the DAX formula will visit the first record and evaluate the expression:

Record 1 = 29.99 - 11.2163

Then it will move to the second record and repeat the process:

Record 2 = 28.99 - 10.8423

Then it will move to the third record and repeat the process:

Record 3 = 28.99 - 10.8423

And so on, until it has repeated the same process for each record.

Summary of Row Context

When is Row Context applied?
Answer: Calculated Columns, X-functions and the FILTER function.

From which row does Row Context begin?
Answer: In a Calculated Column, the first record of its table; otherwise the first record of the table specified.

What is the core difference between Row Context and Filter Context?
Answer: Row Context works on a single row at a time and is aware of each individual row, whilst Filter Context refers to the filters applied when a DAX formula is evaluated.

Why can't I use an aggregator such as SUM within Row Context?
Answer: Row Context does not apply Filter Context by default; it must be enabled.

What is Context Transition?
Answer: The process of transforming Row Context into Filter Context, so that filters propagate down from the 'one' side to the 'many' side of related tables.

Row Context does not consider Filter Context

A very important part of Row Context is understanding that it does not automatically apply Filter Context. When I first started to write DAX, I remember using a SUM within a Calculated Column. The values in the column were clearly incorrect. Since then I have seen many people experience the same outcome at least once, until figuring out that a measure does the trick. But this is not a trick, and we should look to understand why it happens. To better understand Context Transition, let's work through an example. Below is the model we will be working with, a simplified version of the earlier examples. If we add the below DAX formula as a Calculated Column within the Customer table, what do you think we would get back as a result?

Total Sales Exc. VAT Calculated Column = SUM(Sales[Sales Exc. VAT])

Here is the result: all rows repeat the value 5435. This value is the total value of 'Sales Excluding VAT' that exists in the entire Sales table. The above result proves that Row Context does not automatically apply Filter Context. This occurs because the Row Context iterates (visits) every record in the Customer table asking "show me the Total Sales Amount", but no Filter Context exists. This means no filters are applied to the Sales table, at all!
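You can sketch this behaviour outside DAX (illustrative Python; the customer keys and figures are made up to match the 5435 total above):

```python
# Illustrative sketch (Python, not DAX; figures are made up).
# A calculated column iterates the Customer table (row context),
# but SUM sees the Sales table with no filters applied at all.
customers = [{"CustomerKey": k} for k in (1, 2, 3)]
sales = [
    {"CustomerKey": 1, "SalesExcVAT": 1000},
    {"CustomerKey": 2, "SalesExcVAT": 2435},
    {"CustomerKey": 3, "SalesExcVAT": 2000},
]

column = []
for customer in customers:  # row context: visit every customer row
    # SUM(Sales[Sales Exc. VAT]) -- no filter context, so every row
    # of Sales is summed regardless of which customer we are on.
    column.append(sum(s["SalesExcVAT"] for s in sales))

print(column)  # [5435, 5435, 5435] -- the grand total repeats on every row
```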
To ensure we understand this behaviour, let's take it step by step:

1) Go to record 1 in the Customer table. What is the Total Sales Amount for record 1? The Sales table has no filters, therefore return all sales.
2) Go to record 2 in the Customer table. What is the Total Sales Amount for record 2? The Sales table has no filters, therefore return all sales.
3) Go to record 3 in the Customer table. What is the Total Sales Amount for record 3? The Sales table has no filters, therefore return all sales.

The all-important question here is: how do we apply Filter Context in the Row Context? This is done through Context Transition. Context Transition is outside the scope of this read, however we will scrape the surface of the concept. I would highly recommend you check out the sources below for a deeper dive into Context Transition.

How do we apply Filter Context to Row Context?

Row Context does not consider Filter Context. This is so important to remember that it is worth repeating. To convert Row Context to Filter Context we must use the power of the CALCULATE function! The CALCULATE function with no filter parameters has the job of applying Context Transition. Look at the same DAX formula from above, only this time wrapped within the CALCULATE function:

Total Sales Exc. VAT (Calculated Column) = CALCULATE(SUM(Sales[Sales Exc. VAT]))

As you can see below, we get the correct result. The CALCULATE function enables Filter Context; therefore, it converts the Row Context into Filter Context. The power of the CALCULATE function! In more detail, the filters on the Customer table propagate down the relationship to the Sales table for each record.

Summary

If someone truly wants to learn Power BI, DAX is something that should not be avoided, and for this reason Evaluation Context should be high on the to-learn list. This article should have armed you with the knowledge needed to interpret DAX behaviour, as well as to digest more in-depth articles on Evaluation Context more easily. @Data_Laz

Sources:
Matt Allington (2018). Supercharge Power BI. Holy Macro! Books.
Marco Russo & Alberto Ferrari (2015). The Definitive Guide to DAX. Microsoft Press.

Migrating to Native Scoring with SQL Server 2017 PREDICT

Microsoft introduced native predictive model scoring with the release of SQL Server 2017. The PREDICT function (Documentation) is now a native T-SQL function that eliminates having to score using R or Python through the sp_execute_external_script procedure. It's an alternative to sp_rxPredict. In both cases you do not need to install R, but with PREDICT you do not need to enable SQLCLR either - it's truly native. PREDICT should make predictions much faster, as the process avoids having to marshal the data between SQL Server and Machine Learning Services (previously R Services). Migrating from the original sp_execute_external_script approach to the new native approach tripped me up, so I thought I'd share a quick summary of what I have learned.

Stumble One: Error occurred during execution of the builtin function 'PREDICT' with HRESULT 0x80004001. Model type is unsupported.

Reason: Not all models are supported. At the time of writing, only the following models are supported: rxLinMod, rxLogit, rxBTrees, rxDtree and rxDForest. sp_rxPredict supports additional models, including those available in the MicrosoftML package for R (I was attempting to use rxFastTrees). I presume this limitation will reduce over time. The list of supported models is referenced in the PREDICT function documentation.

Stumble Two: Error occurred during execution of the builtin function 'PREDICT' with HRESULT 0x80070057. Model is corrupt or invalid.

Reason: The serialisation of the model needs to be modified for use by PREDICT. Typically you might serialise your model in R like this:

model <- data.frame(model=as.raw(serialize(model, NULL)))

Instead you need to use the rxSerializeModel method:

model <- data.frame(rxSerializeModel(model, realtimeScoringOnly = TRUE))

There's a corresponding rxUnserializeModel method, so it's worth updating the serialisation across the board so models can be used interchangeably in the event all models are eventually supported. My serialisation approach had simply been a bit legacy. That's it.
Oh, apart from the fact PREDICT is supported in Azure SQL DB, despite the documentation saying the contrary.
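For completeness, an end-to-end native scoring call looks roughly like this (the table, column and model names here are illustrative, not from a real database):

```sql
-- Illustrative sketch: assumes a model trained with a supported algorithm
-- (e.g. rxLinMod), serialised with rxSerializeModel and stored in dbo.Models.
DECLARE @model VARBINARY(MAX) =
    (SELECT model FROM dbo.Models WHERE model_name = 'SalesLinMod');

SELECT d.*, p.Prediction
FROM PREDICT(MODEL = @model, DATA = dbo.NewSales AS d)
WITH (Prediction FLOAT) AS p;
```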

Geographic Spatial Analysis with Azure Data Lake Analytics (ADLA)

Whilst working on an Azure Data Lake project, a requirement hit the backlog that could be easily solved with a Geographical Information System (GIS) or even SQL Server - spatial data type support was introduced into SQL Server 2008. However, Azure Data Lake Analytics (ADLA) does not natively support spatial data analytics, so we'll have to extract the data into another service, right? Wrong! :) Due to the extensibility of Azure Data Lake Analytics, we can enhance it to do practically anything. In fact, we can lean on existing components and enhance the service without having to develop the enhancement itself. This blog is a quick run-through demonstrating how to enhance ADLA so that it supports spatial analytics and meets our project requirement.

Problem

For simplicity I've trivialised the problem. Here's the requirement: indicate which bus stops are within 1.5 km of Southwark Tube Station. To support this requirement, we have two datasets:

1) A list of all the bus stops in London, including their geo location (circa 20k records)
2) The geo location record of Southwark Tube Station (a single record!)

In fact, the location of the tube station is pretty accurate and is geo-located to the entrance pavement outside the tube station. This would be an easy problem for a GIS to solve. You would specify the central point, i.e. our Southwark Tube Station marker, draw a circle (or buffer) with a radius of 1.5 km around it, and select all bus stops that fall within or intersect with that circle. This spatial analysis is easy for these systems, as that's essentially what they are built to do. SQL Server 2008 introduced the spatial data type, which allowed spatial-style analysis to be performed on geo data using T-SQL in conjunction with the supplied Geometry and Geography data types. More info on those can be found here. So, how can we solve our problem in ADLA, without a GIS and without having to export the data to SQL Server?
Solution

You can register existing assemblies with ADLA. It so happens that the SQL Server data types and spatial assemblies are nicely packaged up and can be used directly within ADLA itself - think about that, it's pretty awesome! Caveat: at the time of writing we have no idea of the licence implications. It will be up to you to ensure you are not in breach :) Those assemblies can be downloaded from here. You only need to download and install the following file:

ENU\x64\SQLSysClrTypes.msi

This installs two key assemblies, which you'll need to grab and upload to your Data Lake Store:

C:\Program Files (x86)\Microsoft SQL Server\130\SDK\Assemblies\Microsoft.SqlServer.Types.dll
C:\Windows\System32\SqlServerSpatial130.dll

Once they have been uploaded to your Data Lake Store, you need to register those assemblies with ADLA:

DECLARE @ASSEMBLY_PATH string = "/5.UTILITY/USQL-Extend/SQL-Server/";
DECLARE @TYPES_ASM string = @ASSEMBLY_PATH+"Microsoft.SqlServer.Types.dll";
DECLARE @SPATIAL_ASM string = @ASSEMBLY_PATH+"SqlServerSpatial130.dll";

CREATE DATABASE IF NOT EXISTS SQLServerExtensions;
USE DATABASE SQLServerExtensions;

DROP ASSEMBLY IF EXISTS SqlSpatial;
CREATE ASSEMBLY SqlSpatial
FROM @TYPES_ASM
WITH ADDITIONAL_FILES =
(
    @SPATIAL_ASM
);

Following registration of the assemblies, we can see the registration loaded in the ADLA catalog database we created. We are now ready to use this U-SQL enhancement in our U-SQL query - let's go right ahead and solve our problem in one U-SQL script.

// Reference the assemblies we require in our script.
// System.Xml we get for free as a system assembly, so we don't need to register that,
// unlike our SQLServerExtensions.SqlSpatial assembly.
REFERENCE SYSTEM ASSEMBLY [System.Xml];
REFERENCE ASSEMBLY SQLServerExtensions.SqlSpatial;

// Once the appropriate assemblies are registered, we can alias them using the USING keyword.
USING Geometry = Microsoft.SqlServer.Types.SqlGeometry;
USING Geography = Microsoft.SqlServer.Types.SqlGeography;
USING SqlChars = System.Data.SqlTypes.SqlChars;

// First create the centralised point.
// In this case it's the pavement outside the entrance of Southwark Tube Station, London.
// Format is Longitude, Latitude and then SRID.
// NB: It's Longitude then Latitude - the opposite way round to what you might expect.
DECLARE @southwarkTube Geography = Geography.Point(-0.104777,51.503829,4326);

// Next we extract our entire London bus stop data set from the file.
// There are about 20k of them.
@busStopInput =
    EXTRACT [StopCode] string,
            [StopName] string,
            [Latitude] double?,
            [Longitude] double?
    FROM @"/1.RAW/OpenData/Transport/bus-stops-narrow-full-london.csv"
    USING Extractors.Csv(skipFirstNRows:1,silent:true);

// This is effectively the transform step and where the magic happens.
// Very similar syntax to what you would do in T-SQL.
// We are returning all the bus stops that fall within 1500m of Southwark Tube -
// essentially all stops that intersect with a 1500m buffer around the central tube point.
@closeBusStops =
    SELECT *
    FROM @busStopInput
    WHERE @southwarkTube.STBuffer(1500).STIntersects(Geography.Point((double)@busStopInput.Longitude,(double)@busStopInput.Latitude,4326)).ToString()=="True";

// The results are written out to a csv file.
OUTPUT @closeBusStops
TO "/4.LABORATORY/Desks/Sach/spatial-closebusstops.csv"
USING Outputters.Csv(outputHeader: true);

The query outputs a list of bus stops that are within the specified spatial distance of Southwark Tube Station. If we look at all the bus stops (in red) and overlay all the 'close' bus stops (in green), we can see the results. Pretty neat. Azure Data Lake Analytics does not natively support spatial data analytics, but by simply utilising the assemblies that ship with SQL Server, we can extend the capability of U-SQL to provide that functionality - or practically any functionality we desire.

Friday Fun: GeoFlow does the Great South Run

GeoFlow was released to public preview yesterday; it's a new 3D visualization tool for Excel which allows users to create, navigate and interact with time-sensitive data applied to a digital map. Back in October last year, along with 25,000 other people, my good friend and colleague Tim Kent (@TimK_Adatis) and I ran the Great South Run; a 10 mile run around the city of Portsmouth on the south coast of England. As it happened, we both wore GPS watches, and using the data collected I've created a simple GeoFlow tour of the race. Tim is green - I am red - who wins... there's only one way to find out... Run Race!

SQL Server Delete does not release file space allocation

This one's a short blog post, and by no means new, but it may help you in the future and maybe save some head scratching. Commonly in data warehousing projects you will be processing data through tables whilst staging and cleaning source data. These tables may commonly be heaps (no clustered index) that hold no persistent data; rather, they are a transient set of tables that are deleted from each day when your warehouse loads. What you may not know is that when a heap is deleted from (as opposed to truncated), the space that was used by that data is not automatically de-allocated from the table, and when new data is added the table space is increased, even though the table may be "empty".

So what is actually going on? When deleting from a heap, the database engine locks either the page or the row, as opposed to the whole table when deleting on a clustered index. During the delete on the heap the page is emptied, but the engine does not de-allocate the page. This behaviour only occurs on a heap, and can result in very large tables with no data in them. On my recent project, it caused the stage and clean processes to slow down gradually over time, because the database engine was scanning huge tables with no data in them.

So, what can we do to stop this behaviour? There are three options (as documented by Microsoft here - see Locking Behaviour):

1) When deleting from the table, obtain an exclusive lock on the whole table by using a TABLOCK hint as follows:

DELETE FROM [TABLE] WITH (TABLOCK) WHERE [COLUMN] = 'CRITERIA'

2) If possible, truncate the whole table; this will de-allocate all pages.

3) Finally, my least favourite: add a clustered index to the table.
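If you want to see the behaviour for yourself, a quick repro along these lines (illustrative T-SQL; the table name and row count are made up) shows the reserved space surviving the delete:

```sql
-- Illustrative repro: a heap keeps its pages after a plain DELETE.
CREATE TABLE dbo.StageHeap (Id INT, Payload CHAR(4000));

INSERT INTO dbo.StageHeap (Id, Payload)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'x'
FROM sys.all_objects a CROSS JOIN sys.all_objects b;

EXEC sp_spaceused 'dbo.StageHeap';   -- note the data / reserved figures

DELETE FROM dbo.StageHeap;           -- row count is now 0...
EXEC sp_spaceused 'dbo.StageHeap';   -- ...but reserved space barely moves

-- Option 1: take a table lock so the delete de-allocates the pages
-- DELETE FROM dbo.StageHeap WITH (TABLOCK);
-- Option 2: TRUNCATE TABLE dbo.StageHeap;
```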

Boot to VHD – Demo Environments and more

A few people have asked me about this recently, so I thought I'd share my approach. Creating demo environments, particularly for the MS BI stack, can be time consuming and a challenge, particularly when you need to take the demo with you and can't rely on powerful internal servers and good client internet connectivity.

A bit of history

Initially we used to have a dedicated, decently specced demo laptop that would be installed with all the goodies that we would need to demo. This worked until the demo needed to be in two places at once, or you needed to carry around your day-to-day laptop too. The next solution was to use the demo environment as our day-to-day workstation, but it was massive overkill to have full-blown Windows Server running SharePoint with multiple instances of SQL Server etc., and, unless you had a high-spec machine, everything was a little laggy. The next approach was to carry around a couple of laptop hard disks that you'd swap in and out depending on whether you were demoing or working. This worked well for a good while but did prevent timely demos (no screwdriver, no demo). Then we entertained VirtualBox, Hyper-V and other virtualisation tech to run virtual environments - this was all well and good, but the primary downfall of this approach is that you need a really high-spec machine to run both the host and the virtual environment, or performance is going to be a major issue; and for demos, you want performance to be as good as possible. Then we discovered Boot to VHD. I'm not sure when this first became possible and I definitely believe we were late to the game, but we've been using it for around 12 months - long enough to prove it a solid approach to creating and running [not only] demo environments.

Boot to VHD

The concept is easy, and "does what it says on the tin". You create or acquire a VHD and configure your laptop to boot directly to the VHD.

Advantages

1) The VHD can use all the host resources.
Under traditional virtualisation approaches you need to split memory and/or processors, which impacts performance. So on an 8GB, 2-processor laptop you would traditionally have 4GB and 1 processor for the host and 4GB and 1 processor for the virtual environment. With Boot to VHD, the virtual environment can utilise the full 8GB and both processors.

2) It's flexible. I have a chunky external HDD containing several different virtual environments for different purposes. I can back up, swap out, replace and roll back environments in the time it takes to copy a VHD from local to external or vice versa. You can even share demo environments with your colleagues.

3) You always have a demo environment to hand. All it takes is a reboot to load up the appropriate environment for those spontaneous demo opportunities.

Disadvantages

1) You do need to be careful regarding disk space usage and be very disciplined to ensure you always have enough disk space available. If you are running a number of large environments there will be an element of maintenance to ensure everything always fits.

2) Without resorting to a hybrid approach, you can't demo a distributed system working together.

Setup

To make use of Boot to VHD, we'll assume we already have a VHD available and ready for booting to. These can be created manually, acquired from your internal infrastructure team, or obtained from other third parties. When creating them manually I ALWAYS create "dynamically expanding" virtual hard disks. This way, you can store more VHD environments on your laptop than you otherwise would. Although dynamically expanding disks allow you to store more environments, you will still need to ensure you have enough disk space for the disk to expand into, as this will be required at boot up. So, if your VHD is set to a 100GB dynamically expanding disk (it might only be a 20GB file), when it's booted up it will expand to 100GB, so you will need that space on your hard disk or the boot up will fail.
1) Copy the VHD to your laptop to a suitable location, e.g. C:\VHD

2) Create a new boot entry. Run the following at a command prompt as an Administrator:

bcdedit /copy {current} /d "My New VHD Option"

Be sure to update the label to something to help you identify the VHD - this label will appear on the boot menu when you reboot. Note the new GUID that has been created.

3) Using the GUID created for you in the previous step and the location of the VHD, run the following three commands, one after the other:

bcdedit /set {23dd42c1-f397-11e1-9602-923139648459} device vhd=[C:]\VHD\AdatisBI.vhd
bcdedit /set {23dd42c1-f397-11e1-9602-923139648459} osdevice vhd=[C:]\VHD\AdatisBI.vhd
bcdedit /set {23dd42c1-f397-11e1-9602-923139648459} detecthal on

Note the square brackets around the drive letters; these are required. If you have spaces in your path or filename, you'll need to wrap the path, excluding the drive letter, in quotes, e.g. ..vhd=[C:]"\VHD Path\AdatisBI.vhd"

That's all there is to it. Reboot and you should be presented with a new boot option, and away you go.

Troubleshooting

When it doesn't work you generally get a BSOD on boot up. To date I've identified two reasons for this:

1) You don't have enough disk space for the VHD to expand (the BSOD actually does inform you of this).

2) You may need to change the SATA Mode configuration in the BIOS. Depending on how and where the VHD was created, you may need to change the setting to either ATA or AHCI. If that works, you'll have to reverse the change to reboot into your physical installation.

I've yet to create a portable (i.e. sharable amongst colleagues) VHD for Windows 8. I have successfully created a Windows 8 VHD, but it currently only works on the laptop it was created on; this is unlike any other VHD I have created in the past. If I work out a fix, I will update this post.

Additional Information

There are a couple of extra benefits that are worth pointing out.
1) Once you’ve booted to VHD, your original physical OS installation drive is reallocated, normally to drive D (your VHD will assume the C drive). This allows you to share files between environments or, as I do, place my SkyDrive folder in an accessible location on the original physical drive. This allows me to have SkyDrive installed on VHDs but only keep a single copy of the contents on my HDD.

2) The reverse is true too. You can attach a VHD (from the physical install, or from within another VHD) using the Disk Management tool to access, move or copy files between environments. The disk is expanded at this point, so you will need enough disk space to accommodate it.

3) If disk space is at a premium, you can shrink the VHD using a tool such as VHD Resizer. It doesn’t resize the physical VHD file, but allows you to reduce the size of the virtual hard disk. It also allows you to convert from Fixed to Dynamic disks and vice versa.

4) You can remove boot entries with the following (or you can use the System Configuration tool):

bcdedit /delete {GUID}

5) I have found this approach so reliable that my day-to-day Windows 7 installation is a VHD. I have not noticed any impact on performance. The only thing I have noticed is that you cannot determine a “Windows Experience Index” when running a VHD – but I can live with that.

SQL Server 2012 : Columnstore Index in action

One of the new SQL Server 2012 data warehouse features is the Columnstore index. It stores data by columns instead of by rows, similar to a column-oriented DBMS like the Vertica Analytic Database, and claims to increase query performance by hundreds to thousands of times. The issue with indexes in a data warehouse environment is the number and broad range of questions the warehouse may have to answer. This means you either have to introduce a large number of large indexes (in many cases resulting in a larger set of indexes than actual data), plump for a costly spindle-rich hardware infrastructure, or opt for a balanced hardware and software solution such as a Microsoft SQL Server 2008 R2 Fast Track Data Warehouse or an HP Business Data Warehouse Appliance, where the approach is ‘index-light’ and you rely on the combination of high throughput and processing power to reduce the dependency on the traditional index. The Columnstore index is different in that, when applied correctly, a broad range of questions can benefit from a single Columnstore index. The index is compressed (using the same VertiPaq technology that PowerPivot and Tabular-based Analysis Services share), reducing the effort required of the expensive and slow disk subsystem and increasing the effort of the fast and lower-cost memory/processor combination. In order to test the claims of the Columnstore index I’ve performed some testing on a Hyper-V instance of SQL Server 2012 “Denali” CTP3 using a blown-up version of the AdventureWorksDWDenali sample database. I’ve increased the FactResellerSales table from approximately 61,000 records to approximately 15.5 million records and removed all existing indexes to give me a simple, but reasonably large, ‘heap’.
Heap

With a clear cache, run the following simple aggregation:

SELECT
    SalesTerritoryKey,
    SUM(SalesAmount) AS SalesAmount
FROM [AdventureWorksDWDenali].[dbo].[FactResellerSales]
GROUP BY SalesTerritoryKey
ORDER BY SalesTerritoryKey

Table 'FactResellerSales'. Scan count 5, logical reads 457665, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 7641 ms, elapsed time = 43718 ms.

Non-Clustered Index

Before jumping straight in with a columnstore index, let’s review performance using a traditional index. I tried a variety of combinations; the fastest I could get this query to go was to simply add the following:

CREATE NONCLUSTERED INDEX [IX_SalesTerritoryKey]
ON [dbo].[FactResellerSales]
(
    [SalesTerritoryKey] ASC
)
INCLUDE ([SalesAmount])
WITH (
    PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
    DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
    ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100, DATA_COMPRESSION = PAGE
) ON [PRIMARY]
GO

Notice I have compressed the index using page compression; this reduced the number of pages my data consumed significantly. The IO stats when I re-ran the same query (on a clear cache) looked like this:

Table 'FactResellerSales'. Scan count 5, logical reads 26928, physical reads 0, read-ahead reads 26816, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 6170 ms, elapsed time = 5201 ms.

Much better! Approximately 6% of the original logical reads were required, resulting in a query response time of just over 5 seconds. Remember though, this new index will really only answer this specific question. If we change the query, performance is likely to fall off the cliff and revert back to the table scan.
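The “approximately 6%” figure falls straight out of the two logical-read counts; a quick sanity check:

```python
# Logical reads reported by SET STATISTICS IO for the same query.
heap_reads = 457_665     # table scan on the heap
indexed_reads = 26_928   # page-compressed covering index

ratio = indexed_reads / heap_reads
print(f"{ratio:.1%} of the original logical reads")  # prints "5.9% ..."
```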
Incidentally, adopting an index-light (no index) approach and simply compressing (and reloading, to remove fragmentation) the underlying table itself, performance was only nominally slower than the indexed table, with the added advantage of being able to perform for a large number of different queries. (Effectively this speeds up the table scan; partitioning the table can help with this approach too.)

Columnstore Index

Okay, time to bring out the columnstore. The recommendation is to add all columns into the columnstore index (columnstore indexes do not support ‘include’ columns); practically, there may be a few cases where you do exclude some columns. Metadata or system columns that are unlikely to be used in true analysis are good candidates to leave out of the columnstore. However, in this instance, I am including all columns:

CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_Columnstore]
ON [dbo].[FactResellerSales]
(
    [ProductKey],
    [OrderDateKey],
    [DueDateKey],
    [ShipDateKey],
    [ResellerKey],
    [EmployeeKey],
    [PromotionKey],
    [CurrencyKey],
    [SalesTerritoryKey],
    [SalesOrderNumber],
    [SalesOrderLineNumber],
    [RevisionNumber],
    [OrderQuantity],
    [UnitPrice],
    [ExtendedAmount],
    [UnitPriceDiscountPct],
    [DiscountAmount],
    [ProductStandardCost],
    [TotalProductCost],
    [SalesAmount],
    [TaxAmt],
    [Freight],
    [CarrierTrackingNumber],
    [CustomerPONumber],
    [OrderDate],
    [DueDate],
    [ShipDate]
) WITH (DROP_EXISTING = OFF) ON [PRIMARY]

Now when I run the query on a clear cache:

Table 'FactResellerSales_V2'. Scan count 4, logical reads 2207, physical reads 18, read-ahead reads 3988, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 235 ms, elapsed time = 327 ms.

I think the figures speak for themselves! Sub-second response, and because all columns are part of the index, a broad range of questions can be satisfied by this single index.
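The timings can be put on the same scale. Using the elapsed times reported above (a back-of-envelope comparison, not a benchmark):

```python
# Elapsed times (ms) reported for the same query under each plan.
timings = {
    "heap scan": 43_718,
    "covering NC index": 5_201,
    "columnstore index": 327,
}

heap = timings["heap scan"]
for plan, elapsed in timings.items():
    # Speedup relative to the unindexed heap scan.
    print(f"{plan}: {heap / elapsed:.0f}x faster than the heap scan")
```

The columnstore comes out roughly 134x faster than the heap scan, against roughly 8x for the covering non-clustered index.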
Storage

The traditional (compressed) non-clustered index takes up around 208 MB, whereas the Columnstore index comes in a little less at 194 MB – so we gain both speed and storage efficiency, an advantage further compounded when you take into account the potential additional indexes the warehouse may otherwise require.

So, the downsides? Columnstore indexes render the table read-only. In order to update the table you either need to drop and re-create the index or employ a partition switching approach. The other notable disadvantage, consistently witnessed during my tests, is that the columnstore index takes longer to build. The traditional non-clustered index took approximately 21 seconds to build, whereas the columnstore took approximately 1 minute 49 seconds. Remember though, you only need one columnstore index to satisfy many queries, so that’s potentially not a fair comparison.

Troubleshooting

If you don’t notice a huge difference between a table scan and a Columnstore Index Scan, check the Actual Execution Mode of the Columnstore Index Scan. This should be set to Batch, not Row. If the Actual Execution Mode is reporting Row then your query cannot run in parallel:
- Ensure, if running via Hyper-V, you have assigned more than one processor to the image.
- Ensure the server property ‘Max Degree of Parallelism’ is not set to 1.

Summary

In summary, for warehousing workloads, the columnstore index is a great addition to the database engine, with significant performance improvements even on reasonably small data sets. It will redefine the ‘index-light’ approach that the SQL Server Fast Track Data Warehouse methodology champions and help simplify warehouse-based performance tuning activities. Will it work in every scenario? I very much doubt it, but it’s a good place to start until we get to experience it live in the field.

SQL Server 2012 Licensing

Today saw the announcement of how SQL Server 2012 will be carved up and licensed, and it's changed quite a bit. There are three key changes:

1) There's a new Business Intelligence Edition that sits between Standard and Enterprise.
2) No more processor licensing. There's a move to core-based licensing instead (with a minimum cost of 4 cores per server).
3) Enterprise is only available on the core licensing model (unless upgrading through Software Assurance *).

Enterprise, as you would expect, has all the functionality SQL Server 2012 has to offer. The Business Intelligence edition strips away:
- Advanced Security (advanced auditing, transparent data encryption)
- Data Warehousing (ColumnStore, compression, partitioning)
and provides a cut-down, basic (as opposed to advanced) level of High Availability (AlwaysOn). In addition, the Standard Edition removes:
- Enterprise data management (Data Quality Services, Master Data Services)
- Self-Service Business Intelligence (Power View, PowerPivot for SharePoint)
- Corporate Business Intelligence (semantic model, advanced analytics)

If you are utilising 4-core processors, licence costs for Standard ($1,793 per core, or $898 per server + $209 per CAL) and Enterprise ($6,874 per core) remain similar(ish). However, you will be stung if you have more cores. The Business Intelligence edition is only available via a Server + CAL licence model, and it's apparent that Microsoft are placing a big bet on MDS/DQS, Power View, PowerPivot for SharePoint and BISM, as the licence for the Business Intelligence edition is $8,592 per server, plus $209 per CAL – that's nearly 10x more per server than Standard Edition!

For the complete low-down check out these links: Editions Overview: Licensing Overview: Licence Detail (including costs):

* If you are currently running Enterprise as a Server + CAL and you upgrade to SQL 2012 through Software Assurance, you can keep the Server + CAL model, providing you don’t exceed 20 cores.
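The "nearly 10x" comparison can be sanity-checked from the published prices (all figures are from the announcement; CAL counts obviously vary per organisation, so the per-server comparison below ignores CALs):

```python
# Published prices (USD) from the SQL Server 2012 licensing announcement.
std_per_core = 1_793
std_server = 898       # Standard, Server + CAL model (per server)
cal = 209              # per CAL, Standard and BI editions alike
bi_server = 8_592      # BI edition, Server + CAL model (per server)
ent_per_core = 6_874
min_cores = 4          # minimum of 4 core licences per server

print("Standard, core model:  ", std_per_core * min_cores)   # 7172
print("Enterprise, core model:", ent_per_core * min_cores)   # 27496
# Server-licence ratio behind the "nearly 10x" claim:
print("BI vs Standard server:", round(bi_server / std_server, 1), "x")
```

The server-licence ratio comes out at roughly 9.6x, which is where the "nearly 10x" claim comes from.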

Microsoft Tech-Ed 2010 BI Conference, New Orleans Day 2 (Tuesday 8th June 2010)

Day 2 and the BI Keynote. Announcements? Only two, and actually both old news:
- They announced the availability of the MS BI Indexing Connector, originally announced back in May.
- They got their story straight(er) with regard to the release of what will be called Pivot Viewer Extensions for Reporting Services. It will be available in 30 days.

The session took more of a “look where we’ve come since the Seattle BI Conference” angle and, as Ted Kummert described, it’s Microsoft’s BI [School] Report Card. Interesting change in semantics for their BI strap line: no longer do they spout “BI for the Masses”, now it’s “BI for Everyone”. Although they admitted they, along with the rest of the industry, are falling well short at only a current average of 20% ‘reach’. With the recent delivery of SQL Server 2008 R2, SharePoint 2010 and Office 2010 the BI integration story is significantly more complete. A large focus on PowerPivot and how it has helped customers quickly deliver fast, available reporting ‘applications’. Although I know a few people who would object to describing DAX purely as a familiar extension to the Excel formula engine.

Following the look back, a brief look forward:
- Cloud computing will play a part; Reporting and Analytics will be coming, and when combined with Windows AppFabric, described yesterday, this is a closer reality.
- Consumerisation enhancements: with better search and improved social media integration, BI will move towards becoming a utility.
- Compliance: several plans; improved Data Quality, Data Cleaning and Machine Learning, and strong metadata strategy support to deliver lineage and provide change impact analysis.
- Data volumes: SQL Server Parallel Data Warehouse Edition has completed CTP2; this will open up high-performance data warehousing to data volumes that exceed 100TB. Dallas, the data marketplace, will be better integrated with development and reporting tools.
They then tempted us with some previews of what *could* make a future version of SQL Server. Essentially, the theme for the future is to join the dots between Self-Service BI and the Enterprise BI Platform, focussed on plans around PowerPivot:
- KPI creation: essentially they are exposing (yet another) way to create (SSAS-based) KPIs through a neat, slider-based GUI directly from within the PowerPivot client.
- Wide table support: to help with cumbersome wide PowerPivot tables, they have introduced a ‘Record View’ to help see all the fields on one screen, all appropriately grouped, with edit/add/delete support for new fields, calculations etc.
- Multi-developer support: they plan to integrate the PowerPivot client into BIDS. This will facilitate integration with Visual SourceSafe for controlled multi-developer support; they also plan to provide a lineage visualisation to help with audit and change impact analysis.
- Data volumes: following on from the BIDS integration, plans surrounding deployment to server-based versions of SSAS to allow increased performance for higher data volumes. They replayed the demo of the 2m-row data set from Seattle where we first saw almost instant sort and filtering, but this time applied it (with equally impressive performance) to a data set of more than 2bn records! It was described by Amir Netz as “The engine of the devil!” ;)

Microsoft Tech-Ed 2010 BI Conference, New Orleans Day 1 (Monday 7th June 2010)

The Tech-Ed 2010 Conference kicked off today with the keynote session. The BI keynote session is tomorrow, but today's keynote did incorporate a small BI element. No huge announcements, but some announcements all the same.
- Unsurprisingly, cloud computing dominated the keynote, highlighting integration of cloud apps & data with on-premise data, e.g. Active Directory and business operational systems data, to demonstrate "real-world" cloud computing solutions.
- July will see a release of Service Pack 1 for Windows 7 and Windows Server 2008 R2.
- Windows Server AppFabric, application role extensions to, for example, facilitate cloud to on-premise integration capability, is now RTM.
- Windows Intune, a cloud-based PC management environment.
- No date set, but Internet Explorer 9 will focus on performance (graphics acceleration) and new web standards, and is probably a response to Google's speedy Chrome claims.
- The Microsoft Live Labs "Pivot" research project is to hit the mainstream. They were a little cagey around dates, but possibly this month.

Maybe some more BI-specific announcements tomorrow...

Creating a Custom Gemini/PowerPivot Data Feed – Method 1 – ADO.NET Data Services

There are already a few good Gemini/PowerPivot blogs that provide an introduction into what it is and does, so there is no need for repetition. What I haven’t seen are examples of how existing investments can be harnessed for Gemini/PowerPivot based self-service analytics. This series of posts focuses on various ways of creating custom data feeds that can be used by Gemini/PowerPivot natively – providing a direct feed from otherwise closed systems opens up new channels of analytics to the end user. Gemini/PowerPivot supports reading data from Atom-based data feeds, and this post looks at a quick way of creating an Atom-based feed that can be consumed by Gemini/PowerPivot. By far the simplest way to develop an Atom-based data feed is to employ ADO.NET Data Services in conjunction with the ADO.NET Entity Framework. With very few (in fact one and a bit!) lines of code, a data source can be exposed as a feed that Gemini/PowerPivot can read natively. I am going to use the AdventureWorksDW sample hosted by a SQL Server 2008 R2 instance for this – obviously Gemini/PowerPivot natively reads SQL Server databases, so creating a custom feed over the top may seem a little pointless. However, this technique may be useful for quick wins in several scenarios, including:
- Preventing the need for users to connect directly to the underlying data source.
- Restricting access to various elements of the data source (tables/columns etc).
- Applying simple business logic to raw data.
ADO.NET Data Services are a form of Windows Communication Foundation (WCF) services, and therefore can be hosted in various environments. Here, I will simply host the ADO.NET Data Service inside an ASP.NET site.
To create a native Gemini/PowerPivot feed, you take seven steps:
1 - Create ASP.NET Web Application
2 - Create Entity Data Model
3 - Create the Schema
4 - Create the Data Service
5 - Load From Data Feed
6 - Create Relationships
7 - Test

Step 1) Create ASP.NET Web Application

I’m using Visual Studio 2008 here to create an ASP.NET Web Application.

Step 2) Create Entity Data Model

Add an ADO.NET Entity Data Model item to the project. These files have a .edmx extension and allow us to create a schema that maps to the underlying database objects.

Step 3) Create the Schema

We simply require a 1:1 mapping so will ‘Generate from Database’. Incidentally, the ‘Empty Model’ option allows you to build a conceptual model of the database, resulting in custom classes that can be optionally mapped to the database objects later. Create a Microsoft SQL Server connection to AdventureWorksDW2008. Select the appropriate database objects; I’ve selected the following tables:
- DimCurrency
- DimCustomer
- DimDate
- DimProduct
- DimPromotion
- DimSalesTerritory
- FactInternetSales
Once the wizard has completed, a new .edmx and associated .cs file are created that respectively contain an Entity Relationship Diagram and a set of auto-generated classes that represent the database objects. Due to the way the Entity Framework handles foreign key constraints we have to apply a workaround to ensure the foreign keys on the FactInternetSales table are exposed and brought into Gemini/PowerPivot. A previous post, Exposing Foreign Keys as Properties through ADO.NET Entity Framework, walks through the workaround.

Step 4) Create the Data Service

Add an ADO.NET Data Service item to the project. The service class inherits from a generic version of the System.Data.Services.DataService object, so we need to inform the compiler what class to base the generic object on. We essentially want to base our Data Service on the class representing our newly created Entity Data Model.
The class name is derived from the database name, unless changed when the Entity Data Model was created, so in our case the class name is AdventureWorksDW2008Entities. The auto-generated service class contains a ‘TODO’ comment that asks you to ‘put your data source class name here’. The comment needs replacing with AdventureWorksDW2008Entities. The final step is to expose the resources in the Entity Data Model. For security reasons, a data service does not expose any resources by default; resources need to be explicitly enabled. To allow read-only access to the resources in the Entity Data Model, the InitializeService method needs updating with a single line of code. The code snippet below details the final class implementation; notice the AdventureWorksDW2008Entities reference on the class declaration and the explicit resource enablement inside InitializeService.

Code Snippet

public class GeminiDataService : DataService<AdventureWorksDW2008Entities>
{
    // This method is called only once to initialize service-wide policies.
    public static void InitializeService(IDataServiceConfiguration config)
    {
        config.SetEntitySetAccessRule("*", EntitySetRights.AllRead);
    }
}

That’s all that’s needed; by default, ADO.NET Data Services conform to the Atom standard, so in theory the service is ready to be consumed by Gemini/PowerPivot. Before we try, it’s worth giving the service a quick test. Building and running the solution (F5) launches Internet Explorer, navigating to the service hosted by the ASP.NET Development Server. You are first presented with an XML document containing elements that represent database objects; you can further drill into the objects by amending the URL.
For example, if you want to see the contents of the DimPromotion table then append DimPromotion to the end of the URL: http://localhost:56867/GeminiDataService.svc/DimPromotion (case sensitive)

Note: You may need to turn off Feed Reader View in Internet Explorer to see the raw XML (Tools -> Internet Options -> Content -> Settings -> Turn On Feed Reader View – make sure this is unchecked).

As a slight aside, the URL can be further enhanced to filter, return the top n rows, extract certain properties, etc. Here are a couple of examples:

http://localhost:56867/GeminiDataService.svc/DimCustomer?$top=5 – Returns the top 5 customers
http://localhost:56867/GeminiDataService.svc/DimCustomer(11002) – Returns the customer with id 11002
http://localhost:56867/GeminiDataService.svc/DimCustomer(11002)/FirstName – Returns the first name of customer 11002
http://localhost:56867/GeminiDataService.svc/DimProduct(310)?$expand=FactInternetSales – Returns the product with id 310 and all related Internet Sales records

Confident that the feed is working, we can now deploy the service and start using the feed in Gemini/PowerPivot.

Step 5) Load From Data Feed

Open up Excel 2010 and launch the Gemini/PowerPivot client (by selecting ‘Load & Prepare Data’). Select ‘From Data Feed’ from the ‘Get External Data’ section of the Gemini/PowerPivot Home ribbon to launch the Table Import Wizard. Specify the URL from the ADO.NET Data Services feed created earlier, in my case http://localhost:56867/GeminiDataService.svc, as the ‘Data Feed Url’ and click Next. Incidentally, you can use the majority of the enhanced URLs to, for example, only select the DimProduct table should you so wish; however, by specifying the root URL for the service you have access to all objects exposed by the service. From the Table Import Wizard select the required tables; in my case I’ll select them all. (You can optionally rename and filter the feed objects here too.)
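The addressing scheme follows a simple pattern and can be composed programmatically. The helper below is a hypothetical illustration (the function and parameter names are mine, not part of ADO.NET Data Services):

```python
def feed_url(base, entity, key=None, prop=None, top=None, expand=None):
    """Compose an ADO.NET Data Services (Atom feed) query URL.

    Entity and property names are case sensitive, as noted above.
    """
    url = f"{base}/{entity}"
    if key is not None:
        url += f"({key})"          # address a single entity by key
    if prop is not None:
        url += f"/{prop}"          # drill into one property
    query = []
    if top is not None:
        query.append(f"$top={top}")
    if expand is not None:
        query.append(f"$expand={expand}")
    return url + ("?" + "&".join(query) if query else "")

base = "http://localhost:56867/GeminiDataService.svc"
print(feed_url(base, "DimCustomer", top=5))
print(feed_url(base, "DimProduct", key=310, expand="FactInternetSales"))
```

Each call reproduces one of the example URLs above from its parts.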
Following the summary screen, the Gemini/PowerPivot client then gets to work importing the data from the ADO.NET Data Service. Once completed, Gemini/PowerPivot displays all the data from all of the feed objects as if it came directly from the underlying database.

Step 6) Create Relationships

There is one final step before we can test our model using an Excel pivot table: we need to create the relationships between the tables we have imported. The Gemini/PowerPivot client provides a simple, if a little onerous, way of creating relationships; the ‘Create Relationship’ action in the Relationships section of the Home ribbon launches the Create Relationship wizard. Each table needs relating back to the primary fact table.

Step 7) Test

We are now ready to start our analysis. Selecting PivotTable from the View section of the Gemini/PowerPivot client Home ribbon creates a pivot table in the underlying Excel workbook attached to your custom-fed Gemini/PowerPivot data model. So, to allow fast access to, for example, potentially sensitive data through Gemini/PowerPivot, you can quickly build a custom data feed that can be consumed natively by the Gemini/PowerPivot client data feed functionality.

HACK: Exposing Foreign Keys as Properties through the ADO.NET Entity Framework

First post for months; the PerformancePoint Planning announcement forced some redirection and rebuilding. We’ve grieved, we’ve moaned, but at some point you just have to move on.

-----------------

I’m not a fan of hacks – it normally means you are doing something wrong, but in this case, where I’m after a quick win, I’ve had to work out and resort to a bit of a hack. It actually looks like the issue I’m facing may be addressed in Entity Framework v2 (Microsoft .NET 4.0) – so maybe it’s more of a workaround than a hack after all ;o)

I’m using the ADO.NET Entity Framework and ADO.NET Data Services to expose a subsection of a database for consumption by Gemini. In order to relate the exposed database objects together in Gemini, I need to apply this hack to ensure I have foreign keys available in my Gemini models to support creating the relationships. By default, the Entity Framework exposes foreign keys as Navigation Properties rather than Scalar Properties, and Gemini does not consume Navigation Properties.

Let’s take the scenario where I want to create an Entity Framework model based on the following tables from the AdventureWorksDW2008 sample database:
- FactInternetSales
- DimCustomer
- DimProduct
- DimSalesTerritory

Step 1) Identify the table(s) that contain foreign keys. In this case, FactInternetSales.

Step 2) Load those table(s) into the Entity Framework model on their own. This ensures the foreign keys are set as Scalar Properties. If you load in all the tables at once, the foreign keys are not exposed as Scalar Properties.

Step 3) Load in the related tables (DimCustomer, DimProduct, DimSalesTerritory). At this point a bunch of Navigation Properties will have been set up, along with relationships between the related tables, but the trouble now is the project will no longer build.
If you try you receive the following error for each relationship: Error 3007: Problem in Mapping Fragments starting at lines 322, 428: Non-Primary-Key column(s) [CustomerKey] are being mapped in both fragments to different conceptual side properties - data inconsistency is possible because the corresponding conceptual side properties can be independently modified. Step 4) Manually remove the relationships between tables. Clicking on the relationship line on the diagram and hitting delete, removes the relationship. Step 5) Remove all Association Sets By editing the edmx file manually in a text or XML editor you need to remove all <AssociationSet>…</AssociationSet> occurrences from the <EntityContainer> section: <EntityContainer Name="AdventureWorksDW2008Model1StoreContainer">     <EntitySet Name="DimCustomer" EntityType="AdventureWorksDW2008Model1.Store.DimCustomer" … />     <EntitySet Name="DimProduct" EntityType="AdventureWorksDW2008Model1.Store.DimProduct" … />     <EntitySet Name="DimSalesTerritory" EntityType="AdventureWorksDW2008Model1.Store.DimSalesTerritory" … />     <EntitySet Name="FactInternetSales" EntityType="AdventureWorksDW2008Model1.Store.FactInternetSales" … />     <AssociationSet Name="FK_FactInternetSales_DimCustomer" Association="AWDW08.FK_FactInternetSales_DimCustomer">         <End Role="DimCustomer" EntitySet="DimCustomer" />         <End Role="FactInternetSales" EntitySet="FactInternetSales" />     </AssociationSet>     <AssociationSet Name="FK_FactInternetSales_DimProduct" Association="AWDW08.FK_FactInternetSales_DimProduct">         <End Role="DimProduct" EntitySet="DimProduct" />         <End Role="FactInternetSales" EntitySet="FactInternetSales" />     </AssociationSet>     <AssociationSet Name="FK_FactInternetSales_DimSalesTerritory" Association="ADW08.FK_FactInternetSales_DimSalesTerritory">         <End Role="DimSalesTerritory" EntitySet="DimSalesTerritory" />         <End Role="FactInternetSales" EntitySet="FactInternetSales" />     
</AssociationSet> </EntityContainer> The project should now build, with the foreign keys exposed as Scalar Properties.  Obviously no inherent relationships exist, so this could be dangerous in certain applications.  For Gemini however, providing you setup the relationships manually, it works a treat.
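Step 5 can be scripted rather than hand-edited. The sketch below uses Python's ElementTree to strip every AssociationSet element while leaving the EntitySet declarations intact (namespace handling is deliberately simplified by matching on local name, so treat it as illustrative before pointing it at a real .edmx):

```python
import xml.etree.ElementTree as ET

def strip_association_sets(edmx_xml):
    """Remove every <AssociationSet> element, wherever it appears,
    leaving the <EntitySet> declarations intact."""
    root = ET.fromstring(edmx_xml)
    # ElementTree has no parent pointers, so walk every element and
    # remove matching children; materialise both lists first so the
    # tree is not mutated underneath an active iterator.
    for parent in list(root.iter()):
        for child in list(parent):
            local_name = child.tag.rsplit("}", 1)[-1]  # drop any namespace
            if local_name == "AssociationSet":
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Run against the EntityContainer fragment above, the output keeps the four EntitySet elements and drops the three AssociationSet blocks.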

RIP PerformancePoint Planning

It's nearly a week since the announcement that shook the (PPS) world! It's been a bit difficult to report on; generally the Adatis blogs try to offer solutions to problems we have encountered out in the real world. Now, I could say something crass here about the real world and the decision makers involved...but that would be childish, right? If I were to offer up my feelings, they wouldn't be that far from Alan Whitehouse's excellent post on the subject. If I had an ounce of class about me, it would be much more aligned with Adrian's poignant discussion opener, the one with the sharp-witted title, but alas....

We've spent the best part of the week speaking to customers, partners and Microsoft about what to do next. The timing was choice – would you believe, we actually had three new PerformancePoint Planning phases kicking off this week; according to my project plan, I should be setting up Kerberos as we speak. [There is always a positive, right?]

Some customers are carrying on regardless. They...
...already have planning deployments and are too far invested and dependent to back out at this stage or,
...have a short-term view (that's not a criticism) and need a "quick" fix with a low TCO to get them through some initial grief. (Typically these customers are going through rapid organisational change, or form part of a recent acquisition, and, to help them see the wood from the trees during the transition, require short/sharp solutions.)

Other customers, with longer-term views, feel the product, or more importantly the suitably skilled resource pool, will drain away far quicker than the life-span of the much-touted Microsoft product support. I have to agree. Fact: Adatis will not be employing or training any more PerformancePoint Planning consultants. I doubt many other consulting firms will either.
It's those customers with the longer-term view that are the ones currently in limbo - they are experiencing pain, they need pain relief, what should they do - wait and see what Office 14/15 offers? (There is talk of some planning functionality appearing in future Office versions - what truth there is in that..?). The Dynamics customers could wait for the resurrection of Forecaster - I do have information on good authority that they will be developing Forecaster to be closer, in terms of flexibility, to PPS Planning.  I had originally heard the opposite view in that Forecaster will be replaced with a cut down version of PPS Planning.  Either way, I'm sure some of the PPS Planning code-base will be utilised, which could end rumours of PPS Planning being 'given' to the community as some form of community/open-source arrangement.  An arrangement that is, in my opinion, a non-starter anyway, "Hey, Mr FD, We've got this great open-source budgeting and forecasting product we think you should implement!" - yeah right ! Another rumour (and mixed message) is that Service Pack 3 will contain some of the requested features that were earmarked for version 2 (After all, the code has already been written, right?) this rumour was actually started by Guy Weismantel in his Announcement Video.  However, the information I have since received, clearly states that Service Pack 3 will contain stability and bug fixes only - so which is it to be?  It's unlikely for a service pack to contain new features, but it's not unheard of; anyone remember the original release of Reporting Services?  That arrived as part of a service pack for SQL Server 2000. The burning question I cannot get answered is, have Microsoft actually stepped out of the BPM market for good?  We are told that Excel, Sharepoint and SQL Server provide BPM - I can't see, without Planning, how they can.  
Short of hard-coded values, renewed SharePoint/Excel hell, another vendor or a bespoke planning solution, businesses can't set plans, which has further-reaching implications: Planning's demise effectively shelves the Scorecard/KPI functionality from the M&A toolset too! It will be interesting to see the new Monitoring & Analytics marketing: will they still demo Strategy Maps and Scorecards, or will they now focus on decomposition trees and heat maps? Monitoring & Analytics may, in practice, just become Analytics.

I would have thought the cost of continuing to develop the product (even if it were a lemon, which Planning certainly wasn't) is far less than the potential loss of revenue that Microsoft will face due not only to the loss of confidence by its customers (who are going to think twice about investing in any Microsoft product now, let alone a v1) but, perhaps more significantly, the doors it opens to its competitors who can offer a complete BI/BPM stack. Planning was a foot in the customer's door for BI – once you put Planning in, the customer had already bought the full BI stack, and in most cases our customers were wowed by what they could now achieve. I suspect Cognos and SAP are still partying now!

PerformancePoint SP2 - Planning Fixes and a mini-feature

Jeremy has already announced the release of PerformancePoint Server SP2 and it's great to see that the PPS dev team hit their target release date! I've spent a little commute time this morning checking out the documentation. Admittedly, I've initially focused on the Planning component, and there are no great surprises (Tim has already told you about the new bits), but I have spotted what could arguably be described as a mini-feature surrounding form validation that I'm sure will come in useful.

As you would expect, previously released hotfixes have been packaged up into this service pack:

- 954710 Description of the PerformancePoint Server 2007 hotfix package: July 1, 2008
- 955432 Description of the PerformancePoint Server 2007 hotfix package: July 14, 2008
- 955751 Description of the PerformancePoint Server 2007 hotfix package: July 28, 2008
- 956553 Description of the PerformancePoint Server 2007 hotfix package: August 21, 2008

Plus fixes to issues not previously addressed:

Excel Add-in Related

- You locally save and close a form in PerformancePoint Add-in for Excel. When you reopen the form, you are prompted to update it, even though the form is already up to date.
- In PerformancePoint Add-in for Excel, you open an offline form assignment, add a link to an external Excel worksheet in a cell, and submit the changes to the PerformancePoint Planning Server database. However, when you reopen the assignment, the link that you added is not retained.
- After you install PerformancePoint Server 2007 Service Pack 1, you create a page filter in PerformancePoint Add-in for Excel. A user who does not have permission to the default member of the page filter, but does have permission to other leaf members, receives the following error message when opening a report that uses the filter: "Cannot render the <MatrixName> matrix. The server returned the following error: The <CubeName> cube either does not exist or has not been processed." In the release version of PerformancePoint Server 2007, by contrast, the next member that the user has access to is automatically selected for use in the page filter.
- You define data validation in an Excel worksheet. However, you can still submit a form in PerformancePoint Add-in for Excel even if data in the form is not validated.
- You have a matrix based on a large and complex model in PerformancePoint Add-in for Excel. When you open the Select Filters dialog box to change a page filter for this matrix and click the Value column of the filter, the dialog box that displays the dimension members takes a long time to appear.

Business Rules Related

- After you migrate an application in PerformancePoint Server 2007 from one server to another, the order of user-defined business rules and system business rules in models is not preserved.
- You cannot use the datamember function in the ALLOCATE statement or in the TRANSFER statement.
- You create an automatic rule that uses MdxQuery implementation or Native MdxQuery implementation in Planning Business Modeler, then submit changes to the source data that the rule uses from an assignment form. The submission causes the model to be reprocessed, and because reprocessing causes rules in the automatic rule set to be executed, you expect the target data of the automatic rule to reflect the change made by the form submission. However, after the model is reprocessed, it does not.
- The rule expression of system business rules uses dimension member names instead of dimension member labels.

Planning Business Modeler Related

- You have a model that contains many form templates and assignments. When you try to change objects in the model, Planning Business Modeler crashes.
- You create a member property of the Date data type in a dimension and specify the Set value to Null option. When you retrieve the value of this member property, you obtain 1899-12-31T00:00:00 instead of the expected blank.
- You cannot schedule recurring jobs at a frequency of less than an hour.
- When a user updates a business rule in Planning Business Modeler, the audit log file logs the user ID of the user who created the rule, rather than, as you would expect, the user who updated it.
- You create a dimension that has no hierarchy in a localized version of PerformancePoint Server 2007, then either run the bsp_DI_CreateHierarchyLabelTableForDimension stored procedure to create a label-based hierarchy table for the dimension, or perform the Prepare the Staging DB operation in the PerformancePoint Planning Data Migration Tool. In this scenario, you receive the following error message: "A problem was encountered while attempting to connect to, or Execute BSP on, the specified Database. For more information regarding this error please review the Application Event Log on the SQL Server for any 'MSSQLSERVER ERRORS' and/or please check that all parameters in the UI are correct and try again."

PerformancePoint Planning: Deleting a Custom Member Property - A Solution

I had a bit of a rant yesterday about the fact that I have had to compromise on naming member properties when I've inadvertently created them with the wrong data type. As I mentioned, I found a method on the dimension attribute collection in the Planning client assemblies that hinted it might allow me to delete a member property, so I decided to give it a go. Below is some really rough-and-ready C# code that actually does delete a dimension member property. I will improve the code and probably add it into my PPSCMD GUI interface as a 'feature pack' bonus at some stage. However, if you are in desperate need of code to delete a member property and you can't wait for PPSCMD GUI v0.2 or PerformancePoint Version 2 (I'm not sure which will come first!), the code is below. Use at your own risk!

Note: Replace "MyApp", "MyDimension", "MyAttribute", oh, and the server address, accordingly.

using Microsoft.PerformancePoint.Planning.Client.Common;
using Microsoft.PerformancePoint.Planning.Bmo.Core;
..
// Setup the PPS Application Metadata Manager
ServerHandler serverHandler = new ServerHandler("http://localhost:46787");
MetadataManager manager = new MetadataManager();
manager.ServerHandler = serverHandler;
manager.ServerHandler.Connect();

// Get the system metadata
BizSystem system = manager.GetSystem(true);

// Get hold of the PPS Application
BizApplication ppsApp = system.Applications["MyApp"];

// Obtain the root model site from the application
BizModelSite site = ppsApp.RootModelSite;

// Obtain the dimension that contains the member property
BizDimension dimension = site.Dimensions["MyDimension"];

// Obtain the member property
BizDimensionAttribute attribute = dimension.Attributes["MyAttribute"];

// Check out the dimension
manager.CheckOut(dimension.Id, dimension.ParentModelSite.Id);

// Perform the delete
dimension.DeleteDimensionAttribute(attribute, null);

// Submit the change
manager.SubmitModelSite(ppsApp.Id, dimension.ParentModelSite,
    Microsoft.PerformancePoint.Planning.Bmo.Interfaces.SubmissionType.Update);

// Check in the dimension
manager.CheckIn(dimension.Id, dimension.ParentModelSite.Id);

Update: I've since discovered that you can obtain an unsupported utility from Microsoft Support that reportedly does the same thing, doh! Oh well, it's always nice to have the code :)
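One thing to watch with the snippet above: because the dimension is checked out before the delete, an exception part-way through could leave it checked out. A defensive variant (an untested sketch reusing the same objects and assumed Planning client API as above, not something I've run against a server) would wrap the risky calls in a try/finally so the check-in always runs:

```csharp
// Untested sketch: reuses manager, dimension, attribute and ppsApp from the
// snippet above (assumed PPS Planning client API). The finally block ensures
// the dimension's check-out is always released, even if the delete or submit throws.
manager.CheckOut(dimension.Id, dimension.ParentModelSite.Id);
try
{
    // Perform the delete and push the change to the server
    dimension.DeleteDimensionAttribute(attribute, null);
    manager.SubmitModelSite(ppsApp.Id, dimension.ParentModelSite,
        Microsoft.PerformancePoint.Planning.Bmo.Interfaces.SubmissionType.Update);
}
finally
{
    // Always release the check-out so the dimension isn't left locked
    manager.CheckIn(dimension.Id, dimension.ParentModelSite.Id);
}
```

Whether checking in after a failed delete is the right recovery (rather than undoing the check-out) is a judgement call; the point is simply not to leave the dimension locked.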

PerformancePoint Planning: Deleting a Custom Member Property..

Update: I've posted a solution to deleting a custom member property here. I've done this countless times: I've created my perfectly named custom member property when it suddenly dawns on me that I've forgotten to give it the right data type. No problem, right? Wrong! From within PBM, can you change the data type? No! Can you delete the member property? No! Can you rename the member property? No!

So, what are the options? Well, you could wait for version 2 (I truly hope you can edit/delete member properties in V2!), you could hack the back-end database in the vague hope of removing the member property safely, or, as I have been doing in the past, you could create a new member property with a less-than-perfect name and try not to clench teeth and fists every time you glance at the original.

Well, I've had enough and have decided to take action. Interestingly, the Microsoft.PerformancePoint.Planning.BMO assembly contains a method called DeleteDimensionAttribute on the dimension attribute collection. I wonder... Anyone tried?

New PerformancePoint Contoso Demo - Released

Amidst my write-up of the first day of the Microsoft BI Conference, I mentioned that a new planning demo suite was imminent and that I would post more information about the demos soon. Well, as it has now been officially released (27th October), I can spill the beans... Taken directly from the PPS Planning Forum announcement, the demo...

...consists of a Planning and Financial Consolidation demo. It shows how the fictitious Contoso Group uses Microsoft Office PerformancePoint Server for planning, statutory consolidation and data analysis.

Well, I'm proud to announce that Adatis, in the shape of my colleague Jeremy Kashel, designed and built the PerformancePoint Planning element of the suite. The PerformancePoint Financial Consolidation element was conceived and developed by our friends at Solitwork of Denmark. The demo can be downloaded from here... ...and is part of the next 'All Up BI VPC' (Version 7). Great work guys!

Microsoft BI Conference - Day 3 - 8th October 2008

The last day of the conference came round quickly and, due to my relatively early flight, I only attended a couple of sessions and spent most of the day meeting more people in and around the BI community. A shout-out to Peter Eberhardy (PeterEb), a real highlight; to Barry Tousley, test dev on the PPS Server - thanks for listening to my grumbles about the PPS Data Migration Tool and explaining why it does what it does; to Norm Warren of NormBI fame; to Patrick Husting, who I actually met on Day 2; and to Brian Berry of BlumShapiro, who I met on Day 1 and who reportedly follows this blog!

I thought the conference was great. The organisation was slick and right on the button, from registration, meals and session offerings right up to the party. I think last year the main criticism was that the sessions were not technical enough. They appear to have raised the technical level of some of the sessions, but I still found most of them to be a bit dated, and apart from a couple of choice sessions most BI people wouldn't have learnt a great deal - nothing official at all about PPS v2 :o( Also, a couple of the sessions I wanted to attend clashed, so I'll have to find the time to watch them on the reported DVD containing a video of every session. However, I did feel the standard of presentation was excellent: well practised, clear, funny and engaging. I'll definitely be vying for a place at next year's, where they really should have lots to show off!