Sacha Tomey

Sacha Tomey's Blog

SQL Server 2012 : Columnstore Index in action

One of the new SQL Server 2012 data warehouse features is the Columnstore index. It stores data by columns instead of by rows, similar to a column-oriented DBMS such as the Vertica Analytic Database, and Microsoft claims it can increase query performance by hundreds to thousands of times.

The issue with indexes in a data warehouse environment is the number and broad range of questions the warehouse may have to answer. You either have to introduce a large number of large indexes (in many cases resulting in a larger set of indexes than actual data), plump for a costly, spindle-rich hardware infrastructure, or opt for a balanced hardware and software solution such as a Microsoft SQL Server 2008 R2 Fast Track Data Warehouse or an HP Business Data Warehouse Appliance. With the latter the approach is ‘index-light’: you rely on the combination of high throughput and processing power to reduce the dependency on the traditional index.

The Columnstore index is different in that, when applied correctly, a broad range of questions can benefit from a single Columnstore index. The index is also compressed (using the same Vertipaq technology that PowerPivot and Tabular-based Analysis Services share), which reduces the load on the expensive, slow disk subsystem and shifts the work onto the fast, lower-cost memory/processor combination.

In order to test the claims of the Columnstore index I’ve performed some testing on a Hyper-V instance of SQL Server 2012 “Denali” CTP3 using a blown up version of the AdventureWorksDWDenali sample database. I’ve increased the FactResellerSales table from approximately 61,000 records to approximately 15.5 million records and removed all existing indexes to give me a simple, but reasonably large ‘heap’.

Heap

With a clear cache, run the following simple aggregation:

SELECT
    SalesTerritoryKey
    ,SUM(SalesAmount) AS SalesAmount
FROM
    [AdventureWorksDWDenali].[dbo].[FactResellerSales]
GROUP BY
    SalesTerritoryKey
ORDER BY
    SalesTerritoryKey


Table 'FactResellerSales'. Scan count 5, logical reads 457665, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 7641 ms, elapsed time = 43718 ms


Non-Clustered Index

Before jumping straight in with a columnstore index, let’s review performance using a traditional index. I tried a variety of combinations; the fastest I could get this query to go was simply to add the following:

CREATE NONCLUSTERED INDEX [IX_SalesTerritoryKey] ON [dbo].[FactResellerSales]
(
    [SalesTerritoryKey] ASC
)
INCLUDE ([SalesAmount]) WITH
(
    PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
    DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
    ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100, DATA_COMPRESSION = PAGE
) ON [PRIMARY]
GO

Notice I have compressed the index using page compression; this significantly reduced the number of pages my data consumed. The IO stats when I re-ran the same query (on a clear cache) looked like this:

Table 'FactResellerSales'. Scan count 5, logical reads 26928, physical reads 0, read-ahead reads 26816, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 6170 ms, elapsed time = 5201 ms.


Much better! Approximately 6% of the original logical reads were required, resulting in a query response time of just over 5 seconds. Remember, though, that this new index will really only answer this specific question. If we change the query, performance is likely to fall off a cliff and revert to the table scan.
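The ‘approximately 6%’ figure comes straight from the two sets of STATISTICS IO/TIME messages. Rather than working the ratios out by hand, they can be scripted. This is a minimal Python sketch (not part of the original test setup); it assumes the message text has been copied out of SSMS as plain strings, and the `parse_stats` and `_grab` helpers are hypothetical:

```python
import re

def _grab(pattern: str, text: str) -> int:
    """Return the first integer captured by pattern, or fail loudly."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return int(match.group(1))

def parse_stats(messages: str) -> dict:
    """Pull logical reads, CPU time and elapsed time out of SET STATISTICS IO/TIME text."""
    return {
        # The look-behind skips the separate 'lob logical reads' counter.
        "logical_reads": _grab(r"(?<!lob )logical reads (\d+)", messages),
        "cpu_ms": _grab(r"CPU time = (\d+) ms", messages),
        "elapsed_ms": _grab(r"elapsed time = (\d+) ms", messages),
    }

heap = parse_stats(
    "Table 'FactResellerSales'. Scan count 5, logical reads 457665, physical reads 0, "
    "read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.\n"
    "SQL Server Execution Times:\n"
    "CPU time = 7641 ms, elapsed time = 43718 ms"
)
nonclustered = parse_stats(
    "Table 'FactResellerSales'. Scan count 5, logical reads 26928, physical reads 0, "
    "read-ahead reads 26816, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.\n"
    "SQL Server Execution Times:\n"
    "CPU time = 6170 ms, elapsed time = 5201 ms."
)

reads_ratio = nonclustered["logical_reads"] / heap["logical_reads"]
speedup = heap["elapsed_ms"] / nonclustered["elapsed_ms"]
print(f"Logical reads: {reads_ratio:.1%} of the heap")  # prints 5.9%
print(f"Elapsed time: {speedup:.1f}x faster")            # prints 8.4x
```

With these figures the non-clustered index needs 5.9% of the heap's logical reads and runs roughly 8.4 times faster by elapsed time.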

Incidentally, when I adopted an index-light (no index) approach and simply compressed the underlying table itself (reloading it to remove fragmentation), performance was only nominally slower than with the indexed table, with the added advantage of performing well for a large number of different queries. (This effectively speeds up the table scan; partitioning the table can help with this approach too.)

Columnstore Index

Okay, time to bring out the columnstore. The recommendation is to add all columns to the columnstore index (columnstore indexes do not support ‘include’ columns), although in practice there may be a few cases where you exclude some columns. Metadata or system columns that are unlikely to be used in true analysis are good candidates to leave out of the columnstore. In this instance, however, I am including all columns:

CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_Columnstore] ON [dbo].[FactResellerSales]
(
    [ProductKey],
    [OrderDateKey],
    [DueDateKey],
    [ShipDateKey],
    [ResellerKey],
    [EmployeeKey],
    [PromotionKey],
    [CurrencyKey],
    [SalesTerritoryKey],
    [SalesOrderNumber],
    [SalesOrderLineNumber],
    [RevisionNumber],
    [OrderQuantity],
    [UnitPrice],
    [ExtendedAmount],
    [UnitPriceDiscountPct],
    [DiscountAmount],
    [ProductStandardCost],
    [TotalProductCost],
    [SalesAmount],
    [TaxAmt],
    [Freight],
    [CarrierTrackingNumber],
    [CustomerPONumber],
    [OrderDate],
    [DueDate],
    [ShipDate]
) WITH (DROP_EXISTING = OFF) ON [PRIMARY]

Now when I run the query on a clear cache:

Table 'FactResellerSales_V2'. Scan count 4, logical reads 2207, physical reads 18, read-ahead reads 3988, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 235 ms, elapsed time = 327 ms.


I think the figures speak for themselves! A sub-second response, and because all columns are part of the index, a broad range of questions can be satisfied by this single index.
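To put the columnstore figures in context, here is a quick comparison against both earlier runs. Again a minimal Python sketch; the numbers are simply copied from the STATISTICS output in this post, and the `speedup` and `read_fraction` helpers are hypothetical:

```python
# Figures copied from the three STATISTICS IO/TIME outputs in this post.
runs = {
    "heap": {"logical_reads": 457665, "elapsed_ms": 43718},
    "nonclustered": {"logical_reads": 26928, "elapsed_ms": 5201},
    "columnstore": {"logical_reads": 2207, "elapsed_ms": 327},
}

def speedup(baseline: str, candidate: str = "columnstore") -> float:
    """How many times faster the candidate ran than the baseline, by elapsed time."""
    return runs[baseline]["elapsed_ms"] / runs[candidate]["elapsed_ms"]

def read_fraction(baseline: str, candidate: str = "columnstore") -> float:
    """Candidate logical reads as a fraction of the baseline's."""
    return runs[candidate]["logical_reads"] / runs[baseline]["logical_reads"]

for baseline in ("heap", "nonclustered"):
    print(f"columnstore vs {baseline}: {speedup(baseline):.0f}x faster, "
          f"{read_fraction(baseline):.1%} of the logical reads")
```

On this data set that works out at roughly 134 times faster than the heap scan and 16 times faster than the tuned non-clustered index, by elapsed time.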

Storage

The traditional (compressed) non-clustered index takes up around 208 MB, whereas the Columnstore Index comes in a little smaller at 194 MB. So the columnstore wins on both speed and storage efficiency, an advantage further compounded when you take into account the additional indexes the warehouse might otherwise require.

So, the downsides? Columnstore indexes render the table read-only. In order to update the table you either need to drop and re-create the index or employ a partition switching approach. The other notable disadvantage, consistently witnessed during my tests, is that the columnstore index takes longer to build. The traditional non-clustered index took approximately 21 seconds to build, whereas the columnstore took approximately 1 minute 49 seconds. Remember, though, that you only need one columnstore index to satisfy many queries, so that’s potentially not a fair comparison.
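To put rough numbers on ‘not a fair comparison’, here is a back-of-an-envelope Python sketch. It assumes (hypothetically) that each extra query pattern would need its own non-clustered index with the same build time and size as the one measured above, which real indexes won't match exactly; `nonclustered_totals` is a hypothetical helper:

```python
import math

# Measured in this post.
NC_BUILD_S, NC_SIZE_MB = 21, 208    # page-compressed non-clustered index
CS_BUILD_S, CS_SIZE_MB = 109, 194   # columnstore index (1 min 49 s)

def nonclustered_totals(count: int) -> tuple:
    """Total build seconds and MB for `count` indexes, assuming each matches the measured one."""
    return count * NC_BUILD_S, count * NC_SIZE_MB

# Smallest number of such indexes whose combined build time exceeds the columnstore's.
break_even = math.floor(CS_BUILD_S / NC_BUILD_S) + 1

for count in (1, break_even):
    build_s, size_mb = nonclustered_totals(count)
    print(f"{count} NC index(es): {build_s} s, {size_mb} MB "
          f"vs columnstore: {CS_BUILD_S} s, {CS_SIZE_MB} MB")
```

Under that assumption, six such indexes would take longer to build than the single columnstore (126 s vs 109 s), and even one of them already uses more space.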

Troubleshooting

If you don’t notice a huge difference between a table scan and a Columnstore Index Scan, check the Actual Execution Mode of the Columnstore Index Scan. This should be set to Batch, not Row.


If the Actual Execution Mode reports Row, the query is most likely not running in parallel (in SQL Server 2012, batch mode requires a parallel plan):

- If running under Hyper-V, ensure you have assigned more than one processor to the virtual machine.
- Ensure the server property ‘Max Degree of Parallelism’ is not set to 1.

Summary

In summary, for warehousing workloads, a columnstore index is a great addition to the database engine, with significant performance improvements even on reasonably small data sets. It will redefine the ‘index-light’ approach that the SQL Server Fast Track Data Warehouse methodology champions and help simplify warehouse-based performance tuning activities. Will it work in every scenario? I very much doubt it, but it’s a good place to start until we get to experience it live in the field.

SQL Server 2012 Licensing

Today saw the announcement of how SQL Server 2012 will be carved up and licensed, and it's changed quite a bit. There are three key changes:

1) There's a new Business Intelligence Edition that sits between Standard and Enterprise
2) No more processor licensing. There's a move to Core based licensing instead (with a minimum cost of 4 cores per server)
3) Enterprise is only available on the Core licensing model (Unless upgrading through Software Assurance *)

Enterprise, as you would expect, has all the functionality SQL Server 2012 has to offer.

The Business Intelligence edition strips away:
- Advanced Security (advanced auditing, transparent data encryption)
- Data Warehousing (ColumnStore, compression, partitioning)
and provides a cut-down, basic (as opposed to advanced) level of High Availability (AlwaysOn).

In addition, the Standard Edition removes:
- Enterprise data management (Data Quality Services, Master Data Services)
- Self-Service Business Intelligence (Power View, PowerPivot for SPS)
- Corporate Business Intelligence (Semantic model, advanced analytics)

If you are utilising 4-core processors, licence costs for Standard ($1,793 per core, or $898 per server + $209 per CAL) and Enterprise ($6,874 per core) remain similar(ish). However, you will be stung if you have more cores. The Business Intelligence edition is only available via a Server + CAL licence model, and it's apparent that Microsoft are placing a big bet on MDS/DQS, Power View, PowerPivot for SharePoint and BISM: the licence for the Business Intelligence edition is $8,592 per server, plus $209 per CAL, which is nearly 10x more per server than Standard Edition!
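To see how the per-core model scales, here is a rough Python calculator using only the list prices quoted above. It is a sketch, not a licensing tool: the 4-core minimum is applied per server as described in this post (check the datasheet for how it applies to your hardware), and the helper names are hypothetical:

```python
# List prices quoted in this post (USD).
STANDARD_PER_CORE = 1793
STANDARD_PER_SERVER = 898
ENTERPRISE_PER_CORE = 6874
BI_PER_SERVER = 8592
CAL = 209
MIN_CORES = 4  # assumption: the 4-core minimum applied per server, as described above

def core_licence_cost(price_per_core: int, cores: int) -> int:
    """Core-based cost; servers with fewer than MIN_CORES are charged as MIN_CORES."""
    return price_per_core * max(cores, MIN_CORES)

def server_cal_cost(price_per_server: int, users: int, servers: int = 1) -> int:
    """Server + CAL cost: one licence per server plus one CAL per user."""
    return price_per_server * servers + CAL * users

for cores in (4, 8, 16):
    print(f"{cores:>2} cores: Standard ${core_licence_cost(STANDARD_PER_CORE, cores):,}, "
          f"Enterprise ${core_licence_cost(ENTERPRISE_PER_CORE, cores):,}")

print(f"BI vs Standard server licence: {BI_PER_SERVER / STANDARD_PER_SERVER:.1f}x")
```

Enterprise on an 8-core server comes to $54,992, double the 4-core figure, which is where the sting comes from; the BI server licence works out at about 9.6 times the Standard one, hence ‘nearly 10x’.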

For the complete low-down check out these links:

Editions Overview:
http://www.microsoft.com/sqlserver/en/us/future-editions/sql2012-editions.aspx

Licensing Overview:
http://www.microsoft.com/sqlserver/en/us/future-editions/sql2012-licensing.aspx

Licence Detail (including costs):
http://download.microsoft.com/download/D/A/D/DADBE8BD-D5C7-4417-9527-5E9A717D8E84/SQLServer2012_Licensing_Datasheet_Nov2011.docx

* If you are currently running Enterprise as Server + CAL and you upgrade to SQL Server 2012 through Software Assurance, you can keep the Server + CAL model, provided you don’t exceed 20 cores.

Business Data Warehouse Appliance

Microsoft and HP announced the release of their latest Data Warehousing focused appliance last week, the Business Data Warehouse Appliance (BDW):
http://blogs.technet.com/b/dataplatforminsider/archive/2011/06/06/announcing-the-hp-business-data-warehouse-appliance.aspx

Not to be confused with the BDA (the Business Decision Appliance, aka the "PowerPivot Appliance"), this latest appliance is targeted at data warehouse workloads and follows the Fast Track Data Warehouse principles.

This is a great move, and I think they have the sizing just right: approx. 5TB-8TB of compressed user data (depending on your achieved compression ratio) will cater for a decent proportion of the warehouses and data marts in operation today.

The announcement appears to focus on the fast deployment (they claim it can be installed and configured in around 10 minutes), and that's pretty impressive but I'd like to know what other appliance specific value they have added to the overall package.  After all, installation and configuration of the appliance is just the tip of the iceberg.

I've been lucky enough to be involved in a Fast Track Data Warehouse implementation, so I have a couple of ideas (some of which we've implemented) that I'd like to see baked into a Data Warehouse Appliance offering:
  • Operational Reporting (Some of which are already achievable through the SQL Server Management Data Warehouse)
     
    • Disk and File group usage reports - Help with on-going capacity planning by detailing growth and trends over time for the disk as a whole and file groups associated with each database.
       
    • Fragmentation Reports - [The methodology surrounding Fast Track has a high emphasis on avoiding and managing fragmentation]  Reports that detail levels of fragmentation at both the physical and logical level would potentially pre-empt any fragmentation related issues.  A kind of combination of WinDirStat and Internals Viewer would be a great graphical representation of the fragmentation at both those levels.
  • Database Administration/Developer Accelerators
     
    • Database/object Creation functions - I want to be able to create a database and/or a file group of a specific size and let the appliance do the work of creating the physical files on each of the mount points for me. PDW (SQL Server Parallel Data Warehouse Edition) already does something like this, baked into the product.
       
    • Simplified partition management - For example, I'm likely to want to age my data over time and merge smaller partitions into larger partitions (certainly until the maximum number of partitions limit is raised in 'Denali'), or I might want to remove all the data from a specific partition in preparation for a reload.  Make it easy and handle all the swapping out, multi merging etc for me.
       
    • Fragmentation Management - For example, I might want to select 1 or more tables and have them completely rebuilt to remove any extent fragmentation.
       
    • Resize management - My file group is approaching full, I need a bigger one, I want a function to perform that resize in a 'Fast Track approved' manner.
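As an illustration of the capacity-planning report idea, the core calculation could be as simple as fitting a trend line through periodic size samples and projecting when the file group fills up. This Python sketch uses ordinary least squares; the function name and the sample figures are hypothetical, and real samples might come from the Management Data Warehouse or your own collection job:

```python
from typing import Optional, Sequence

def days_until_full(days: Sequence[float], used_gb: Sequence[float],
                    capacity_gb: float) -> Optional[float]:
    """Fit used_gb = intercept + slope * day by least squares and return the
    number of days after the last sample until capacity_gb is reached,
    or None if usage is flat or shrinking."""
    n = len(days)
    if n < 2 or n != len(used_gb):
        raise ValueError("need at least two (day, size) samples of equal length")
    mean_x = sum(days) / n
    mean_y = sum(used_gb) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("samples must be taken on different days")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, used_gb))
    slope = sxy / sxx  # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity_gb - intercept) / slope
    return day_full - days[-1]

# Hypothetical quarterly samples for one file group (day number, GB used).
remaining = days_until_full([0, 30, 60, 90], [100, 110, 120, 130], capacity_gb=200)
print(f"Projected to fill in about {remaining:.0f} days")
```

A straight line is the simplest possible trend; a real report would probably want to show the samples alongside the projection so seasonal loads are not mistaken for steady growth.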

    There's a whole host of other 'value-adds' e.g. a 'Best Practice Analyser' that could be included as part of a Data Warehouse appliance, and it will certainly be interesting to see how the appliance develops following adoption over the next few appliance updates/revisions.

Microsoft Tech-Ed 2010 BI Conference, New Orleans Day 2 (Tuesday 8th June 2010)

Day 2 and the BI Keynote.

Announcements? Only two, although actually, old news:

- They announced the availability of the MS BI Indexing Connector, originally announced back in May.

- They got their story straight(er) with regard to the release of what will be called Pivot Viewer Extensions for Reporting Services. It will be available in 30 days.

The session took more of a “look where we’ve come since the Seattle BI Conference” approach and was, as Ted Kummert described it, Microsoft’s BI [School] Report Card.

Interesting change in semantics for their BI strap line: no longer do they spout “BI for the Masses”; now it’s “BI for Everyone”. However, they admitted that they, along with the rest of the industry, are falling well short, with a current average of only 20% ‘reach’.

With the recent delivery of SQL Server 2008 R2, SharePoint 2010 and Office 2010, the BI integration story is significantly more complete.

There was a large focus on PowerPivot and how it has helped customers quickly deliver fast, available reporting ‘applications’, although I know a few people who would object to describing DAX purely as a familiar extension to the Excel formula engine.

Following the look back, a brief look forward:

- Cloud Computing will play a part; Reporting and Analytics will be coming and, when combined with Windows AppFabric (described yesterday), this is a closer reality.

- Consumerisation enhancements, with better search and improved social media integration BI will move towards becoming a utility.

- Compliance: several plans, including improved Data Quality, Data Cleaning and Machine Learning, plus strong metadata strategy support to deliver lineage and provide change impact analysis.

- Data Volumes: SQL Server Parallel Data Warehouse Edition has completed CTP2, which will open up high-performance data warehousing to data volumes that exceed 100TB. Dallas, the data marketplace, will be better integrated with development and reporting tools.

They then tempted us with some previews of what *could* make a future version of SQL Server. Essentially, the theme for the future is to join the dots between Self Service BI and the Enterprise BI Platform, with a focus on plans around PowerPivot:

- KPI creation

Essentially they are exposing (yet another) way to create (SSAS-based) KPIs through a neat, slider-based GUI directly from within the PowerPivot Client.

- Wide Table Support

To help with cumbersome wide PowerPivot tables, they have introduced a ‘Record View’ to help see all the fields on one screen, all appropriately grouped with edit/add/delete support for new fields, calculations etc.

- Multi Developer Support

They plan to integrate the PowerPivot client into BIDS. This will facilitate integration with Visual SourceSafe for controlled multi-developer support; they also plan to provide a lineage visualisation to help with audit and change impact analysis.

- Data Volumes

Following on from the BIDS integration, there are plans for deployment to server-based versions of SSAS to allow increased performance with higher data volumes. They replayed the demo of the 2m row data set from Seattle where we first saw almost instant sorting and filtering, but this time applied it (with equally impressive performance) to a data set of more than 2bn records! It was described by Amir Netz as “The engine of the devil!” ;)

Microsoft Tech-Ed 2010 BI Conference, New Orleans Day 1 (Monday 7th June 2010)

The Tech-Ed 2010 Conference kicked off today with the Keynote session.  The BI Keynote session is tomorrow but today's keynote did incorporate a small BI Element.  No huge announcements, but some announcements all the same.

- Unsurprisingly, Cloud computing dominated the keynote, highlighting the integration of cloud apps and data with on-premise data (e.g. Active Directory and business operational systems data) to demonstrate "real-world" cloud computing solutions.

- July will see a release of Service Pack 1 for Windows 7 and Windows Server 2008 R2

- Windows Server AppFabric, application role extensions that, for example, facilitate cloud to on-premise integration, is now RTM

- Windows Intune, Cloud based PC management environment

- No date set, but Internet Explorer 9 will focus on performance (graphics acceleration) and new web standards, probably in response to Google's claims about Chrome's speed

- The Microsoft Live Labs "Pivot" research project is to hit the mainstream. They were a little cagey around dates, but possibly this month.

Maybe some more BI specific announcements tomorrow...


Stop Reporting Services (SSRS) 2008 from overwriting custom Parameter Datasets

Frustrating little quirk when building reports in SSRS 2008 using Visual Studio 2008.  If you write a custom query against a parameter dataset, and then change a query that references the parameter, the custom query used by the parameter is overwritten and reset to the default.

Now, you could keep a copy somewhere and replace it after every update or, alternatively, add the <rd:SuppressAutoUpdate> element to the parameter dataset's <Query> element by editing the RDL file directly:

<Query>
...
<rd:SuppressAutoUpdate>true</rd:SuppressAutoUpdate>
</Query>
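If you have several reports to fix, the edit can be scripted. The Python sketch below adds the element to a named dataset's <Query> using ElementTree. It assumes the usual report designer namespace URI for the rd: prefix (check the xmlns:rd declaration at the top of your RDL); the `suppress_auto_update` function is hypothetical, and ElementTree does not preserve the XML declaration or comments, so keep a backup of the original file:

```python
import xml.etree.ElementTree as ET

# Assumption: the URI bound to the rd: prefix in your RDL (check its xmlns:rd declaration).
RD_NS = "http://schemas.microsoft.com/SQLServer/reporting/reportdesigner"

def suppress_auto_update(rdl_text: str, dataset_name: str) -> str:
    """Return the RDL with <rd:SuppressAutoUpdate>true</rd:SuppressAutoUpdate>
    added to the <Query> of the named dataset."""
    root = ET.fromstring(rdl_text)
    # The report-definition namespace differs between RDL versions, so take it from the root.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    if ns:
        ET.register_namespace("", ns[1:-1])  # keep it as the default namespace on output
    ET.register_namespace("rd", RD_NS)       # keep the familiar rd: prefix

    for dataset in root.iter(f"{ns}DataSet"):
        if dataset.get("Name") != dataset_name:
            continue
        query = dataset.find(f"{ns}Query")
        if query is None:
            raise ValueError(f"dataset {dataset_name!r} has no <Query> element")
        flag = query.find(f"{{{RD_NS}}}SuppressAutoUpdate")
        if flag is None:
            flag = ET.SubElement(query, f"{{{RD_NS}}}SuppressAutoUpdate")
        flag.text = "true"
        return ET.tostring(root, encoding="unicode")
    raise KeyError(f"no dataset named {dataset_name!r}")
```

Only the named parameter dataset is touched, so the main report datasets keep their normal designer behaviour.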

I was close to raising this as a bug, when I found someone had beaten me to it:
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=417209

Creating a Custom Gemini/PowerPivot Data Feed – Method 1 – ADO.NET Data Services

There are already a few good Gemini/PowerPivot blogs that provide an introduction into what it is and does so there is no need for repetition.  What I haven’t seen are examples of how existing investments can be harnessed for Gemini/PowerPivot based self-service analytics.

This series of posts focuses on various ways of creating Custom Data Feeds that can be used by Gemini/PowerPivot natively. Providing a direct feed from otherwise closed systems opens up new channels of analytics to the end user.

Gemini/PowerPivot supports reading data from Atom-based data feeds, and this post looks at a quick way of creating an Atom-based feed that can be consumed by Gemini/PowerPivot. By far the simplest way to develop an Atom-based data feed is to employ ADO.NET Data Services in conjunction with the ADO.NET Entity Framework. With very few (in fact one and a bit!) lines of code, a data source can be exposed as a feed that Gemini/PowerPivot can read natively.

I am going to use the AdventureWorksDW sample hosted by a SQL Server 2008 R2 instance for this. Obviously Gemini/PowerPivot natively reads SQL Server databases, so creating a custom feed over the top may seem a little pointless. However, this technique may be useful for quick wins in several scenarios, including:

- Preventing the need for users to connect directly to the underlying data source.
- Restricting access to various elements of the data source (tables/columns etc)
- Applying simple business logic to raw data.

ADO.NET Data Services are a form of Windows Communication Foundation (WCF) services, and therefore can be hosted in various environments.  Here, I will simply host the ADO.NET Data Service inside an ASP.NET site.

To create a Native Gemini/PowerPivot feed, you take seven steps:

1 - Create ASP.NET Web Application
2 - Create Entity Data Model
3 - Create the Schema
4 - Create the Data Service
5 - Load From Data Feed
6 - Create Relationships
7 - Test

Step 1) Create ASP.NET Web Application

I’m using Visual Studio 2008 here to create an ASP.NET Web Application.


Step 2) Create Entity Data Model

Add an ADO.NET Entity Data Model item to the project. These files have a .edmx extension and allow us to create a schema that maps to the underlying database objects.


Step 3) Create the Schema

We simply require a 1:1 mapping, so we will choose ‘Generate from Database’. Incidentally, the ‘Empty Model’ option allows you to build a conceptual model of the database, resulting in custom classes that can optionally be mapped to the database objects later.


Create a Microsoft SQL Server connection to AdventureWorksDW2008.


Select the appropriate database objects; I’ve selected the following tables:

- DimCurrency
- DimCustomer
- DimDate
- DimProduct
- DimPromotion
- DimSalesTerritory
- FactInternetSales


Once the wizard has completed, a new .edmx file and an associated .cs file are created, containing respectively an Entity Relationship Diagram and a set of auto-generated classes that represent the database objects.

Due to the way the Entity Framework handles Foreign Key Constraints we have to apply a workaround to ensure the Foreign Keys on the FactInternetSales table are exposed and brought into Gemini/PowerPivot.  A previous post Exposing Foreign Keys as Properties through ADO.NET Entity Framework walks through the workaround.


Step 4) Create the Data Service

Add an ADO.NET Data Service item to the project.


The service class inherits from a generic version of the System.Data.Services.DataService object, so we need to inform the compiler what class to base the generic object on.  We essentially want to base our Data Service on the class representing our newly created Entity Data Model.  The class name is derived from the database name, unless changed when the Entity Data Model was created, so in our case the class name is AdventureWorksDW2008Entities.

The auto generated service class contains a ‘TODO’ comment that asks you to ‘put your data source class name here’.  The comment needs replacing with AdventureWorksDW2008Entities.

The final step is to expose the resources in the Entity Data Model.  For security reasons, a data service does not expose any resources by default.  Resources need to be explicitly enabled.

To allow read-only access to the resources in the Entity Data Model, the InitializeService method needs updating with a single line of code. The code snippet below details the final class implementation; notice the AdventureWorksDW2008Entities type argument in the class declaration and the explicit resource enablement inside InitializeService.

using System.Data.Services;

public class GeminiDataService : DataService<AdventureWorksDW2008Entities>
{
    // This method is called only once to initialize service-wide policies.
    public static void InitializeService(IDataServiceConfiguration config)
    {
        // Expose every entity set ("*") for read-only access.
        config.SetEntitySetAccessRule("*", EntitySetRights.AllRead);
    }
}

That’s all that’s needed. By default, ADO.NET Data Services conform to the Atom standard, so in theory the service is ready to be consumed by Gemini/PowerPivot.

Before we try, it’s worth giving the service a quick test: building and running the solution (F5) launches Internet Explorer, navigating to the service hosted by the ASP.NET Development Server.


You are first presented with an XML document containing elements that represent the database objects. You can drill further into the objects by amending the URL. For example, to see the contents of the DimPromotion table, append DimPromotion to the end of the URL: http://localhost:56867/GeminiDataService.svc/DimPromotion (case sensitive).

Note: You may need to turn off Feed Reader View in Internet Explorer to see the raw XML (Tools -> Internet Options -> Content -> Settings -> Turn On Feed Reader View; make sure this is unchecked).
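If you would rather inspect the raw feed programmatically than in a browser, the Atom payload is straightforward to read. A minimal Python sketch, assuming the entity properties appear inside each entry's <content> element under the ADO.NET Data Services d:/m: namespaces; the `parse_feed` and `fetch_rows` functions are hypothetical:

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespaces used in ADO.NET Data Services Atom payloads.
ATOM = "{http://www.w3.org/2005/Atom}"
DATA = "{http://schemas.microsoft.com/ado/2007/08/dataservices}"
META = "{http://schemas.microsoft.com/ado/2007/08/dataservices/metadata}"

def parse_feed(atom_xml: str) -> list:
    """Return one dict per Atom <entry>, mapping property name to its text (None for NULL)."""
    root = ET.fromstring(atom_xml)
    rows = []
    for entry in root.iter(ATOM + "entry"):  # also matches a single-entry response
        props = entry.find(f"{ATOM}content/{META}properties")
        if props is None:
            continue
        row = {}
        for prop in props:
            name = prop.tag[len(DATA):] if prop.tag.startswith(DATA) else prop.tag
            # m:null="true" marks a database NULL.
            row[name] = None if prop.get(META + "null") == "true" else prop.text
        rows.append(row)
    return rows

def fetch_rows(url: str) -> list:
    """Download a feed such as .../GeminiDataService.svc/DimPromotion and parse it."""
    with urllib.request.urlopen(url) as response:
        return parse_feed(response.read().decode("utf-8"))
```

For example, fetch_rows("http://localhost:56867/GeminiDataService.svc/DimPromotion") would return one dictionary per promotion, with every value as a string (or None for NULLs), since this sketch ignores the m:type hints.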


As a slight aside, the URL can be further enhanced to filter, return the top n rows, extract certain properties, etc. Here are a few examples:

- http://localhost:56867/GeminiDataService.svc/DimCustomer?$top=5 returns the top 5 customers
- http://localhost:56867/GeminiDataService.svc/DimCustomer(11002) returns the customer with id 11002
- http://localhost:56867/GeminiDataService.svc/DimCustomer(11002)/FirstName returns the first name of customer 11002
- http://localhost:56867/GeminiDataService.svc/DimProduct(310)?$expand=FactInternetSales returns the product with id 310 and all related Internet Sales records
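Building these URLs by hand gets fiddly once filters contain spaces or quotes. A small Python helper (hypothetical, not part of ADO.NET Data Services) can assemble them; it assumes integer keys, since string keys would need quoting, and takes the query options without their leading '$':

```python
from typing import Optional, Union
from urllib.parse import quote

def data_service_url(base: str, entity_set: str, key: Optional[int] = None,
                     prop: Optional[str] = None, **options: Union[str, int]) -> str:
    """Build an ADO.NET Data Services URL such as .../DimCustomer(11002)/FirstName.

    Query options are passed without their leading '$', e.g. top=5 or expand="FactInternetSales".
    """
    url = f"{base.rstrip('/')}/{entity_set}"
    if key is not None:
        url += f"({key})"  # integer keys only; string keys would need quotes
    if prop:
        url += f"/{prop}"
    if options:
        # Percent-encode each value (spaces in a $filter become %20) but keep the '$' names readable.
        url += "?" + "&".join(f"${name}={quote(str(value), safe='')}"
                              for name, value in options.items())
    return url

base = "http://localhost:56867/GeminiDataService.svc"
print(data_service_url(base, "DimCustomer", top=5))
print(data_service_url(base, "DimProduct", key=310, expand="FactInternetSales"))
```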

Confident that the feed is working, we can now deploy the service, and start using the feed in Gemini/PowerPivot. 

Step 5) Load From Data Feed

Open Excel 2010 and launch the Gemini/PowerPivot Client (by selecting ‘Load & Prepare Data’).


Select ‘From Data Feed’ from the ‘Get External Data’ section of the Gemini/PowerPivot Home Ribbon to launch the Table Import Wizard.


Specify the URL of the ADO.NET Data Services feed created earlier (in my case http://localhost:56867/GeminiDataService.svc) as the ‘Data Feed Url’ and click Next.

Incidentally, you can use most of the enhanced URLs, for example to select only the DimProduct table should you so wish; however, by specifying the root URL for the service you have access to all objects exposed by the service.


From the Table Import Wizard, select the required tables; in my case I’ll select them all. (You can optionally rename and filter the feed objects here too.)

Following the summary screen, the Gemini/PowerPivot Client gets to work importing the data from the ADO.NET Data Service.

Once completed, Gemini/PowerPivot displays all the data from all of the feed objects as if it came directly from the underlying database.


Step 6) Create Relationships

There is one final step before we can test our model using an Excel Pivot Table: we need to create the relationships between the tables we have imported. The Gemini/PowerPivot Client provides a simple, if a little onerous, way of creating relationships: the ‘Create Relationship’ action on the Relationships section of the Home Ribbon launches the Create Relationship wizard.

Each table needs relating back to the primary fact table, FactInternetSales.

Step 7) Test

We are now ready to start our analysis. Selecting PivotTable from the View section of the Gemini/PowerPivot Client Home ribbon creates a pivot table in the underlying Excel workbook, attached to your custom-fed Gemini/PowerPivot data model.

So, to allow fast access through Gemini/PowerPivot to, for example, potentially sensitive data, you can quickly build a custom data feed that can be consumed natively by the Gemini/PowerPivot Client's data feed functionality.

HACK: Exposing Foreign Keys as Properties through the ADO.NET Entity Framework

First post for months; the PerformancePoint Planning announcement forced some redirection and rebuilding.  We’ve grieved, we’ve moaned, but at some point, you just have to move on.

-----------------

I’m not a fan of hacks (it normally means you are doing something wrong), but in this case, where I’m after a quick win, I’ve had to work out and resort to a bit of a hack. It actually looks like the issue I’m facing may be addressed in Entity Framework v2 (Microsoft .NET 4.0), so maybe it’s more of a workaround than a hack after all ;o)

I’m using the ADO.NET Entity Framework and ADO.NET Data Services to expose a subsection of a database for consumption by Gemini.  In order to relate the exposed database objects together in Gemini, I need to apply this hack to ensure I have Foreign Keys available in my Gemini models to support creating the relationships.  By default, the Entity Framework exposes Foreign Keys as Navigation Properties rather than Scalar Properties.  Gemini does not consume Navigation Properties.

Let's take the scenario where I want to create an Entity Framework Model based on the following tables from the AdventureWorksDW2008 sample database:

- FactInternetSales
- DimCustomer
- DimProduct
- DimSalesTerritory

Step 1)  Identify the table(s) that contain Foreign Keys. 

In this case FactInternetSales.

Step 2)  Load those table(s) into the Entity Framework Model on their own. 

This ensures the Foreign Keys are set as Scalar Properties.  If you load in all the tables at once, the Foreign Keys are not exposed as Scalar Properties.

Step 3)  Load in the related tables. (DimCustomer, DimProduct, DimSalesTerritory)

At this point a bunch of Navigation Properties will have been set up, along with relationships between the related tables, but the trouble now is that the project will no longer build. If you try, you receive the following error for each relationship:

Error 3007: Problem in Mapping Fragments starting at lines 322, 428: Non-Primary-Key column(s) [CustomerKey] are being mapped in both fragments to different conceptual side properties - data inconsistency is possible because the corresponding conceptual side properties can be independently modified.

Step 4) Manually remove the relationships between tables.

Clicking the relationship line on the diagram and hitting Delete removes the relationship.

Step 5) Remove all Association Sets

Edit the edmx file manually in a text or XML editor and remove all <AssociationSet>…</AssociationSet> occurrences from the <EntityContainer> section:

<EntityContainer Name="AdventureWorksDW2008Model1StoreContainer">
    <EntitySet Name="DimCustomer" EntityType="AdventureWorksDW2008Model1.Store.DimCustomer" … />
    <EntitySet Name="DimProduct" EntityType="AdventureWorksDW2008Model1.Store.DimProduct" … />
    <EntitySet Name="DimSalesTerritory" EntityType="AdventureWorksDW2008Model1.Store.DimSalesTerritory" … />
    <EntitySet Name="FactInternetSales" EntityType="AdventureWorksDW2008Model1.Store.FactInternetSales" … />
    <AssociationSet Name="FK_FactInternetSales_DimCustomer" Association="AWDW08.FK_FactInternetSales_DimCustomer">
        <End Role="DimCustomer" EntitySet="DimCustomer" />
        <End Role="FactInternetSales" EntitySet="FactInternetSales" />
    </AssociationSet>
    <AssociationSet Name="FK_FactInternetSales_DimProduct" Association="AWDW08.FK_FactInternetSales_DimProduct">
        <End Role="DimProduct" EntitySet="DimProduct" />
        <End Role="FactInternetSales" EntitySet="FactInternetSales" />
    </AssociationSet>
    <AssociationSet Name="FK_FactInternetSales_DimSalesTerritory" Association="ADW08.FK_FactInternetSales_DimSalesTerritory">
        <End Role="DimSalesTerritory" EntitySet="DimSalesTerritory" />
        <End Role="FactInternetSales" EntitySet="FactInternetSales" />
    </AssociationSet>

</EntityContainer>
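Because the manual edit is a pure text change, it can also be scripted once step 4 is done. This Python sketch mirrors the text-editor approach with a regular expression rather than an XML parser, so the rest of the file is left byte-for-byte alone. It assumes AssociationSet elements are not nested or tucked inside comments or CDATA, the function names are hypothetical, and it keeps a .bak copy of the original file:

```python
import re
from pathlib import Path

# A whole <AssociationSet ...>...</AssociationSet> element (or a self-closing one),
# together with its indentation and line break. The \b keeps <AssociationSetMapping> safe.
ASSOCIATION_SET = re.compile(
    r"[ \t]*<AssociationSet\b[^>]*?(?:/>|>.*?</AssociationSet>)[ \t]*(?:\r?\n)?",
    re.DOTALL,
)

def strip_association_sets(edmx_text: str) -> str:
    """Return the EDMX text with every AssociationSet element removed."""
    return ASSOCIATION_SET.sub("", edmx_text)

def strip_file(path: str) -> None:
    """Rewrite an .edmx file in place, keeping the original as <name>.edmx.bak."""
    edmx = Path(path)
    original = edmx.read_text(encoding="utf-8")
    edmx.with_name(edmx.name + ".bak").write_text(original, encoding="utf-8")
    edmx.write_text(strip_association_sets(original), encoding="utf-8")
```

Note that this removes AssociationSet elements from every EntityContainer in the file, which matches "remove all occurrences" above.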

The project should now build, with the foreign keys exposed as Scalar Properties. Obviously no inherent relationships exist, so this could be dangerous in certain applications. For Gemini, however, provided you set up the relationships manually, it works a treat.

RIP PerformancePoint Planning

It's nearly a week since the announcement that shook the (PPS) world! It's been a bit difficult to report on; generally the Adatis blogs try to offer solutions to problems we have encountered out in the real world. Now I could say something crass here about the real world and the decision makers involved... but that would be childish, right?

If I were to offer up my feelings, they wouldn't be far from Alan Whitehouse's excellent post on the subject. If I had an ounce of class about me, they would be much more aligned with Adrian's poignant discussion opener, the one with the sharp-witted title, but alas...

We've spent the best part of the week speaking to customers, partners and Microsoft about what to do next. The timing was choice: would you believe, we actually had three new PerformancePoint Planning phases kicking off this week. According to my project plan, I should be setting up Kerberos as we speak. [There's always a positive, right?]

Some customers are carrying on regardless, they...

...already have planning deployments and are too far invested and dependent to back out at this stage or, 

...have a short-term view (that's not a criticism) and need a "quick" fix with a low TCO to get them through some initial grief. (Typically these customers are going through rapid organisational change, or form part of a recent acquisition, and require short, sharp solutions to help them see the wood for the trees during the transition.)

Other customers, with longer-term views, feel that the product, or more importantly the pool of suitably skilled resources, will drain away far quicker than the much-touted Microsoft product support runs out. I have to agree. Fact: Adatis will not be employing or training any more PerformancePoint Planning consultants. I doubt many other consulting firms will either.

It's the customers with the longer-term view who are currently in limbo: they are experiencing pain and they need pain relief. What should they do? Wait and see what Office 14/15 offers? (There is talk of some planning functionality appearing in future Office versions, but what truth is there in that?)

The Dynamics customers could wait for the resurrection of Forecaster - I have it on good authority that Microsoft will be developing Forecaster to be closer, in terms of flexibility, to PPS Planning. I had originally heard the opposite: that Forecaster would be replaced with a cut-down version of PPS Planning. Either way, I'm sure some of the PPS Planning code base will be utilised, which could end rumours of PPS Planning being 'given' to the community as some form of community/open-source arrangement. That arrangement is, in my opinion, a non-starter anyway: "Hey, Mr FD, we've got this great open-source budgeting and forecasting product we think you should implement!" Yeah, right!

Another rumour (and mixed message) is that Service Pack 3 will contain some of the requested features that were earmarked for version 2 (after all, the code has already been written, right?). This rumour was actually started by Guy Weismantel in his announcement video. However, the information I have since received clearly states that Service Pack 3 will contain stability and bug fixes only - so which is it to be? It's unusual for a service pack to contain new features, but it's not unheard of; does anyone remember the original release of Reporting Services? That arrived as part of a service pack for SQL Server 2000.

The burning question I cannot get answered is: have Microsoft actually stepped out of the BPM market for good? We are told that Excel, SharePoint and SQL Server provide BPM; without Planning, I can't see how they can. Short of hard-coded values, renewed SharePoint/Excel hell, another vendor or a bespoke planning solution, businesses can't set plans, and that has far-reaching implications: without plans there are no targets for scorecards and KPIs to measure against, so Planning's demise effectively shelves the Scorecard/KPI functionality in the M&A toolset too! It will be interesting to see the new Monitoring & Analytics marketing: will they still demo Strategy Maps and Scorecards, or will they now focus on Decomposition Trees and Heat Maps? Monitoring & Analytics may, in practice, just become Analytics.

I would have thought the cost of continuing to develop the product (even if it were a lemon, which Planning certainly wasn't) is far less than the potential loss of revenue Microsoft will face. That loss will come not only from the loss of confidence among its customers (who are going to think twice about investing in any Microsoft product now, let alone a V1) but also, perhaps more significantly, from the doors it opens to its competitors who can offer a complete BI/BPM stack.

Planning was the foot in the customer's door for BI: once you put Planning in, the customer had already bought the full BI stack and, in most cases, our customers were wowed by what they could now achieve.

I suspect Cognos and SAP are still partying now!