Callum

Callum Green's Blog

Microsoft BI – Coming Soon to SQL Server 2016

After attending the Pass Summit 2016 a couple of weeks ago, I attended a number of sessions that provided an insight into the direction Microsoft are heading with BI.  I thought I’d share this with the community.

Looking back to October 2015, the official Reporting Roadmap blog from Microsoft stated their intent in dramatically improving the visualisations, integration and harmonisation of both on-prem (SSRS) and cloud based services (Power BI).  Whilst reporting appeared to be a renewed focus, they are constantly driving other areas of BI - such as analysis services and database/data warehousing development in the cloud.

Now, for the interesting bit.  Here is what we can expect in the SQL Server 20916 BI Stack in the near future:

-          SQL Server 2018 CTP (to be released within the next month).

o   This seems very early, considering 2016 has only just been released!  Let’s wait for an official Microsoft announcement.

-          Azure DW

o   Auto Create Statistics

§  Currently statistics have to be generated and updated on the fly.

o   Replicated Distributed Table

§  One that will excite any cloud based SQL Server developer.

§  This will reduce data transfer between distributions/nodes and consequently improve performance.

-          Azure Data Lake

o   General Availability (GA) is imminent.

o   Future Features:

§  Polybase, so that you can connect the two big MMP platforms (ADL and DW).

§  SSIS connectors (released with GA) for Store.

§  Python and R in U-SQL.

-          SSIS

o   Lots of new connectors, including:

§  ADLS.

§  CRM.

-          SSAS

o   Migrating Power BI Models into Tabular.

§  This is coming very soon apparently, but will be developed in Azure SSAS first.

o   Object Level Security in Tabular

§  We currently have row level, but there are talk to secure a physical object, not just a row.

§  Even better news - Microsoft want to integrate the two together, which will make security awesome in Tabular.

-          SSRS

o   Supporting (not pinning) Excel reports in RS report.

§  This will come, but Power BI is the focus right now and we may have to wait a while.

-          Power BI

o   Additional and better Pivot Table functionality.

o   Integrating Active Directory dynamically.

o   Potential to use Direct Query and Imported modes together – as a hybrid.

§  Functionality is possible, but performance needs to be weighed up by Microsoft before anything will emerge.

o   Quick Calcs.

§  Only ‘Percent of Current Total’ currently available.

§  Potential is to offer lots more – such as YTD, MAT, Current Year vs. Previous Year, etc.

§  This is for the users who aren’t familiar with DAX.

o   Template organisational Content Packs.

§  The ability to give the user the ability to personalise colours, fonts, etc. within a structured (organisational) content pack.

-          Power BI Embed

o   Application developer will be able to limit user sessions and therefore, reduce the charge per 1 hour costs that come with it.

 

There are some features/issues Microsoft do not plan to change.  Although, the good thing about Microsoft is that they are community driven, so if you feel strongly about anything (and get support from your peers), they may change their minds.

-          SSRS

o   Q&A, Query Editor (Power Query), R integration, etc. not being developed.

§  Pretty obvious really.  Whilst they are going to introduce almost everything from Power BI, some elements of functionality are just not needed for on pre purposes.

§  R Scripts may come one day, but not a focus right now.

-          Power BI

o   Source Control

§  No immediate plans to integrate with TFS or modularise the pbix files (for a more developer based solution)

§  Not surprising as this is a self-service tool, not a development team.

§  Work around is to upload pbix files into OneDrive and use the versioning as an element of Source Control or add a file into Visual Studio.

§  Keep Voting on PowerBI.com if you want this! (Currently 278 votes).

·         https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/9677517-source-control

-          Power BI Embed

o   Licence model

§  ‘Speak to Marketing’ is what we were told.

 

Is everyone excited?  I certainly am.  Microsoft are openly admitting that any new BI related feature will be developed in the cloud first, but this was pretty obvious.  For all you on-prem fanatics (me included), at least the bugs/undocumented features will be ironed out before we get to use them!  My personal favourite has to be the direction SSRS is going.  It was recently labelled a ‘mature’ product, but now RS has suddenly become a cool and current tool to use.  The fact Power BI and SSRS will become almost the same product, tells us all that on-prem is still an important part of the Microsoft licensing strategy.

 

I am sure there are lots of other features coming into the BI stack over the next 6 months to a year and these were just the ones I had heard about.  Feel free to comment if you have additional ones to share.

Replicating SSIS Data Profiling in Power Query

As a BI Consultant, I regularly sit down with a new/prospective client and dissect both their business requirements and source data.  With the latter, the SSIS Data Profiling Task (in Visual Studio or SSDT) is a very useful tool to find out the general quality of the data we will be importing into the warehouse.  Count of NULLS, Value frequency, Min/Max/Median Values and unique count of records are just a few types of data profiling.

I have recently being playing with the out of the box Data Profiling functionality in Power Query.  Although the functions are very useful, I quickly realised that I would need to extend my PowerBook to match the Data Profiling Task in SSIS.  I will break my findings down into a couple of parts, with this instalment focusing on a specific data profiling output from SSIS. 

Table Profile Function

Table.Profile() is a relatively new function in Power Query that gives some profiling stats for columns in a specific database table or raw file.  In the example below, I pulled in a raw Excel spread sheet from the Power BI website, which contains sample financial data.  Click here to download the data.

clip_image002

To obtain stats on this query, open Advanced Editor, add the following M at the end of the existing code:

    #"TableProfiling" = Table.Profile( #"Changed Type" as table) as table

in

     #"TableProfiling"

clip_image004

All we are doing is creating a new step in M, referencing the previous step and calling the new one.  The actual Table.Profile() function doesn’t even require an expression value and the underlying Power Query engine provides all of the functionality.  Each column has been profiled – see below:

clip_image006

I think the outputs are self-explanatory, especially if you are familiar with Data Profiling.  Either way, one line of code gives us some useful high level stats about the data and can help us captures the extremes or anomalies of each column.  Not all analysis will be suitable for a column, e.g. the ‘Country’ field will not produce Standard Deviation or Average values because it uses the Text datatype. 

Column/Frequent Value Distribution

Whilst the in-built Power Query function is ideal for overall table/file analysis, data profiling is most effective when focusing on specific columns and data patterns.  Both Column and Frequent Value Distribution are basic features of the SSIS Profiling Task, shown below.

clip_image008 

For the Power Query proof of concept, I have connected to AdventureWorksDW2014 and the Person.Address table – named PersonAddress.  This means I can use the SSIS Data Profiling screenshot as the benchmark to not only prove it can be replicated, but to ensure the numbers match. 

The first step is to apply the Table.Profile() function to a Reference query – called TableProfiling.  We can then create a simple staging query that just includes what we need from the out of the box profiling – called TableProfilingStage.  ‘Number of Values’ is renamed to ‘TotalCount’ and ‘Number of Distinct Values’ to ‘DistinctCount’.  We are left with the following data:

  clip_image010

The row of interest is ‘City’ (contained in Column), as this is what we need to profile.  The queries TableProfilingStage is the reference point for the frequent value distribution.  Create a New Source and paste the following M Code into the Advanced Editor:

clip_image012

Points of note:

1.       Lines 1-4 are referencing parameters in other queries and more specifically, with the ordinal position of 3.  The ‘City’ column is the 4th record in TableProfilingStage which equates to position 3 (column 1 starts at 0).

2.       Each ‘City’ value summed with a total count.

3.       Individual record counts (value of 1 per row) to use in the eventual calculation.  This value is then summed to give a total record count on each row.

4.       Apply (Count/TotalCount)*100 to work out the Frequent Value Distribution.

5.       Renamed, sorted (by count) and hidden any unnecessary columns.

6.       Save your new query – I called it ColumnProfiling.

7.       You should have 4 queries in total:

a.       PersonAddress

b.       TableProfiling

c.       TableProfilingStage

d.       ColumnProfiling

As long as your queries are named identically to my examples, you will end up with the below outputs:

Column Value Distribution:

clip_image014

Frequent Value Distribution:

clip_image016

We have now replicated Column and Frequent Value Distribution and the figures matches the SSIS Profiling Task! Although Power Query does not provide the pretty visualizations, we can easily re-produce them in the Power BI reporting suite.  I will save this for another blog, though.    

Next Steps

Look out for my next blog, where I aim to enhance the current Data Profiling PowerBook and add configuration.  In order to make this method as or more efficient than SSIS, it will need to be automated and handle a number of sources (e.g. csv, database connection, etc) and incorporate invoked Power Query functions. 

Conclusion

It is clear to see that Power Query is a very effective tool to replicate SSIS Data Profiling.  I believe with further development, a PowerBook could be used as a quicker and more efficient tool to run data profiling. 

I am very interested to see if anyone has created their own Data Profiling tasks in Power Query and welcome any comments, feedback or questions.

New Features in SQL Server 2016 – Part 4: Integration Services

In the current SQL Server 2016 CTP2, there are not many exciting improvements made to SSIS.  The hope is that the rumours are true and that the more exciting and beneficial features will come later, probably closer to the actual release.

The below have been announced by Microsoft and are available in CTP2:

·        AlwaysOn Support.

·        Incremental Package Deployment.

·        Project Upgrade

The speculation and noises coming out of the SQL Community imply two features will be available soon.  As nothing has been officially released, I cannot offer much more insight right now.  Look out for an updated blog as and when something is announced.

·        Azure Data Factory integration with SSIS.

·        Enhanced SSIS - Designer support for previous SSIS versions and support for Power Query.

AlwaysOn Support

This high-availability, disaster recovery solution is used for lots of large scale databases as an alternative to mirroring.  Integrating this into SSIS will provide some great benefits.

1.      Easily deploy to a centralised SSIS Catalog (e.g SSISDB database).

2.      SSISSDB can now be added to an AlwaysOn Availability group.

3.     When a failover occurs, a secondary node will automatically become the primary.

 

Incremental Package Deployment

You can now deploy one or more packages to an existing or new project, without the need to deploy the whole project.  The incremental process can be actioned from:

1.      Deployment Wizard.

2.      SQL Server Management Studio (SSMS).

3.      Stored Procedures.

4.      Management Object Model (MOM) API.

I have tested the package deployment in CTP2 and it works in exactly the same way as project deployment.  The wizard, especially has the exact same look and feel.  In order to run this in CTP2, you will need to navigate to the standard SQL Server 2016 directory, which is usually in:

64 Bit:   C:\Program Files\Microsoft SQL Server\130\DTS\Binn
32 Bit:   C:\Program Files (x86)\Microsoft SQL Server\130\DTS\Binn

To run the wizard, simply double click the ‘ISDeploymentWizard.exe’ executable in the folder or run it from command line.  You can now select ‘Package’ or ‘Project’ deployment in the options – see below:

clip_image002[4]

There is an option to set password security on individual packages and individually which ones you want to deploy at one time.

When SQL Server 2016 is officially released, I would imagine the package deployment wizard will be integrated into Visual Studio and SSDT and implemented without the need of command line or an exe.

Project Upgrade

Not much has changed here, apart from:

1.      Project-level connection managers work as normal.

2.      Package layout/annotations are always retained.

References

For more information on all of the new SSIS SQL Server 2016 features, the below resources/blogs are highly recommended.

·        Official Microsoft Page -
https://msdn.microsoft.com/en-us/library/bb522534(v=sql.130).aspx

·        Gilbert Quevauvilliers – BI blog -

https://gqbi.wordpress.com/2015/05/07/bi-nsight-sql-server-2016-power-bi-updates-microsoft-azure-stack/

·        Manoj Pandey Blog -
http://sqlwithmanoj.com/2015/05/09/microsoft-announced-sql-server-2016-new-features-and-enhancements/