
Callum Green's Blog

Direct Query vs. Live Connection in Power BI – Part 1

There are lots of articles out there that compare Import Mode vs. Direct Query, but people rarely talk about whether there are any differences between Direct Query and Live Connection. “Wait. Aren’t they the same thing?” Well, not quite.

The first big difference between Direct Query and Live Connection is the type of data source used for the connection.  The former uses a database connection (typically SQL Server), whilst the latter requires an Analysis Services OLAP Cube.

This blog series won’t explain what Direct Query and Live Connection can do (introductions can be found here and here), but will instead highlight the other, subtler differences between the two connection types.

Feature Differences

There aren’t any features I can find that are available in Live Connection but not in Direct Query. However, there are a few the other way around.

I will first show you a Power BI Direct Query screen, focusing on 2 of the 4 differences (highlighted in green):

o   Quick Measures

o   Relationships

[Screenshot: Power BI Desktop in Direct Query mode, with Quick Measures and Relationships highlighted in green]

Quick Measures

This feature was first released in April 2017 and is available in Import Mode and Direct Query. It enables a non-technical Business Analyst to create relatively complex DAX aggregations with the help of a wizard. To access Quick Measures, right-click on a measure or attribute and select Quick Measures. Let’s try the same thing in Live Connection mode.

[Screenshot: the same right-click menu in Live Connection mode]

You’ll notice that Quick Measures is missing from the list of options.

I find it bizarre that Live Connection doesn’t support Quick Measures, especially when using a Tabular cube as the connection. The Power BI DAX language and engine are the same as Tabular’s, so you would think the two would be compatible!

Please vote for this feature to be added into Live Connection - http://bit.ly/2umsiJy.

Relationships

There are two tabs displayed on the left-hand pane in Direct Query mode.

[Screenshot: the two tabs on the left-hand pane in Direct Query mode, with the Relationships tab highlighted]

If you click the highlighted tab, it opens a Relationships page, where you can begin to join datasets (from the database) together. I created a manual relationship joining DimEmployee and DimDate, as shown below. No relationships are created in the underlying SQL Server database; they are instead stored within the Power BI model.

[Screenshot: the Relationships page showing the manual DimEmployee–DimDate relationship]
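For context, using that relationship in a report is roughly equivalent to Power BI issuing a join along these lines against the source database. This is a hedged sketch: the key columns (HireDateKey on DimEmployee, DateKey on DimDate) are assumptions based on an AdventureWorksDW-style schema, not taken from the screenshots.

-- Hypothetical equivalent of the Power BI relationship in the source database
SELECT e.EmployeeKey,
       e.HireDateKey,                 -- assumed foreign key column
       d.FullDateAlternateKey         -- assumed calendar date column
FROM   dbo.DimEmployee AS e
JOIN   dbo.DimDate     AS d
       ON d.DateKey = e.HireDateKey;  -- assumed join keys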

In Live Connection, the left-hand pane looks bare:

[Screenshot: the left-hand pane in Live Connection mode, with no Relationships tab]

There is no option to create any form of relationship against the Live Connection Tabular cube. This kind of makes sense, because a BI Developer would be the person responsible for creating relationships within an OLAP cube. I would argue that if you want the ability to mash up data or create your own relationships, you shouldn’t be connecting to a cube in the first place.

Coming Soon

Check out Part 2 of my blog series - available here. The focus of that article is the Add Hierarchy and Change to Import Mode features.

Part 3 will conclude the trilogy, where I go slightly off-piste and focus on the data models in Direct Query and Live Connection.

Further Reading

Other than the Power BI Blog, there are some other great pages out there too:

o   Power BI Blog - http://bit.ly/2rkLGoq

o   Import Mode vs. Direct Query - http://bit.ly/2t4ragx

o   Direct Query in Power BI - http://bit.ly/2nUoLOG

o   Live Connection in Power BI - http://bit.ly/2tfJr5L

Contact Me

If you have any questions or thoughts, please leave a comment below. My Twitter details are also provided.

Twitter: @DataVizWhizz

Managing msdb in SQL Server

In the world of BI, the SQL Server system databases (with the exception of tempdb) are often forgotten about. You won’t be surprised to learn that I am a BI Consultant and, until recently, I was also in this boat. System databases are usually left to a DBA, whilst we develop ETL packages, model data or produce fancy reports.

After attending Ryan Adams’ session at SQL PASS 2016, my mindset has changed. The msdb database used to be the hub for SSIS logging, but since SQL Server 2012 this is all stored in the SSISDB database. However, there are still a number of processes and functions that are logged in msdb, and we need to manage this to prevent the database from becoming too large and impacting storage costs.

This blog will provide a brief overview of what is stored and logged in msdb, as well as a little trick for managing SQL Agent job history.

What is Stored in msdb?

Before I provide a brief list of what is in msdb, one obvious bit of advice: it should be backed up regularly. We all have backup strategies for our OLTP or data warehouse databases, but the system ones are often neglected. All this logging and auditing history in msdb is useful (a sample query against the backup history tables follows the list below).

The msdb database consists of the following:

o   SSIS - Only if packages are stored in SQL Server (not the SSIS Catalog).

o   Backup and Restore

o   Maintenance Plans

o   Database Mirroring

o   Policy Based Management

o   Log Shipping

o   Service Broker

o   Database Engine Tuning Advisor
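As an example of the history msdb holds, the query below lists the most recent full backup per database. This is a minimal sketch; msdb.dbo.backupset is the real backup history table, where type 'D' denotes a full database backup.

-- Most recent full backup per database, read from msdb's history tables
SELECT    d.name                    AS database_name,
          MAX(b.backup_finish_date) AS last_full_backup
FROM      sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
          ON  b.database_name = d.name
          AND b.type = 'D'          -- 'D' = full database backup
GROUP BY  d.name
ORDER BY  last_full_backup;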

SQL Agent Jobs

Whilst all of the logging tables in msdb are important for a DBA, one real area of interest in BI is SQL Agent jobs. There can be many SSIS packages called from the Catalog, spanning multiple jobs. It is important that logging is available, yet we don’t want job history to be stored forever.

Out of the box, the logging for msdb looks like the below. Right-click SQL Server Agent in SSMS, select Properties and navigate to the ‘History’ tab.

[Screenshot: default SQL Server Agent history settings, capped at 1,000 total rows and 100 rows per job]

Let’s work through a use case, taken from Ryan Adams’ PASS presentation:

1.       Imagine you have 10 database backup jobs running every hour in your production environment.

a.       10 databases * 24 rows per day = 240 total rows per day.

b.      With the default cap of 1,000 total rows, job history would max out after 1,000 / 240 ≈ 4 days and 4 hours.

OK, so now we have a problem. All DBAs and BI developers would want to see more than 4 days’ worth of history, which means the default ‘Limit size of job history log’ setting is not a good one to retain. How about changing the Agent history settings to the below?

[Screenshot: Agent history settings changed to remove agent history older than 30 days]

Sounds perfect, right? Wrong. The GUI setting is not persisted, meaning the Agent will purge history older than 30 days once, on the next run only. This seems crazy, but it is what we are dealing with, unfortunately. There is a solution, which involves executing a stored procedure within the msdb database.

-- Work out the cut-off date: 30 days ago (the string conversion strips the time portion)
DECLARE @DeletePrior DATETIME;
SET @DeletePrior = CONVERT(VARCHAR(10), DATEADD(dd, -30, GETDATE()), 101);

-- Remove all Agent job history older than the cut-off
EXEC msdb.dbo.sp_purge_jobhistory @oldest_date = @DeletePrior;

NOTE: Ensure the row-restriction (‘Limit size of job history log’) property shown above is unchecked in the SQL Server Agent settings before executing.

Although you cannot see the changes in the GUI settings, you will now retain 30 days of history in msdb. The query can be tweaked to retain days, months or even years of history if you like.
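If you want the purge to happen on a rolling basis, one option (my suggestion, not from Ryan Adams’ session) is to schedule it as an Agent job of its own. A minimal sketch, with illustrative job and schedule names:

USE msdb;
GO

-- Create a job that purges Agent history older than 30 days...
EXEC dbo.sp_add_job     @job_name  = N'Purge Agent Job History';
EXEC dbo.sp_add_jobstep @job_name  = N'Purge Agent Job History',
                        @step_name = N'Purge history older than 30 days',
                        @subsystem = N'TSQL',
                        @command   = N'DECLARE @DeletePrior DATETIME = DATEADD(dd, -30, GETDATE());
EXEC msdb.dbo.sp_purge_jobhistory @oldest_date = @DeletePrior;';

-- ...and run it every day at 02:00
EXEC dbo.sp_add_jobschedule @job_name = N'Purge Agent Job History',
                            @name = N'Daily at 2am',
                            @freq_type = 4,        -- daily
                            @freq_interval = 1,    -- every 1 day
                            @active_start_time = 020000;
EXEC dbo.sp_add_jobserver   @job_name = N'Purge Agent Job History';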

Conclusion

If you take anything away from this article, it should be to not ignore system databases. Microsoft creates them automatically for us but, unfortunately, the out-of-the-box configurations are not always right for your situation or business. For a deeper dive into msdb and other system databases, try some of the recommended blogs below.

If anyone would like to share their experiences with using/refining msdb, feel free to comment. 

Recommended Reading 

o   MSDN - https://msdn.microsoft.com/en-us/library/ms187112.aspx

o   Hemantgiri Goswami’s Blog - https://www.toadworld.com/platforms/sql-server/b/weblog/archive/2013/10/21/msdb-cleanup-is-necessary

o   Ryan Adams’ Blog - http://www.ryanjadams.com/

Adatis Coding Dojo – Session No.2

The Challenge

Write a program that generates all two-word anagrams of the string "documenting", in SQL Server.
e.g. “Documenting” = “Document Gin”.

Introduction

This week’s challenge was… well, a challenge. In our first session everyone attempted it via paired programming; for this dojo, we decided to attempt the problem with one larger group, consisting of 5 people. This technique encourages an open forum for developers to discuss ideas, with one person at the computer coding. The assigned coder must swap with another participant every 5–10 minutes.

Preparation

Before beginning the dojo, I asked one of the Senior Consultants to attempt the problem and provide the code and the concepts behind his thinking. The idea is that one challenge can be solved in a number of ways, with 5 people working collaboratively thinking in a completely different way to one individual coder. We provided a file containing 10,000 words (which would become the master list for the anagram solution). If you would like to try this yourself, the text file can be downloaded from here.

Senior Developer’s Solution

Most importantly, the senior developer DID NOT use the TDD development methodology. It took around 2 hours for the developer to implement a fully functioning stored procedure. Here is how it was achieved:

1. Stored procedure, encapsulating a long T-SQL script. The solution was focused around string manipulation and recursive querying.

2. While loop turns word into a pivoted array of individual letters, which can then be compared against the words in the word list table.

3. Recursive CTE returns words consisting only of letters in that array.

4. XML PATH used to create CHECKSUM string of letters and counts.

5. All word combinations of correct length returned and checked against checksum to validate letter counts.

The solution was built to be flexible from the start – it returned anagrams with ‘n’ number of words rather than the proposed 2. It would also work with any provided starting word and special characters.
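To make step 4 more concrete, here is a minimal sketch of the letter/count “signature” idea; it is not the senior developer’s actual code, and it borrows master.dbo.spt_values as a ready-made numbers table.

DECLARE @Word VARCHAR(50) = 'documenting';

-- Split the word into single letters, then rebuild an ordered
-- letter-plus-count string; any two anagrams share the same signature
WITH Letters AS
(
    SELECT SUBSTRING(@Word, v.number, 1) AS Letter
    FROM   master.dbo.spt_values AS v
    WHERE  v.type = 'P'
    AND    v.number BETWEEN 1 AND LEN(@Word)
)
SELECT ( SELECT   Letter + CAST(COUNT(*) AS VARCHAR(2))
         FROM     Letters
         GROUP BY Letter
         ORDER BY Letter
         FOR XML PATH('') ) AS LetterSignature;   -- gives 'c1d1e1g1i1m1n2o1t1u1'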

Code Dojo Solution

The first task was to brainstorm ideas on how to conquer the challenge, starting with whether it could even be achieved in SQL Server! The coding team weighed up string manipulation, recursive CTEs, a cursor and mathematical calculations, amongst other SQL Server functions. The general consensus was to avoid recursion (where possible) to ensure faster query results, and to use mathematical algorithms to compare against the data in the imported wordlist table.

The development team used TDD to implement the mathematical approach to solving the problem. Each code enhancement contained a new automated test script that would capture any failures and enable a simple rollback to the last working codebase. This proved to be a successful technique, as ideas were changing constantly throughout development.
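The article does not name a test framework, but a tSQLt-style test gives a flavour of the kind of automated check the team might have run after each enhancement. A hypothetical sketch (dbo.FindAnagrams and the test names are illustrative):

EXEC tSQLt.NewTestClass 'AnagramTests';
GO
CREATE PROCEDURE AnagramTests.[test that documenting yields document gin]
AS
BEGIN
    -- Capture the procedure's output and assert the known pair appears
    CREATE TABLE #Actual (Word1 VARCHAR(50), Word2 VARCHAR(50));
    INSERT INTO #Actual EXEC dbo.FindAnagrams @Word = 'documenting';

    IF NOT EXISTS (SELECT 1 FROM #Actual
                   WHERE Word1 = 'document' AND Word2 = 'gin')
        EXEC tSQLt.Fail 'Expected anagram pair (document, gin) not found';
END;
GO
EXEC tSQLt.Run 'AnagramTests';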

Actual Dojo Solution

After around 90 minutes of SQL coding and TDD, the team managed to come up with a working solution. This is how they achieved it:

1. A SQL function to work out a unique value for a character (letter) - see the sketch after this list.

a. Each letter of the word ‘Documenting’ has a case-sensitive ASCII value, e.g. ‘D’ is 68 and ‘d’ is 100.

b. All letters were converted to uppercase, to ensure a non-case sensitive approach.

c. Apply the POWER SQL function to the given letter, giving it a unique number that cannot be duplicated by another letter.

2. A simple stored procedure that looks up the existing words in the full word list table, referencing the function to compare letter values.

a. Find all word combinations and their total POWER value.

b. Compare the total number against the hard-coded word ‘Documenting’.

c. Return any records whose two words add up to the total POWER value of ‘Documenting’.

d. Pass the hard-coded word into the procedure as a parameter, so that any word can be compared at run time.

3. The TDD approach helped facilitate the iterative, code review process.
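Below is a minimal sketch of the letter-value function from step 1. It is not the team’s actual code: the base of 4 is an illustrative choice, and the approach assumes no letter appears four or more times across the word pair (true for ‘documenting’).

-- Weight each letter as 4^n so a word's SUM of letter values is
-- order-independent and unique for its particular mix of letters
CREATE FUNCTION dbo.fn_LetterValue (@Letter CHAR(1))
RETURNS DECIMAL(38, 0)
AS
BEGIN
    -- Uppercase first, so 'D' (ASCII 68) and 'd' (ASCII 100) score the same
    RETURN POWER(CAST(4 AS DECIMAL(38, 0)), ASCII(UPPER(@Letter)) - ASCII('A'));
END;
GO

-- Anagram fragments total the same value regardless of letter order
SELECT dbo.fn_LetterValue('D') + dbo.fn_LetterValue('o') AS DoTotal,
       dbo.fn_LetterValue('o') + dbo.fn_LetterValue('d') AS OdTotal;   -- identical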

Conclusion

Although the Senior Developer’s and the Dojo team’s solutions both met the minimum expectations for the challenge, there were a number of pros and cons to each.

The Senior Developer thought ahead and made improvements to the requirements, such as dynamically handling anagrams of more than 2 words. He also demonstrated some of the underused and more powerful functionality within SQL. However, it takes around 2 minutes to execute one anagram and the code itself is not the most efficient. By not using TDD in his approach, he overcomplicated the solution and did not encounter performance bottlenecks until the end of the build.

On the other hand, the Dojo team fully practised TDD. This was reflected in the much smaller codebase and, most importantly, the much quicker execution of the anagram procedure (23 seconds). Their solution is limited in that it is very rigid and cannot handle anagrams of more than two words. It also cannot handle special characters, whereas the Senior Developer’s solution can. However, these were not requirements of the solution: would a client accept a slower, more complicated product that handles requirements they do not need?

Overall, both solutions work adequately but it is clear that when confronting a technical challenge/problem, running a Dojo and practising TDD can bring more efficient results. Had we added further requirements to the challenge, maybe the dojo team would have found another, even more efficient solution.

References

General

Good Dojo Practices - http://codingdojo.org/
TDD - http://code.tutsplus.com/tutorials/the-newbies-guide-to-test-driven-development--net-13835

SQL Functions

POWER - https://msdn.microsoft.com/en-us/library/ms174276.aspx
ASCII - https://msdn.microsoft.com/en-us/library/ms177545.aspx
XML PATH - http://blogs.msdn.com/b/mind_talks/archive/2012/01/18/xml-path-for-sql-server.aspx
CHECKSUM - https://msdn.microsoft.com/en-us/library/ms189788.aspx

Dojo Code

For access to the Senior Developer’s and the Dojo team’s SQL solutions, please leave a comment and I will get back to you directly. It would be interesting to see if anyone else has a better technique (both in terms of code length and performance), and I welcome any suggestions.

SQL Server NOT IN Clause - Avoid like the….

Background

Up until recently, I was one of the many SQL Server developers with the bad habit that is the NOT IN clause. It is an easy way of finding data in one table that does not exist in another. I thought using NOT IN would help me conceptualise a query result, as well as make the code easier for someone else to read. In fact, although the performance (within an execution plan) is OK, the overall query can pull back incorrect results.

The Problem

The NOT IN clause is problematic in only one, but VERY IMPORTANT, way: it DOES NOT handle NULLs in the comparison table. If the subquery returns even one NULL, every comparison evaluates to UNKNOWN and the query returns no rows at all. Please see the example below:

Create two tables for the NOT IN example:

[Image: NOT IN SQL 1 - script creating the two example tables]

Query results for both tables:

[Image: NOT IN SQL 2 - query results for both tables]

NOT IN query:

[Image: NOT IN SQL 3 - the NOT IN query]
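The screenshots are not reproduced here, so below is a hypothetical reconstruction of the setup and the failing query. The table names come from the article; the columns and sample rows are illustrative guesses.

-- Two example tables: the comparison table contains a NULL Position
CREATE TABLE dbo.FootyPlayers    (Position VARCHAR(20) NULL, PlayerName VARCHAR(50));
CREATE TABLE dbo.NewFootyPlayers (Position VARCHAR(20) NULL, PlayerName VARCHAR(50));

INSERT INTO dbo.FootyPlayers    VALUES ('Goalkeeper', 'Peter Schmeichel'),
                                       (NULL,         'Eric Cantona');   -- the troublesome NULL
INSERT INTO dbo.NewFootyPlayers VALUES ('Goalkeeper', 'Peter Schmeichel'),
                                       ('Striker',    'Andy Cole');      -- the row we expect back

-- Returns 0 rows: the NULL in the subquery makes every comparison UNKNOWN
SELECT Position, PlayerName
FROM   dbo.NewFootyPlayers
WHERE  Position NOT IN (SELECT Position FROM dbo.FootyPlayers);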

As you can see, 0 records were returned. We would expect the record (containing Striker, Andy Cole) in the NewFootyPlayers table to be returned, but the NOT IN clause ignores any comparisons involving NULLs.

NOTE: Adding a ‘WHERE Position IS NOT NULL’ filter to the NOT IN subquery would also return the expected record, but a lot of people will forget to add it and spend a substantial amount of time wondering why certain records are missing from their result set.

The Solution(s)

There are a number of clauses and SQL syntax alternatives that can be used instead of NOT IN. Although most do not offer any major performance benefits, they actually return what is expected. The three examples below all return the one expected record:

[Image: NOT IN SQL 4 - three alternative queries]
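Again as a hypothetical reconstruction against the same illustrative tables, the three NULL-safe alternatives might look like this:

-- 1. NOT EXISTS
SELECT n.Position, n.PlayerName
FROM   dbo.NewFootyPlayers AS n
WHERE  NOT EXISTS (SELECT 1
                   FROM   dbo.FootyPlayers AS f
                   WHERE  f.Position = n.Position);

-- 2. EXCEPT
SELECT Position, PlayerName FROM dbo.NewFootyPlayers
EXCEPT
SELECT Position, PlayerName FROM dbo.FootyPlayers;

-- 3. LEFT JOIN, keeping only unmatched rows
SELECT    n.Position, n.PlayerName
FROM      dbo.NewFootyPlayers AS n
LEFT JOIN dbo.FootyPlayers    AS f
          ON f.Position = n.Position
WHERE     f.Position IS NULL;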

All three return the below result, which we expected in the first place:

[Image: NOT IN SQL 5 - result grid containing the single expected row]

Recommended Solution

Whilst none of the solutions above cause major performance problems, there is one method that is better than the others. If we are working with hundreds of millions of records in both tables, NOT EXISTS is the most efficient query. Its performance is similar to NOT IN and EXCEPT, and it produces an identical plan, but it is not prone to the potential issues caused by NULLs or duplicates.

I would be interested to see if anyone else has performance-tested each query type and whether there are better alternatives to NOT EXISTS. One thing I am certain of, however, is that no one should have to use the NOT IN clause.