Plamena Mincheva


Power BI with Azure Databricks for Dummies (in 15 minutes)

    Microsoft Azure Databricks is Microsoft’s Apache Spark-based platform, optimised for Azure and therefore for integration with Power BI. It was released on 26 February this year and is still in preview, but in a recent project we decided to give it a go and explore what such a solution could offer for an enterprise data warehouse.
To do so we created a cluster for our project, with several notebooks containing the PySpark scripts that do all our ETL. On top of that, we decided to point Power BI directly at the Databricks cluster tables to build our dataset and reports.
In this blog post I am going to take you through the quick process of setting up the Databricks connection in Power BI Desktop, and then the scheduled refresh in the Power BI web service.
The prerequisites for such a task are:

  • An Azure Databricks cluster with data tables (facts, dimensions, etc.)
  • A Databricks access token (not the same as an Azure token)
  • A Power BI subscription (I’m working with the Pro version)
  • Power BI Desktop

And that’s all – no gateways, no drivers, no batteries needed.


     1. Firstly, we need to make sure our cluster is started, as this takes approximately 5-10 minutes. If we have already created our token (a non-expiring one would be ideal) and saved it somewhere, we just need to open Power BI Desktop and select Get Data from the welcome screen.
If you filter the connector options by “Spark” you will notice there are currently three of them; we will use “Spark (Beta)” and click Connect:


clip_image002


    2. The next step is a bit trickier, as you need the URL address of your cluster (and from this point on it must be started). To compose that address, open Databricks, go to Clusters and select the cluster you want to connect to.
On the cluster edit page, scroll down and select the JDBC/ODBC tab.

clip_image004

On the JDBC/ODBC tab, copy and save the JDBC URL, then use it to construct the server address you will enter when setting up the Spark connection in Power BI Desktop. Take the JDBC URL you just copied and do the following:
⦁    Replace jdbc:hive2 with https.
⦁    Remove everything in the path between the port number (443) and sql (sql/protocol…), keeping the boxed text as per the image below:

clip_image006

In our example, the server address would be:

https://eastus2.azuredatabricks.net:443/sql/protocolv1/o/5441075421675247/0221-211616-tings395

or https://eastus2.azuredatabricks.net:443/sql/protocolv1/o/5441075421675247/boost

(if you choose the aliased version)
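This substitution is mechanical, so if you script your environment setup it can be automated. Here is a minimal sketch in Python; the helper name is mine, and it assumes the JDBC URL has the shape Databricks displayed at the time of writing (host and port after the scheme, with the path in an `httpPath` parameter):

```python
import re

def jdbc_to_server_address(jdbc_url):
    """Turn a Databricks JDBC URL into the HTTPS server address
    that Power BI's Spark connector expects."""
    # Host and port sit right after the jdbc:hive2:// scheme.
    host_port = jdbc_url.split("://", 1)[1].split("/", 1)[0]
    # The path Power BI needs is the value of the httpPath parameter.
    match = re.search(r"httpPath=([^;]+)", jdbc_url)
    if not match:
        raise ValueError("No httpPath parameter found in the JDBC URL")
    return "https://{}/{}".format(host_port, match.group(1))

# The example URL from this post:
jdbc = ("jdbc:hive2://eastus2.azuredatabricks.net:443/default;"
        "transportMode=http;ssl=true;"
        "httpPath=sql/protocolv1/o/5441075421675247/0221-211616-tings395")
print(jdbc_to_server_address(jdbc))
# → https://eastus2.azuredatabricks.net:443/sql/protocolv1/o/5441075421675247/0221-211616-tings395
```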

Once you have the address, you need to enter it under the Server label in the dialog box of the next setup step:

clip_image008

Please note that the Protocol must be HTTP. As for the Data Connectivity mode, both are supported, so it is up to your personal/business preference.


3. The next step is to input your login details.
Keep in mind this is a bit misleading: for User name you need to enter the literal string “token”, and for Password the token value you previously saved (or, if you forgot to save it, just go and create a new one, as I did a couple of times).

clip_image010
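If you are unsure whether a saved token is still valid, you can check it outside Power BI against the Databricks REST API before fighting the dialog. A hedged sketch; the workspace URL and token below are placeholders, and `/api/2.0/clusters/list` is a Databricks REST API endpoint:

```python
import urllib.error
import urllib.request

WORKSPACE = "https://eastus2.azuredatabricks.net"  # placeholder workspace URL
TOKEN = "dapi0123456789abcdef"                     # placeholder token string

def token_is_valid(workspace, token):
    """Return True if the token authenticates against the workspace.
    Power BI wants the literal user name "token" with this string as
    the password; the REST API takes the same string as a Bearer token."""
    req = urllib.request.Request(
        workspace + "/api/2.0/clusters/list",
        headers={"Authorization": "Bearer " + token})
    try:
        urllib.request.urlopen(req)
        return True
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):  # credentials rejected
            return False
        raise

# token_is_valid(WORKSPACE, TOKEN)  # uncomment with real values
```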


4. After you have successfully connected (if the cluster is online you shouldn’t experience any issues), you will be able to select the tables you need for your report and edit the queries to your liking. It is a ‘bit’ slow compared to other data connectors, but it is still in beta, and hopefully it will become much more responsive in the near future.

clip_image012


5. After configuring your dataset, creating the report masterpiece of the day and publishing it online comes the final touch of report development: setting up the automatic dataset refresh from the Power BI web service:

clip_image014

And here comes the sweetest part: as Power BI and Databricks are integrated and everything is in the cloud, you don’t need gateways, ODBC drivers or anything else. Just set up the credentials once again, as per the following image:

clip_image016


N.B. Whenever your cluster is offline you will see an error saying your credentials are not valid:

clip_image018

For this to work as expected, always start your Databricks cluster first and make sure it is online before your dataset refresh schedule hits!
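Since the refresh only succeeds against a running cluster, you may want to start it programmatically a few minutes ahead of the schedule. A sketch using the Databricks REST API’s `clusters/get` and `clusters/start` endpoints; the workspace URL, token and cluster id below are placeholders, and you would run this from whatever scheduler you have to hand:

```python
import json
import time
import urllib.request

WORKSPACE = "https://eastus2.azuredatabricks.net"   # placeholder workspace URL
TOKEN = "dapi0123456789abcdef"                      # your Databricks token
CLUSTER_ID = "0221-211616-tings395"                 # from the JDBC/ODBC tab

def api(path, payload=None):
    """Minimal caller for the Databricks REST API 2.0."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        WORKSPACE + "/api/2.0/" + path, data=data,
        headers={"Authorization": "Bearer " + TOKEN})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        return json.loads(body) if body else {}

def ensure_cluster_running():
    """Start the cluster if it is terminated, then wait for RUNNING."""
    if api("clusters/get?cluster_id=" + CLUSTER_ID)["state"] == "TERMINATED":
        api("clusters/start", {"cluster_id": CLUSTER_ID})
    # Start-up takes roughly 5-10 minutes, so poll patiently.
    while api("clusters/get?cluster_id=" + CLUSTER_ID)["state"] != "RUNNING":
        time.sleep(30)

# ensure_cluster_running()  # uncomment with real values
```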
However, there is a not-so-sweet part as well. I’d even say salty, like the tears I cry whenever I have to fight this bug.
Currently, a week or two after you deploy your dataset, the scheduled refresh starts failing with strange errors such as “Bad request” or my favourite, “Invalid method name: ‘GetCrossReference’”, for which I have logged a bug with Microsoft.
The manual fix is to open Power BI Desktop, click Edit Queries and refresh all the tables one by one. If you get an error, just try again; the Spark connector is a bit buggy and takes a while to fetch the data, but once you have refreshed the dataset and published it, your report will refresh smoothly (provided your cluster is online whenever the refresh runs) for at least another week.


In conclusion, Power BI with Databricks is a tempting new option for a cloud-only solution, and it is pretty stable and reliable even in preview. Give it a go, and let me know if you experience any bugs or discover something new and exciting about this connection.

SQL Operations Studio Vs (good old) SSMS


In the following post we’ll go through a “short” comparison between the most popular MS database management tool, SSMS, and the new tool provided by Microsoft: SQL Operations Studio. We will go over each tool’s strengths and weaknesses, as objectively as possible.

In November last year Microsoft launched a new lightweight, cross-platform tool for SQL developers and DBAs called Microsoft SQL Operations Studio. It vividly resembles Visual Studio Code, perhaps because it was forked from it. Both tools run not only on Windows but also on Mac and Linux, and are free of charge.

Does this mean that good old SQL Server Management Studio will be replaced by SQL Operations Studio and in time even forgotten?

It is hard to answer the above question definitively, as it depends on the needs of the developer using the tool and on the system. However, the following comparison will focus on the daily struggles of a database developer (not a DBA, whose needs are a bit more specific).

Firstly, let me get the major difference between the two tools out of the way: if you are working on Linux or Mac and don’t want to use a VM, you don’t have a choice here. But for Windows users, SQL Operations Studio also contains a lot of hidden gems that can ease your workload, and it is gaining new functionality faster than expected, so I’m in a hurry to write this blog post before it gets outdated.

Also, I presume all readers here have some experience with the SSMS tool (regardless of the version) and little to no experience with Microsoft SQL Operations Studio.

So, to start with, the thing I liked most about the new tool is the black theme (finally!!!). It was about time DB developers got a tool that wouldn’t make them cry every evening after a long day in front of the monitor. SSMS currently has a black theme, but it needs to be manually enabled and doesn’t really work as expected:

clip_image002[5]

Versus:

clip_image004[5]

Looks way better to me!

Another important upgrade, at least for me, is the more responsive IntelliSense. SSMS’ IntelliSense has come a long way to be actually working and useful, but the SQL Operations Studio one beats it!

As a developer I’m used to SQLPrompt, an awesome tool from Redgate. SQLPrompt can auto-generate your whole script, including keywords and stored procedure executions. I consider SQL Operations Studio’s IntelliSense significantly better than SSMS’s (though it still has a long way to go to match what Redgate offers as a paid tool).

Other fancy new tweaks I’ve come across: hovering over “SELECT *” shows you all the columns with their data types and lets you copy them, and the peek definition feature gives you the script of the object you want (it doesn’t work for system objects, though). I’ve used both features in SQLPrompt too, but unless you’ve paid for a licence you cannot use them in SSMS.

Here is a peek of peek definition:

clip_image006[4]

And the column list while hovering over the wildcard symbol:

clip_image008[4]

However, if we connect to an Azure SQL Database these don’t seem to work. You do get hints for the reserved keywords, but nothing more ☹.

clip_image010[4]

I am also happy to see there is an “Execution plan” in SQL Operations Studio called “Explain”:

clip_image011[4]

Probably not as extensive as the one in SSMS but it could be further enhanced.

There is, after all, an additional tab called “TOP OPERATIONS” that simplifies the investigation into why a certain query performs poorly:

clip_image013[4]

SQL Operations Studio also displays line numbers automatically – i.e. you don’t have to look for the option in the settings. If you need to modify the user settings, however, it is quite a pain: there is no GUI, and you edit the JSON source manually:

clip_image015[4]
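For reference, the user settings file follows Visual Studio Code’s JSON scheme, so overrides look roughly like this (the exact keys and the theme name may differ between builds, so treat this as an illustration rather than a definitive reference):

```json
{
    "editor.fontSize": 12,
    "editor.lineNumbers": "on",
    "workbench.colorTheme": "Default Dark SQL Operations Studio"
}
```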

Sadly, you cannot drag and drop objects into the query window, nor rename objects with a double click (if you have the rights, of course).

Continuing with the downsides, the various scripting options available when right-clicking a database or object are also missing, and there is no table designer, which I commonly use.

Another thing I couldn’t find in SQL Operations Studio is an equivalent of the Activity Monitor.

Selecting a box of items with the ALT key is also unavailable (it works in VS 2015).
As for the most commonly used hotkeys, the differences between SSMS and SQL Operations Studio are the following:

F5, CTRL + E or ALT + X - executes the currently selected code - still works in both studios;

CTRL + R - shows/hides the Results pane - doesn’t work;

CTRL + N - opens a new query window - works;

CTRL + L - displays the query execution plan - not available;

CTRL + I - opens the incremental search dialog - doesn’t work;

ALT + F1 - runs sp_help - also not working in Operations Studio.

Only 2 out of 6 – not a good score.

There are many more differences between the two, and in future SQL Operations Studio will probably gain much more functionality, as will SSMS. For now, at least for my DB work, I will stick with SSMS because of the extra options that are not yet available in Operations Studio but are crucial for my work – e.g. the numerous hotkeys, the administration GUI and scripting options such as Generate Scripts, import, export, etc. Another major reason that should not be underestimated is that SQL Operations Studio is still in preview, which means there are potentially a lot of bugs still to be discovered.

Nonetheless, your preferences might be totally different and there may be many other reasons for you to choose Operations Studio over SSMS. If you decide to give it a try please let me know as I am keen to further investigate all the new upgrades that are coming out. I’ll keep an eye on the upcoming updates and will be releasing more blogs on the topic.

References:

https://visualstudiomagazine.com/blogs/data-driver/2017/11/sql-operations-studio.aspx

https://twitter.com/sqldatatools

https://blogs.technet.microsoft.com/dataplatforminsider/2018/01/17/the-january-release-of-sql-operations-studio-is-now-available/

https://visualstudiomagazine.com/articles/2018/01/22/sql-ops-studio.aspx

https://logicalread.com/first-look-sql-operations-studio/#.WnLCKIhuaUk

https://logicalread.com/sql-server-management-studio-10-tips-and-tricks-mo01/#.WnLD74huaUk