Sacha Tomey

Sacha Tomey's Blog

The Azure Modern Data Warehouse: Unparalleled Performance

Today, 80% of organisations adopt cloud-first strategies to scale, reduce costs, capitalise on new capabilities including advanced analytics and AI and to remain competitive. Cloud-adoption is accelerating, and data exploitation is a key driver.

The central tenet to this enterprise-wide exploitation of data is the cloud-based Modern Data Warehouse. Legacy on-premises or appliance based EDWs, that were once the strategic asset for only the largest of enterprise organisations, not only limit performance, and flexibility, but are also harder to set up & scale securely.

The Modern Data Warehouse fundamentally changes the landscape for data analytics by making analytics available to everyone across organisations of all sizes, and not only the largest enterprise.

A modern data analytics platform enables you to bring together huge volumes of relational and non-relational, or structured and unstructured data into a single repository; the highly scalable and cost-effective Azure Data Lake. This provides access across the enterprise from basic information consumption through to innovation led data science.

Big data processing capability for data preparation, such as transformation and cleansing, can be performed as well as infusing Machine Learning and AI with the results made readily available for analysis through visual tools like Power BI.

Azure provides unparalleled performance at incredible value. To further support this claim, Microsoft have just announced the GigaOM TPC-DS Benchmark Results that further cements Azure SQL Data Warehouse as a leader for price/performance for both decision support benchmarks, having already attained best price/performance status for the TPC-H benchmark, announced back in Feb 2019.

TPC-DS @ 30TB
$ per Query per Hour

DS_thumb1

TPC-H @ 30TB
$ per Query per Hour

H_thumb58

Azure SQL Data Warehouse (Azure SQL DW) always delivered on performance when compared to alternatives, and now GigaOm found analytics on Azure is 12x faster and 73% cheaper when compared using the TPC-DS benchmark. Azure SQL DW has established itself as the alternative to on-premises data warehouse platforms and leader in Cloud Analytics.

Adatis have been at the cutting edge of Cloud Analytics Solutions since the introduction of the Azure SQL Data Warehouse PaaS offering back in 2015. In the last 18 months we have noticed the profile of Azure SQL DW rise sharply; with Azure SQL DW outperforming and taking over workloads from its closest competitors.

We specialise in all aspects of delivering the value of Cloud Analytics, AI and the Modern Data Warehouse, from strategic business value led engagements through technical design and implementation to on-going continuous improvement via fully managed DataOps practices.

Arch_thumb[7]

Adatis utilise Microsoft Azure technologies, in conjunction with first-party Spark based services, that securely integrate to provide enterprise-grade, cloud-scale analytics and insight for all and partner deeply with Microsoft to enable data driven transformation for our customers. We work to develop a modern data analytics strategy and ensure it is implemented and supported in the best way possible, aligning to your specific company’s goals and overriding strategy.

If you want to find out how Adatis can help you make sense of your data and learn more about the Modern Data Warehouse, please join us in London on 6th June for an exclusive workshop. We will guide you through the Microsoft landscape and showcase how we can help you get more value from your data, wherever you are on your data transformation journey.

Register here for our "Put your Data to Work" in-person event to be held in London on 6th June 2019

Additional Resources

Microsoft Azure Big Data and Analytics

Information on Azure SQL Data Warehouse

Try Azure SQL Data Warehouse for Free

#SimplyUnmatched, #analytics, #Azure, #AzureSQLDW, #MSPowerBI

Power BI Streaming Datasets–An Alternative PowerShell Push Script

I attended the London Power BI Meetup last night. Guest speaker was Peter Myers On the topic of "Delivering Real-Time Power BI Dashboards With Power BI." It was a great session.

Peter showed off three mechanisms for streaming data to a real time dashboard:

  • The Power BI Rest API
  • Azure Stream Analytics
  • Streaming Datasets

We've done a fair bit at Adatis with the first two and whilst I was aware of the August 2016 feature, Streaming Datasets I'd never got round to looking at them in depth. Now, having seen them in action I wish I had - they are much quicker to set up than the other two options and require little to no development effort to get going - pretty good for demo scenarios or when you want to get something streaming pretty quickly at low cost.

You can find out more about Streaming Datasets and how to set them up here: https://powerbi.microsoft.com/en-us/documentation/powerbi-service-real-time-streaming/

If you create a new Streaming Dataset using 'API' as the source, Power BI will provide you with an example PowerShell script to send a single row of data into the dataset.  To extend this, I've hacked together a PowerShell script and that loops and sends 'random' data to the dataset. If you create a Streaming Dataset that matches the schema below, the PowerShell script further below will work immediately (subject to you replacing the endpoint information). If you create a different target streaming dataset you can easily modify the PowerShell script to continually push data into that dataset too.

I’ve shared this here, mainly as a repository for me, when I need it, but hopefully to benefit others too.

Streaming Dataset Schema

clip_image001_thumb

Alternative PowerShell Script

Just remember to copy the Power BI end point to the relevant location in the script.

You can find the end point (or Push URL) for the Dataset by navigating to the API Info area within the Streaming Dataset management page within the Power BI Service:

SNAGHTML1104da5e_thumb

# Initialise Stream
$sleepDuration = 1  #PowerBI seldom updates realtime dashboards faster than once per second.
$eventsToSend = 500 #Change this to determine how many events are part of the stream
$endpoint = "[INSERT YOUR ENDPOINT HERE]"
    
# Initialise the Payload
$payload = @{EventDate = '' ; EventValue = 0; EventSource = ''}

# Initialise Event Sources
$eventSource = @('Source1', 'Source2', 'Source3')

# Iterate until $eventsToSend events have been sent
$index = 1 
do
{
    # Update payload
    $payload.EventDate = Get-Date -format s
    $source = Get-Random -Minimum 0 -Maximum 3 
    $payload.EventSource = $eventSource[$source]
    $value = Get-Random -Minimum 0.00 -Maximum 101.00 
    $payload.EventValue = $value

    # Send the event
    Invoke-RestMethod -Method Post -Uri "$endpoint" -Body (ConvertTo-Json @($payload))

    # Report what has been sent
    "`nEvent {0}" -f $index
    $payload

    # Sleep for a second
    Start-Sleep $sleepDuration

    # Ready for the next iteration
    $index++

} While ($index -le $eventsToSend)

# Finished
"`n{0} Events Sent" -f $eventsToSend