Sacha Tomey

Sacha Tomey's Blog

The Azure Modern Data Warehouse: Unparalleled Performance

Today, 80% of organisations are adopting cloud-first strategies to scale, reduce costs, capitalise on new capabilities such as advanced analytics and AI, and remain competitive. Cloud adoption is accelerating, and data exploitation is a key driver.

Central to this enterprise-wide exploitation of data is the cloud-based Modern Data Warehouse. Legacy on-premises or appliance-based EDWs, once a strategic asset for only the largest enterprise organisations, not only limit performance and flexibility but are also harder to set up and scale securely.

The Modern Data Warehouse fundamentally changes the landscape for data analytics by making analytics available to everyone, across organisations of all sizes and not only the largest enterprises.

A modern data analytics platform enables you to bring together huge volumes of relational and non-relational, or structured and unstructured, data into a single repository: the highly scalable and cost-effective Azure Data Lake. This provides access across the enterprise, from basic information consumption through to innovation-led data science.

Big data processing for data preparation, such as transformation and cleansing, can be performed alongside Machine Learning and AI, with the results made readily available for analysis through visual tools like Power BI.

Azure provides unparalleled performance at incredible value. To support this claim, Microsoft have just announced the GigaOm TPC-DS benchmark results, which further cement Azure SQL Data Warehouse as a leader for price/performance across both decision support benchmarks. It had already attained best price/performance status for the TPC-H benchmark, announced back in Feb 2019.

[Chart: TPC-DS @ 30TB, $ per Query per Hour]

[Chart: TPC-H @ 30TB, $ per Query per Hour]

Azure SQL Data Warehouse (Azure SQL DW) has always delivered on performance when compared to alternatives, and GigaOm has now found that analytics on Azure is 12x faster and 73% cheaper when compared using the TPC-DS benchmark. Azure SQL DW has established itself as the alternative to on-premises data warehouse platforms and as a leader in Cloud Analytics.

Adatis have been at the cutting edge of Cloud Analytics Solutions since the introduction of the Azure SQL Data Warehouse PaaS offering back in 2015. In the last 18 months we have noticed the profile of Azure SQL DW rise sharply, with Azure SQL DW outperforming and taking over workloads from its closest competitors.

We specialise in all aspects of delivering the value of Cloud Analytics, AI and the Modern Data Warehouse, from strategic, business-value-led engagements through technical design and implementation to ongoing continuous improvement via fully managed DataOps practices.


Adatis utilise Microsoft Azure technologies, in conjunction with first-party Spark-based services, that integrate securely to provide enterprise-grade, cloud-scale analytics and insight for all. We partner deeply with Microsoft to enable data-driven transformation for our customers. We work with you to develop a modern data analytics strategy and ensure it is implemented and supported in the best way possible, aligned to your company's specific goals and overarching strategy.

If you want to find out how Adatis can help you make sense of your data and learn more about the Modern Data Warehouse, please join us in London on 6th June for an exclusive workshop. We will guide you through the Microsoft landscape and showcase how we can help you get more value from your data, wherever you are on your data transformation journey.

Register here for our "Put your Data to Work" in-person event to be held in London on 6th June 2019

Additional Resources

Microsoft Azure Big Data and Analytics

Information on Azure SQL Data Warehouse

Try Azure SQL Data Warehouse for Free

#SimplyUnmatched, #analytics, #Azure, #AzureSQLDW, #MSPowerBI

Migrating to Native Scoring with SQL Server 2017 PREDICT

Microsoft introduced native predictive model scoring with the release of SQL Server 2017.

The PREDICT function (Documentation) is now a native T-SQL function that eliminates having to score using R or Python through the sp_execute_external_script procedure. It's an alternative to sp_rxPredict. In both cases you do not need to install R but with PREDICT you do not need to enable SQLCLR either - it's truly native.

PREDICT should make predictions much faster, as the process avoids having to marshal the data between SQL Server and Machine Learning Services (previously R Services).

Migrating from the original sp_execute_external_script approach to the new native approach tripped me up so I thought I'd share a quick summary of what I have learned.

Stumble One:

Error occurred during execution of the builtin function 'PREDICT' with HRESULT 0x80004001. 
Model type is unsupported.

Reason:

Not all models are supported. At the time of writing, only the following models are supported:

  • rxLinMod
  • rxLogit
  • rxBTrees
  • rxDTree
  • rxDForest

sp_rxPredict supports additional models, including those available in the MicrosoftML package for R (I was attempting to use rxFastTrees). I presume this limitation will reduce over time. The list of supported models is referenced in the PREDICT function (Documentation).
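To catch the "Model type is unsupported" error before the model ever reaches SQL Server, a small guard in R can check the model's class against the list above. This is only a sketch: the helper name is_predict_compatible is hypothetical, the class names are assumed to match the RevoScaleR function names listed above, and the two stand-in objects exist purely so the snippet runs without RevoScaleR installed.

```r
# Model classes PREDICT accepted at the time of writing (taken from the list above).
# Assumption: each RevoScaleR model object carries a class named after its function.
supported_by_predict <- c("rxLinMod", "rxLogit", "rxBTrees", "rxDTree", "rxDForest")

# Hypothetical helper: TRUE if the model inherits from any supported class.
is_predict_compatible <- function(model) {
  inherits(model, supported_by_predict)
}

# Stand-in objects so the example runs without RevoScaleR installed.
fake_linmod <- structure(list(), class = "rxLinMod")
fake_other  <- structure(list(), class = "someOtherModel")

print(is_predict_compatible(fake_linmod))  # TRUE
print(is_predict_compatible(fake_other))   # FALSE
```

A check like this could sit just before the serialisation step, falling back to sp_rxPredict for anything that is not on the list.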

Stumble Two:

Error occurred during execution of the builtin function 'PREDICT' with HRESULT 0x80070057. 
Model is corrupt or invalid.

Reason:

The serialisation of the model needs to be modified for use by PREDICT. Typically you might serialise your model in R like this:

model <- data.frame(model=as.raw(serialize(model, NULL)))

Instead you need to use the rxSerializeModel method:

model <- data.frame(rxSerializeModel(model, realtimeScoringOnly = TRUE))

There's a corresponding rxUnserializeModel method, so it's worth updating the serialisation across the board so models can be used interchangeably in the event all models are eventually supported.  I have been a bit legacy.
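As a rough sketch of the two serialisation routes side by side, the following assumes RevoScaleR is installed (it ships with SQL Server Machine Learning Services) and uses the built-in mtcars data purely for illustration. One caveat, as I read the RevoScaleR documentation: a model serialised with realtimeScoringOnly = TRUE drops the fields needed to rebuild it in R, so rxUnserializeModel may no longer be able to restore it. If you need the round trip, keep a full serialisation as well.

```r
# Assumes RevoScaleR is available; it is not part of base R.
library(RevoScaleR)

# Illustrative model only: a small linear model on a built-in data set.
fit <- rxLinMod(mpg ~ wt + hp, data = mtcars)

# Full serialisation: intended to be restorable with rxUnserializeModel.
full_blob <- rxSerializeModel(fit)
restored  <- rxUnserializeModel(full_blob)

# Trimmed serialisation for native scoring with PREDICT.
# Assumption (from the docs): this form may not round-trip through rxUnserializeModel.
scoring_blob <- rxSerializeModel(fit, realtimeScoringOnly = TRUE)

# Wrap in a data frame, as in the snippet above, to hand back to SQL Server.
model <- data.frame(model = scoring_blob)
```

Storing the trimmed form for PREDICT and the full form for anything that needs the model back in R is one way to keep both options open.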

That's it.  Oh, apart from the fact PREDICT is supported in Azure SQL DB, despite the documentation saying the contrary.