Simon

Simon Whiteley's Blog

A Guide to Azure SQL DataWarehouse

So you've heard the hype - the Azure SQL DW is going to solve all of your problems in one fell swoop… Right? Well… maybe. The system itself is a mix of technologies designed for low concurrency analytics across huge amounts of relational data. In short, it's a cloud-scalable T-SQL-based MPP platform, with all the benefits and restrictions that performing everything in parallel brings. If your problem can be solved by performing lots of calculations over small of your data before aggregating the results into a whole, this is the technology for you.

However, before you jump right in, be aware that SQLDW is a very different beast to other SQL tools. There are specific concepts you need to be familiar with before building your system, otherwise you're not going to see the promised performance gains and will likely lose faith very quickly!

I'm in the process of documenting these concepts, plus there is a wealth of information available on the Azure SQLDW site. For the next few months, I'll be running through the blog topics below and updating the links accordingly. If there are topics you'd like me to add to the list, please get in touch!

Azure SQLDW Core Concepts:

- What is it?

- How Does Scaling Work?

- Distributions

- Polybase

- CTAS

- Resource Classes

- Partitioning

Designing ETL (or ELT) In Azure SQLDW

- Row counts

- Statistics

- Surrogate Keys

Performance Tuning Azure SQLDW

- Plan Exploration in SQLDW

- Data Movement Types

- Minimising Data Movement

Managing Azure SQLDW

- Backup & Restore

- Monitoring Distributions

- System Monitoring

- Job Orchestration

- Scaling and Management

- Performance Tuning Queries

Azure SQLDW Architecture

- Presentation Layers

- Data Lake Integrations

Comments (2) -

  • Nick

    6/2/2017 7:53:18 AM | Reply

    I'd be interested in seeing how you would solve the Source Control issue on AzureDW, given that it appears to be 100% manual at the moment (no SSDT Database Project support, and no 3rd Party Source control tooling).

    Manual tracking of DB changes (tables and sprocs) is just asking for trouble.

    • Simon Whiteley

      6/14/2017 11:09:30 AM | Reply

      Hi Nick - afraid there's no nice answer at the moment! I still use database projects to organise my SQL code, structuring by schema/object type etc. I then pull together deployment scripts using Powershell but, you're right, this doesn't allow for change tracking/schema drift etc - it's more a wipe & replace approach which works for greenfield new builds, rubbish for ongoing incremental deployment.

      There are a few Microsoft chaps who go a step further and deploy their code (with PDW/SQLDW specific bits commented out) to an Azure SQLDB. They perform change tracking etc against the SQLDB and so can generate change scripts, drift reports etc from that. Then, when they're ready to deploy their code, then run a bit of powershell to uncomment those SQLDW bits.

      Both approaches are clunky and flawed, but until the SSDT team integrates d-sql files into database projects there's not much else we can do. That said, haven't looked to see if Redgate have caught up yet, might be worth a bit of a dig at their SQLCompare tool to see if they have SQLDW on the roadmap?

      Thanks for reading!
      Simon

Loading