Microsoft Azure SQL Data Warehouse. How much does it cost?

As many of you know, Microsoft has opened to public preview its latest BI technology: Azure SQL Data Warehouse.
On the documentation page it is defined as: “an enterprise-class distributed database capable of processing petabyte volumes of relational and non-relational data.”
How much does it cost, is it worth it, and how does it differ from previous SQL offerings on Azure?
The main difference -in my opinion- is the billing of compute power and storage:
in Azure SQL DW you decide how fast your queries run and how big your data is. This generates a charge for DWUs (Data Warehouse Units), a unit of measure defined as the level of performance of your data warehouse; and a charge for RA-GRS Page Blobs (Read-Access Geo-Redundant Storage).
I suppose this will cause more than a headache for financial departments and SQL administrators, but it is indeed a good thing.

The main point I see here is that -wisely used- the DWU pricing can help tune the database responsiveness and number-crunching speed according to the time of day (or day of the week). I can set for example 100DWU on low-traffic hours or weekends, and scale up to 2000DWU Monday morning or during the reporting season when everybody wants to access data. I can even choose to pause the database -for example during holidays- and incur no compute charge at all. It would be extremely useful to have PowerShell scripts to switch from one DWU level to another and schedule them so that the database can auto-adjust the compute power needed.
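The scheduling idea above can be sketched as a simple lookup. Here is a minimal illustration in Python; the tier values and time windows are assumptions for the example, and in practice the actual switch would be performed through the Azure management API or PowerShell, not by this function:

```python
from datetime import datetime

# Hypothetical schedule (assumed values, not official guidance):
LOW_DWU = 100     # nights, weekends, holidays
PEAK_DWU = 2000   # weekday business hours

def target_dwu(now: datetime) -> int:
    """Return the DWU level the warehouse should be scaled to at a given time."""
    if now.weekday() >= 5:       # Saturday (5) or Sunday (6)
        return LOW_DWU
    if 8 <= now.hour < 18:       # weekday business hours
        return PEAK_DWU
    return LOW_DWU

print(target_dwu(datetime(2015, 7, 6, 9, 0)))   # Monday 9:00 -> 2000
print(target_dwu(datetime(2015, 7, 4, 9, 0)))   # Saturday 9:00 -> 100
```

A scheduler (SQL Agent, Azure Automation, cron) could call this kind of logic periodically and request a scale-up or scale-down only when the target level changes.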

Doing some very simple math: assuming a 30-day month, at 100DWU we will receive a $504 (USD) bill for compute + $62.5 (USD) for 1TB of data, a total of $566.5* for a fully functioning SQL Data Warehouse. On the other end of the spectrum, 2000DWU for 1 month would be $10,080 (USD)*.
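The arithmetic above can be checked in a few lines of Python. Note that the $0.70 per hour per 100DWU rate is inferred from the $504 monthly figure ($504 / 720 hours), not taken from an official price sheet:

```python
# Back-of-the-envelope check of the preview pricing figures.
HOURS_PER_MONTH = 30 * 24          # 720 hours in the assumed 30-day month
RATE_PER_100_DWU_HOUR = 0.70       # USD, implied by the $504 monthly total
STORAGE_PER_TB_MONTH = 62.5        # USD, RA-GRS Page Blob storage

def monthly_cost(dwu: int, storage_tb: float) -> float:
    """Estimated monthly bill: compute (per 100 DWU per hour) plus storage."""
    compute = (dwu / 100) * RATE_PER_100_DWU_HOUR * HOURS_PER_MONTH
    return compute + storage_tb * STORAGE_PER_TB_MONTH

print(monthly_cost(100, 1))    # 566.5
print(monthly_cost(2000, 0))   # 10080.0 (compute only)
```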

In addition to the grow, shrink, or pause capabilities, Azure SQL Data Warehouse features an MPP architecture, column-store indexes, and the ability to query relational and non-relational data (read: Hadoop) with PolyBase; more on this here.

So my .02: let’s give it a try…

*During the preview period the prices are discounted. I do not expect them to vary that much anyway, but at the moment it is not clear how large the discount is or what the price will be after the preview; see here for more details:

Lahman's Baseball Database – real-life sample data

Mr. Lahman is (in his own words) a watchdog reporter & data journalist, and an author.
But the collective wisdom of the Business Intelligence world remembers him as the creator of a wonderful public dataset: the Baseball Database.

The Lahman database is extremely useful, and I am grateful to its author for several reasons:

  1. It is not a bike/sport/equipment shop sales database, which almost all examples in the BI world use (my book is no exception…)
  2. It is real-life data on an interesting subject, and it is frequently updated.
  3. It has errors (very few), those nice little imperfections that come in very handy when you are testing/training/developing a test case or a proof of concept.
I don’t remember who said that PowerPoint is always right and nothing can go wrong with a slide presentation -as long as the projector is working and the USB drive is plugged in- but he surely would agree that it’s no fun to demo with a database that is perfectly crafted to return exact results.
I have never in my life found a “squeaky clean” production DB. As a general rule there is always a missing primary key, an unenforceable foreign key, missing records, outdated documentation, you name it… That’s why I like the Lahman DB: I can experiment -in a controlled environment- with how the software behaves in situations beyond the standard show-and-tell “look-ma-this-works” scenarios.
It is available for free, and you can download it in several formats from here:
I converted it to Oracle and SQL Server, and you can download those versions too from here:
In Oracle, simply run IMPDP with the SCHEMAS=LAHMAN option; the password for the LAHMAN schema is (guess) LAHMAN.
In SQL Server, copy the MDF into the DATA folder and attach it to an existing SQL Server instance.
Hope this helps. I will often use this data to showcase MicroStrategy and other products in my next posts.