How to Lie with Statistics (by Darrell Huff)

I just finished reading this 140 paperback about statistics, and I found it enlightening:
How to Lie with Statistics: Written by Darrell Huff, 1993 Edition, (Reissue) Publisher: W. W. Norton & Company [Paperback]

It was written back in 1954, but the topics it covers are timeless, and -not surprisingly- Mr Huff has few very useful tips applicable to nowadays business intelligence. You can only tell that the text is 60 years old when looking at the details: in one example there is the price of a good hotel room in San Francisco (an eye popping figure: 8$).

One of the first Chapters explains the “biased sample” problem, which affects so many scientific results; another one points the finger to an old trick used to convey distorted messages with line graphs. Many of the wrongdoings he exposes, sounds hard to believe, are still in use today, and you can find plenty of them in ads, magazines, and (sadly) in government polls. Not earlier than one month ago, one very famous cable news network was showing a chart picturing an “impressive” improvement in Spanish unemployment rate: from 26% down to 22% (ouch!). It was something like this:

spunemp2

What they failed to show was data from preceding years:

spunemp1

data source: Spain Unemployment Rate 1976-2015

One of my favorite part is when Mr Huff shows how unions or entrepreneurs give inconsistent (to say the least) results from the same set of data. Or how “experts” sum percentages in an mind bending attempt to rewrite the math rules: “buy any 20 products, wait one year, see their price increase by 5%, so overall the cost of life (20 x 5% = 100%) has doubled in one year…”.

Oh, one last point: don’t forget to read the exhilarating story about the rabbit burger…

There OAuth to be a better way (Power BI)

Edit: The base URL has changed from https://api.powerbi.com/beta/myorg to https://api.powerbi.com/v1.0/myorg

OAuth is clearly becoming the de-facto standard for authenticating API calls around the web. In the business intelligence arena, while we can discuss its pros and cons, we undoubtedly must get acquainted with it because sooner or later we’ll be tasked with importing data from one of the many “OAuth protected” web services.

In this third release of the OAuth series I’ll show how to get an Access Token from Microsoft Power BI. The same procedure can be used for many other Azure services (by changing the appropriate scope in the resource parameter).

Our typical scenario would be an unattended server process downloading data; I’m using a headless Linux box with cURL and jq.

You can see here and here my previous posts about how to use cURL to authenticate with flickr or BigQuery and make API calls with an Access Token.

The way Azure works is a little different. You get an Access Token valid for 1 hour and a Refresh Token. You can reuse the Access Token for as many calls as you want during the hour, and then you’ll need to ask for a new Access Token presenting the Refresh Token.

Prerequisites:

  • An Azure subscription with a real work domain (no personal account)

loginazure

  • An Azure Active Directory so you can add users to @yourdomain

azuread

  • A Power BI subscription with a user belonging to your Azure Active Directory

azure users

Once you have the requisites in place, follow this article to create an app and get a Client ID.

clientid

With the username, password and the Client ID you can use this script to get an Access Token:

[code language=”bash” gutter=”true” light=”false”]
#!/bin/bash

OAUTH_CLIENT_ID="yourClientId"
OAUTH_USERNAME="youruser@yourdomain.com"
OAUTH_PASSWORD="yourpassword"

POST_RESULT="$(curl -s -X POST -d "resource=https://analysis.windows.net/powerbi/api&client_id="$OAUTH_CLIENT_ID"&grant_type=password&username="$OAUTH_USERNAME"&password="$OAUTH_PASSWORD"&scope=openid" "https://login.windows.net/common/oauth2/token" | jq -r .)"

REFRESH_TOKEN="$(echo ${POST_RESULT} | /usr/local/bin/jq -r .refresh_token)"
ACCESS_TOKEN="$(echo ${POST_RESULT} | /usr/local/bin/jq -r .access_token)"
AUTH_HEADER="Authorization: Bearer ${ACCESS_TOKEN}"
echo "${AUTH_HEADER}"

echo "${AUTH_HEADER}" > ./auth_header.txt
echo "${REFRESH_TOKEN}" > ./refresh_token.txt

[/code]

This script will save two files: one is the Authorization Header and the other is the Refresh Token. You will use the Authorization Header passing it to every API call that you make (during 1 hour), for example to get a list of the available datasets in your Power BI storage use:

[code language=”bash” gutter=”true” light=”false”]
#!/bin/bash

AUTH_HEADER=$(<./auth_header.txt)
curl -k -s "https://api.powerbi.com/beta/myorg/datasets" -H "$AUTH_HEADER" | /usr/local/bin/jq -r .

[/code]

After an hour or so, you will ask for a new Access Token and store the new Authorization Header (can also be crontabbed every nn minutes):

[code language=”bash” gutter=”true” light=”false”]
#!/bin/bash

REFRESH_TOKEN=$(<./refresh_token.txt)
OAUTH_CLIENT_ID="yourClientId"
OAUTH_USERNAME="youruser@yourdomain.com"
OAUTH_PASSWORD="yourpassword"

POST_RESULT="$(curl -k -s -X POST -d "resource=https://analysis.windows.net/powerbi/api&client_id="$OAUTH_CLIENT_ID"&grant_type=refresh_token&username="$OAUTH_USERNAME"&password="$OAUTH_PASSWORD"&scope=openid&refresh_token=${REFRESH_TOKEN}" "https://login.windows.net/common/oauth2/token" | jq -r .)"

REFRESH_TOKEN="$(echo ${POST_RESULT} | /usr/local/bin/jq -r .refresh_token)"
ACCESS_TOKEN="$(echo ${POST_RESULT} | /usr/local/bin/jq -r .access_token)"

AUTH_HEADER="Authorization: Bearer ${ACCESS_TOKEN}"

echo "${AUTH_HEADER}"
echo "${AUTH_HEADER}" > ./auth_header.txt
echo "${REFRESH_TOKEN}" > ./refresh_token.txt

[/code]

So, no browser, no GUI, no problem!