A map, a map! My kingdom for a Choropleth map! (Power BI)

One of the most fascinating aspect of Business Intelligence is the power to picture data in an easy to understand way. I personally like maps a lot, not the google or bing type with lots of circular points which give very low added value to the information, but the Choropleth maps. Those shaped areas with shaded colors intended to convey an idea at the first sight.

Choropleth maps are mostly used with regional data but they can picture any kind of information on any kind of shape, the idea behind is very simple: you have a SVG image (a vectorial one), with one id for each closed area. Remember those days when we were kids and used the coloring book to paint on pre-designed images?
330fdcec81c9109b4ee013c58e188774

That’s it. Then you load your data and assign to each id a numeric value to get a colored area with a increasing intensity of color.

This is exactly what the Synoptic panel does.

It’s a custom visual that you can use with Power BI, more on this here. You can download the current version from gitHub, and import it into your Power BI report (yes, it’s free).

The Synoptic panel itself doesn’t paint the picture, you need to have a coded SVG to use with it. And creating a coded SVG is not difficult, you may use any SVG editor, or just go to http://synoptic.design which is an extremely useful web build by the same people who did the custom visual.

I must thank @DanielePerilli for his very good job and his continuous support. He was very responsive and helped me through building my first map.

spanish elections 2011

So in about an hour and a half here we have two maps with polls data from the Spanish elections in 2011. I downloaded the CSV from http://www.electionresources.org/es/data/index_es.html#PROVINCIAS and used the Synoptic panel to create two maps where I filtered respectively the votes obtained by PP and PSOE.

No rocket science, anybody can do that. Just a little caveat, it’s not very intuitive: if you want to have different shades of color, you should drag a field to the “Saturation Values” box in the visualization settings. And if you want to display names on the map, you need to enable “Category Labels” in the format tab of visualization settings. That took me more than 5 minutes to figure out.

Anyway, go download it, create your own map or use those available in the gallery and happy coloring!

Average of Averages… no, seriuosly.

Tom, Bill and Sandra have one euro each; they go and buy some fruits.

Tom  buys 3 apples for 1€. Bill buys 2 oranges for 1€ and Sandra buys a coconut for 1€.
Tom says “apples cost 0,33 euro cents on average”.
Bill “oranges are 0,50 euro cents on average”
Sandra “coconuts are 1 euro on average”

What is the overall average price of fruits ?

I have seen more than once calculating this as:

0.33 + 0.50 + 1 = 1.83
1.83 / 3 = 0.61

Now, you are a Business Intelligence professional: imagine your customer asking you to calculate an average, such as the price per product grouped by product type and the total overall price per product at company level.
Next he dig into his laptop and proudly shows off his “treasure” Excel sheet that does it all, and by magic copy and paste computes the AVG() of the AVG()…

Unfortunately for your client, acrobatic math is not yet an Olympic discipline.

Tom, Bill and Sandra soon realize that they bought 3 + 2 + 1 = 6 pieces of fruit and they spent 3€, so a single piece is 0,50 euro cents.

Similarly your customer (after an intense discussion) will realize that for the last 15 years he used to calculate a price per product average “from a different point of view”. You may want to ease the pain by saying that it is a quite common error and he’s in good company: see Simpson’s paradox and the Berkeley gender bias case

In MicroStrategy terms this is what is called a “Smart Metric”.
smartmetric

When in the Metric Editor, switch to the Subtotals / Aggregation tab, down to the left there’s a check box that you can enable to surprise your customer… cool.

There OAuth to be a better way (Power BI)

Edit: The base URL has changed from https://api.powerbi.com/beta/myorg to https://api.powerbi.com/v1.0/myorg

OAuth is clearly becoming the de-facto standard for authenticating API calls around the web. In the business intelligence arena, while we can discuss its pros and cons, we undoubtedly must get acquainted with it because sooner or later we’ll be tasked with importing data from one of the many “OAuth protected” web services.

In this third release of the OAuth series I’ll show how to get an Access Token from Microsoft Power BI. The same procedure can be used for many other Azure services (by changing the appropriate scope in the resource parameter).

Our typical scenario would be an unattended server process downloading data; I’m using a headless Linux box with cURL and jq.

You can see here and here my previous posts about how to use cURL to authenticate with flickr or BigQuery and make API calls with an Access Token.

The way Azure works is a little different. You get an Access Token valid for 1 hour and a Refresh Token. You can reuse the Access Token for as many calls as you want during the hour, and then you’ll need to ask for a new Access Token presenting the Refresh Token.

Prerequisites:

  • An Azure subscription with a real work domain (no personal account)

loginazure

  • An Azure Active Directory so you can add users to @yourdomain

azuread

  • A Power BI subscription with a user belonging to your Azure Active Directory

azure users

Once you have the requisites in place, follow this article to create an app and get a Client ID.

clientid

With the username, password and the Client ID you can use this script to get an Access Token:

[code language=”bash” gutter=”true” light=”false”]
#!/bin/bash

OAUTH_CLIENT_ID="yourClientId"
OAUTH_USERNAME="youruser@yourdomain.com"
OAUTH_PASSWORD="yourpassword"

POST_RESULT="$(curl -s -X POST -d "resource=https://analysis.windows.net/powerbi/api&client_id="$OAUTH_CLIENT_ID"&grant_type=password&username="$OAUTH_USERNAME"&password="$OAUTH_PASSWORD"&scope=openid" "https://login.windows.net/common/oauth2/token" | jq -r .)"

REFRESH_TOKEN="$(echo ${POST_RESULT} | /usr/local/bin/jq -r .refresh_token)"
ACCESS_TOKEN="$(echo ${POST_RESULT} | /usr/local/bin/jq -r .access_token)"
AUTH_HEADER="Authorization: Bearer ${ACCESS_TOKEN}"
echo "${AUTH_HEADER}"

echo "${AUTH_HEADER}" > ./auth_header.txt
echo "${REFRESH_TOKEN}" > ./refresh_token.txt

[/code]

This script will save two files: one is the Authorization Header and the other is the Refresh Token. You will use the Authorization Header passing it to every API call that you make (during 1 hour), for example to get a list of the available datasets in your Power BI storage use:

[code language=”bash” gutter=”true” light=”false”]
#!/bin/bash

AUTH_HEADER=$(<./auth_header.txt)
curl -k -s "https://api.powerbi.com/beta/myorg/datasets" -H "$AUTH_HEADER" | /usr/local/bin/jq -r .

[/code]

After an hour or so, you will ask for a new Access Token and store the new Authorization Header (can also be crontabbed every nn minutes):

[code language=”bash” gutter=”true” light=”false”]
#!/bin/bash

REFRESH_TOKEN=$(<./refresh_token.txt)
OAUTH_CLIENT_ID="yourClientId"
OAUTH_USERNAME="youruser@yourdomain.com"
OAUTH_PASSWORD="yourpassword"

POST_RESULT="$(curl -k -s -X POST -d "resource=https://analysis.windows.net/powerbi/api&client_id="$OAUTH_CLIENT_ID"&grant_type=refresh_token&username="$OAUTH_USERNAME"&password="$OAUTH_PASSWORD"&scope=openid&refresh_token=${REFRESH_TOKEN}" "https://login.windows.net/common/oauth2/token" | jq -r .)"

REFRESH_TOKEN="$(echo ${POST_RESULT} | /usr/local/bin/jq -r .refresh_token)"
ACCESS_TOKEN="$(echo ${POST_RESULT} | /usr/local/bin/jq -r .access_token)"

AUTH_HEADER="Authorization: Bearer ${ACCESS_TOKEN}"

echo "${AUTH_HEADER}"
echo "${AUTH_HEADER}" > ./auth_header.txt
echo "${REFRESH_TOKEN}" > ./refresh_token.txt

[/code]

So, no browser, no GUI, no problem!

Some like it OAth (BigQuery)

Second issue of the series about OAuth (you can see the previous here). I’ll try to explain how to use the Google OAuth 2.0 mechanism to authorize a server side scripts without the need of the user’s credentials, in order to automate the INSERT or SELECT (or WHATEVER) processes.

I am testing this with BigQuery service, so in another post I’ll be able to get data from the sample datasets provided by Google.

Prerequisites:

First we need to enable BigQuery for our project (go create one if you don’t have one):

  1. enable google api
  2.  bigquery api

When we click on the Enabled APIs tab, we should see it listed:

bigquery api enabled

Go to APIs & auth / Credentials section on the left and Create new Client ID with the Service account option:

service account

Your browser will download a json file, that you should copy it in a safe place, but we’re not going to use it.

email address

note down the Email address that was generated for your Client ID.

You’ll need to generate also a P12 file, by clicking on the Generate new P12 key button, and download it:

p12 key

notice the pass-phrase for the P12 file (by default is notasecret).

Suppose the downloaded P12 file is named tutorial.p12. We will convert this P12 file to a PEM by issuing this command in a terminal console:

[code language=”bash” gutter=”true” light=”false”]
openssl pkcs12 -passin pass:notasecret -in tutorial.p12 -nocerts -nodes -out tutorial.pem
[/code]

this will remove the pass-phrase from the P12 file and generate a tutorial.pem that we can use with openssl.

Next step is to get an Access Token from Google with a JSON Web Token (JWT): we create the JWT with the Email address above plus some boilerplate constants. I’m not going into details on how the JWT is generated, but it is essentially a base64 encoded string composed by a header, a claim set and a signature; the signature is a little more tricky as it is calculated using SHA-256 hashing algorithm and has the characters forward slash [/] and underscore [_] substituted respectively with plus [+] and minus [-] signs. See documentation here. The JWT is finally sent via a HTTP POST call to https://www.googleapis.com/oauth2/v3/token.

here is the complete bash script, please change the EMAIL_ADDRESS variable with the appropriate value:

[code language=”bash” gutter=”true” light=”false”]

#!/bin/bash

JWT_HEADER="$(echo -n ‘{"alg":"RS256","typ":"JWT"}’ | /usr/bin/openssl base64 -A -e)"
#echo JWT_HEADER=$JWT_HEADER

EMAIL_ADDRESS="11327003054-us……………..g@developer.gserviceaccount.com"

JWT_CLAIM_SET="$(echo -n "{"iss":"$EMAIL_ADDRESS","scope":"https://www.googleapis.com/auth/bigquery.readonly","aud":"https://www.googleapis.com/oauth2/v3/token","exp":"$(($(date +%s)+3600))","iat":"$(date +%s)"}" | /usr/bin/openssl base64 -A -e  | /bin/sed ‘s/=//g’)"
#echo JWT_CLAIM_SET=$JWT_CLAIM_SET

JWT_SIGNATURE_INPUT=$JWT_HEADER.$JWT_CLAIM_SET
#echo JWT_SIGNATURE_INPUT=$JWT_SIGNATURE_INPUT

JWT_SIGNATURE="$(echo -n $JWT_SIGNATURE_INPUT | /usr/bin/openssl sha -sha256 -sign tutorial.pem | /usr/bin/openssl base64 -A -e | /bin/sed ‘s/=//g’ | /usr/bin/tr ‘/+’ ‘_-‘)"
#echo JWT_SIGNATURE=$JWT_SIGNATURE

JWT=$JWT_HEADER.$JWT_CLAIM_SET.$JWT_SIGNATURE
#echo JWT=$JWT

/usr/bin/curl -s -d "grant_type=urn:ietf:params:oauth:grant-type:jwt-bearer&assertion="$JWT -X POST "https://www.googleapis.com/oauth2/v3/token" | /usr/bin/jq -r .access_token

[/code]

this script will output the Access Token retrieved from Google. The Access token is valid for 1 hour, you can store it in a text file and reuse for your API calls until it expires, then redo from start.

# standing on the shoulders of giants: thanks to http://superuser.com/questions/606953/bash-oauth-2-0-jwt-script-for-server-to-google-server-applications