MicroStrategy and LinkedIn

With custom SSO it is possible to allow users to log in to MicroStrategy using their LinkedIn account. By leveraging the OAuth API offered by the social network we can authorize and create new users without knowing their names in advance. Once a user is validated and connected, it is simple to use their user ID to retrieve their profile and groups, or even post to the network update stream.

See an example at: http://moraschi.eu/MicroStrategy

You will be asked to authorize the application to access your LinkedIn account, and you’re in.

To revoke the authorization, from the LinkedIn Home page go to the menu with your name in the upper right corner and click <Settings>; on the settings page click the <Groups, Companies & Applications> link, then <View your Applications>. From this page you can remove Euro-Strategy from the list.

Please post feedback here or mail it to euostat.microstrategy@gmail.com



Chapter 2 – The Oracle VALUES table

An article I read some time ago gave me the idea of attaching an extra table to manage the variable number of columns in different data sets*. This article about dimensional modeling introduced the concept of a “helper table”. In a few words: I can create a single column and store a reference (a code, an ID) to another table which holds the real dimensions, no matter how many columns there are. I simply need to create a fake code for the dimension group and link the two tables on that code.
In the “helper table” I may have a row for every dimension, which would be the correct thing to do if we wanted to comply strictly with the normalization rules.
So for the TSTRAD example in Chapter 1, the helper table would be stored this way:
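The illustration is missing from the original post, but based on the description it could look like this for the first data row (the column names POSITION and DIMENSION are my assumption, one row per dimension of group 1):

```
CODE  POSITION  DIMENSION
----  --------  ---------
   1         1  mmpdr
   1         2  aus
   1         3  xoc
   1         4  Y1996
```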
But I am using Oracle 10g, and in this version there is no easy way to later transform several rows into several columns; say, for example, that I want to select all dimensions in group 1 and display them as columns with their respective values. This would be a very complicated query involving DECODE and MAX and who knows what else. So I am taking advantage of an Oracle feature, the ability to store custom type objects inside table columns, and I create a data type to hold an array of 10 dimensions:
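The type definition is not shown in the original post, but from the description it can be sketched like this (the type name dim_array and the string length are my assumptions):

```sql
-- Hypothetical definition: a VARRAY holding up to ten dimension strings, in order
CREATE OR REPLACE TYPE dim_array AS VARRAY(10) OF VARCHAR2(30);
```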


This type will hold up to ten varchar2 strings, and since varrays in Oracle are numbered sequentially I will always be sure that they are in the correct order. So the “helper table” becomes:
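Again the snippet is missing; a hedged sketch of the DDL, with the type definition repeated for completeness (all names here are my assumptions, not from the original):

```sql
-- Hypothetical varray type: up to ten ordered dimension strings
CREATE OR REPLACE TYPE dim_array AS VARRAY(10) OF VARCHAR2(30);

-- Hypothetical helper table: one row per dimension group,
-- with all the group's dimensions packed into the varray column
CREATE TABLE dim_groups (
  code NUMBER PRIMARY KEY,  -- the fake code for the dimension group
  dims dim_array            -- the dimensions of the group, in order
);
```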



Why do I do this? Can’t I simply store the varray in the original VALUES table and get rid of the helper table? Yes, I can. I do this just in case two or more values happen to share the exact same dimension columns: in that case I will not need to duplicate the information, I will simply reuse the code of the proper dimension group and feel like an experienced database developer 🙂

Chapter 1 – The beginning

The institution where I work needs to analyze data from different sources with different statistical and modeling packages.

The way the data is stored at the moment is baroque: tab- or comma-delimited ASCII data files, Excel sheets, and word of mouth. Multiple copies of the same data set may be stored on the network and, even worse, different and updated versions can coexist. Result data sets are stored on local hard drives and are not visible to everyone. This can lead to embarrassing situations where, at the same meeting, two or more pie charts of the same data set do not agree with each other.

So the aim of these posts is to try to find an elegant database solution to this: a way to build a common repository for different data sets from different software packages, so that data can be easily exchanged and the history of changes is visible to everyone involved in data analysis. And, much better, a new interface will help maintain, correct, and cleanse the various data sets.

The data sets can be about anything, from pollution data, to GDP, to fuel consumption or demographics. All data sets have both cross-sectional and time series dimensions (see Christopher Dougherty “Introduction to Econometrics 3e” book, for a good introduction to PANEL or LONGITUDINAL data sets)*.

There are different modeling packages on the market; in my job we use both GAMS® and VENSIM®. STATA® is also used sometimes for specific functions. Every package has its own terminology and gives different names to the same entity: for example, VENSIM® uses “Subscripts” while GAMS® has “Sets”, but they both refer to the same thing; this leads to confusion among people using different tools who need to share their data sets.

mmpdr aus xoc Y1996 12
mmpdr aus xoc Y1999 2
mmpdr aus xoc Y2000 15
mmpdr aus xoc Y2001 2
mmpdr aus xoc Y2002 3
mmpdr aus hkg Y1979 1
mmpdr aus hkg Y1980 1
mmpdr aus hkg Y1981 3
mmpdr aus hkg Y1982 1
mmpdr aus jpn Y1994 133
mmpdr aus jpn Y1995 17
mmpdr aus jpn Y1996 47
mmpdr aus jpn Y1997 23
mmpdr aus jpn Y1998 13
mmpdr aus jpn Y1999 22

Take this for example: the last column is a value, and column number 4 looks like a time indicator. Now look at column 3; it can contain one of the following: xoc, hkg or jpn. The list of all possible codes for column 3 is called a “Subscript Range” in VENSIM® and a “Set” in GAMS®.

Look at the first data row, it could be written this way:

TSTRAD[mmpdr, aus, xoc, Y1996]=12

So what do we call TSTRAD? We may call it a variable. And what is the name of the codes inside the brackets? Subscripts, dimensions, indexes? There is a need for a common terminology across different software tools. So from now on we will call them “variable” and “dimension” respectively, and “domain” the list of all possible codes inside a column.

Another issue we are facing is that different data sets have different numbers of columns, and the number is not known “a priori”. If I want to store the example table above in a database I could easily create a table with 5 columns and set the primary key to the combination of the first four. But if I need to store another data set with 3 or 7 columns then I have to modify my table or create another one, leading to poor design and confusion in the column names. Also, do I need to normalize this data set? For the sake of storage and speed I had better replace all descriptive columns with numerical codes. Giving, for example, xoc a code of 1, hkg = 2 and jpn = 3, I can save several bytes per row and make searches faster. Then I would need an extra reference table to explain what the numerical codes mean.
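Such a reference table could be sketched like this (the table name, column sizes and sample inserts are my assumptions, just to illustrate the idea):

```sql
-- Hypothetical reference table: maps compact numeric codes
-- back to the descriptive dimension labels
CREATE TABLE dim_codes (
  code  NUMBER       PRIMARY KEY,
  label VARCHAR2(30) NOT NULL UNIQUE
);

INSERT INTO dim_codes VALUES (1, 'xoc');
INSERT INTO dim_codes VALUES (2, 'hkg');
INSERT INTO dim_codes VALUES (3, 'jpn');
```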

In order to facilitate the table design and maintenance I will need a flexible table to store a variable number of descriptive columns and their respective value. VENSIM® and GAMS® have a limit on the number of columns for a given data set, this limit is 10 for both. We can then create a table with 11 fields and have a coffee break -end of the story- but we wouldn’t be satisfied, would we?
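For the record, the naive 11-field table would be nothing more than this (a sketch; all names are my assumptions):

```sql
-- Hypothetical "coffee break" design: ten fixed dimension columns plus the value.
-- It works, but unused columns stay NULL and the column names mean
-- different things for every data set.
CREATE TABLE naive_values (
  dim1  VARCHAR2(30), dim2  VARCHAR2(30), dim3 VARCHAR2(30),
  dim4  VARCHAR2(30), dim5  VARCHAR2(30), dim6 VARCHAR2(30),
  dim7  VARCHAR2(30), dim8  VARCHAR2(30), dim9 VARCHAR2(30),
  dim10 VARCHAR2(30),
  value NUMBER
);
```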