The Lahman Baseball Database: real-life sample data
Mr Lahman is (in his own words) a Watchdog reporter & data journalist, and an author.
But the collective wisdom of the Business Intelligence world knows him as the creator of a wonderful public dataset: the Baseball Database.
The Lahman database is extremely useful, and I am grateful to its author for several reasons:
- It is not a bike/sport/equipment shop sales database, which is what almost all examples in the BI world use (my book is no exception…)
- It is real-life data on an interesting subject, and it is frequently updated.
- And it has errors (very few), those nice little imperfections that come in very handy when you are testing, training, or developing a test case or a proof of concept.
I don’t remember who said that PowerPoint is always right and nothing can go wrong with a slide presentation (as long as the projector is working and the USB drive is plugged in), but he would surely agree that it’s no fun to demo with a database that is perfectly crafted to return exact results.
I have never in my life found a “squeaky clean” production db. As a general rule there is always a missing primary key, an unenforceable foreign key, missing records, outdated documentation, you name it… That’s why I like the Lahman db: I can explore, in a controlled environment, how the software behaves in situations that go beyond the standard Show and Tell “look-ma-this-works” scenarios.
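As a rough illustration of the kind of imperfections mentioned above, here is a minimal Python sketch that looks for duplicate keys and orphaned child rows. It runs against an in-memory SQLite database with two toy tables; the table names, column names, and rows are invented for the example and are not the actual Lahman schema or data.

```python
import sqlite3

# Toy stand-ins for a parent and a child table.
# Names, columns, and rows are hypothetical, not the real Lahman schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (player_id TEXT, name TEXT);
    CREATE TABLE batting (player_id TEXT, year_id INTEGER, games INTEGER);
    INSERT INTO players VALUES ('a01', 'Player A'), ('b02', 'Player B'), ('a01', 'Player A');
    INSERT INTO batting VALUES ('a01', 2001, 10), ('b02', 2001, 12), ('zz99', 2001, 3);
""")

# A missing primary key shows up as key values that occur more than once.
duplicate_keys = conn.execute("""
    SELECT player_id, COUNT(*)
    FROM players
    GROUP BY player_id
    HAVING COUNT(*) > 1
    ORDER BY player_id
""").fetchall()

# An unenforceable foreign key shows up as child rows without a parent row.
orphan_rows = conn.execute("""
    SELECT DISTINCT b.player_id
    FROM batting AS b
    LEFT JOIN players AS p ON p.player_id = b.player_id
    WHERE p.player_id IS NULL
    ORDER BY b.player_id
""").fetchall()

print("duplicate keys:", duplicate_keys)
print("orphan rows:", orphan_rows)
```

The same two queries should carry over to Oracle or SQL Server with little or no change once the real table and column names are substituted.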
The Lahman database is available for free, and you can download it in several formats from here:
I converted it to Oracle and SQL Server, and you can download those versions too from here:
Oracle Data Pump: https://www.dropbox.com/s/kcrilso23b45oa3/EXPDAT.DMP?dl=0
SQL Server .MDF file: https://www.dropbox.com/s/j406i1yqp6ce3ol/Lahman%20Baseball%20Database_Data.mdf?dl=0
In Oracle, simply run IMPDP with the SCHEMAS=LAHMAN option. The password for the LAHMAN schema is (guess) LAHMAN.
In SQL Server, copy the MDF into the DATA folder and attach it to an existing SQL Engine instance.
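For readers who prefer something to copy and paste, the two import steps might look roughly like the strings built by the Python sketch below. Only SCHEMAS=LAHMAN and the EXPDAT.DMP file name come from this post. Everything else is an assumption: the connecting account, the DATA_PUMP_DIR directory object (the dump file must already sit in the folder it points to), the database name, the MDF path, and the choice of FOR ATTACH_REBUILD_LOG, which is used here because only the .MDF file is provided and which requires that the database was shut down cleanly.

```python
def impdp_command(connect, dump_file="EXPDAT.DMP", directory="DATA_PUMP_DIR"):
    """Build an Oracle Data Pump import command line (illustrative only)."""
    # `connect` is a hypothetical user/password pair with import privileges.
    args = [
        "impdp",
        connect,
        f"DIRECTORY={directory}",  # assumed Oracle directory object
        f"DUMPFILE={dump_file}",
        "SCHEMAS=LAHMAN",          # the option named in the post
    ]
    return " ".join(args)


def attach_statement(db_name, mdf_path):
    """Build a T-SQL statement that attaches an MDF file (illustrative only)."""
    # Double any single quotes so the path is a valid T-SQL string literal.
    escaped = mdf_path.replace("'", "''")
    return (
        f"CREATE DATABASE [{db_name}] "
        f"ON (FILENAME = N'{escaped}') FOR ATTACH_REBUILD_LOG;"
    )


# Placeholder credentials and path; replace them with your own.
print(impdp_command("system/change_me"))
print(attach_statement("Lahman", r"C:\Data\Lahman Baseball Database_Data.mdf"))
```

The first string is meant for an operating system shell on the Oracle server, the second for a query window connected to the SQL Server instance.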
Hope this helps. I will often use this data to showcase MicroStrategy and other products in my upcoming posts.