The Lahman’s Baseball Database – real life sample data
The Lahman database is one of the most complete free sports datasets, if not the most complete
You may already know that Oracle can fetch data directly from the Web. For example: SELECT HTTPURITYPE.createuri('http://www.economist.com/rss/leaders_rss.xml').getclob() RSS FROM DUAL; This offers a lot of possibilities when you need to query external data and, combined with some XML magic, can make a cute RSS view out of a well-formed xml…
If you search the net for a trick to generate date ranges in Oracle, you will most probably end up here: http://stackoverflow.com/questions/418318/generate-a-range-of-dates-using-sql, a page with various interesting and mind-challenging ways to achieve the same result. I found this way the most cost-effective and readable, yet flexible enough for my data warehouse ETL processes. WITH LIMITS AS (SELECT…
An article I read some time ago gave me the idea of attaching an extra table to manage the variable number of columns in different data sets*. That article, about dimensional modeling, introduced the concept of a “helper table”. In short, I can create a single column and store in it a reference (a code, an ID) to another table that holds the real dimensions, no matter how many columns there are. I simply need to create a fake code for the dimension group and link the two tables on that code.
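The helper-table idea can be sketched in a few lines. This is only an illustration under my own assumptions, not the schema from the article: it uses Python's built-in sqlite3 module as a stand-in for Oracle, and the names observation, dim_group_helper, dim_group_id and dimensions_of are all hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Oracle.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Helper table (hypothetical name): one row per (group, dimension) pair,
    -- so a group can carry any number of dimensions without new columns.
    CREATE TABLE dim_group_helper (
        dim_group_id INTEGER NOT NULL,
        dim_name     TEXT    NOT NULL,
        dim_value    TEXT    NOT NULL,
        PRIMARY KEY (dim_group_id, dim_name)
    );
    -- Data table (hypothetical name): a single column stores the fake group code.
    CREATE TABLE observation (
        obs_id       INTEGER PRIMARY KEY,
        dim_group_id INTEGER NOT NULL,
        obs_value    REAL    NOT NULL
    );
""")

# Group 1 has two dimensions, group 2 has three: same table layout for both.
conn.executemany("INSERT INTO dim_group_helper VALUES (?, ?, ?)", [
    (1, "country", "IT"), (1, "year", "2010"),
    (2, "country", "FR"), (2, "year", "2010"), (2, "sector", "retail"),
])
conn.executemany("INSERT INTO observation VALUES (?, ?, ?)", [
    (100, 1, 12.5),
    (101, 2, 7.25),
])

def dimensions_of(obs_id):
    """Return the dimensions of one observation as a name -> value dict."""
    rows = conn.execute(
        """SELECT h.dim_name, h.dim_value
             FROM observation o
             JOIN dim_group_helper h ON h.dim_group_id = o.dim_group_id
            WHERE o.obs_id = ?
            ORDER BY h.dim_name""",
        (obs_id,),
    ).fetchall()
    return dict(rows)
```

Calling dimensions_of(100) and dimensions_of(101) returns dictionaries of different sizes, even though the data table has the same single reference column for both rows. The trade-off is that every dimension value is stored as text and has to be joined back in at query time.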
The aim of these posts is to find an elegant database solution to this problem: a way to build a common repository for different data sets from different software packages, so that data can be easily exchanged and the history of changes is visible to everyone involved in data analysis. Even better, a new interface will help maintain, correct, and cleanse the various data sets.
Stay tuned for upcoming articles on how to read/write panel data to an Oracle database and read them from several mathematical or statistical packages.