michael.hausenblas@deri.org | @mhausenblas on Twitter | http://profiles.google.com/Michael.Hausenblas on Google+
You’re likely sitting on a treasure chest …
… and the treasure is the data: energy usage, election data, products, geographical data, flights, development data, emissions, water quality, waste management, planning applications, education, etc.
But people don’t like data; what people like and use are applications. Applications produce, consume, manipulate, distribute, store, search, access … and sometimes destroy data.
But how do we get applications out of the data, and, for starters … how do we get the data?
Who of you knows this fellow here? I’d like to think of him as a data superstar. His name is Hans Rosling and he is a Swedish guy who deals mainly with statistical data. He coined a term for this problem: the so-called database hugging disorder. Meaning: people and institutions, even if they are aware of their data, tend not to share it, or they hide it in applications.
Tim Berners-Lee’s 5-star plan …
★ Make your data available on the Web under an open license
★★ Make it available as structured data (Excel sheet instead of image scan of a table)
★★★ Use a non-proprietary format (CSV file instead of an Excel sheet)
★★★★ Use Linked Data format (URIs to identify things, structured data such as in microdata, Atom/OData, RDF to represent data)
★★★★★ Link your data to other people’s data to provide context
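To make the jump from three to five stars a bit more concrete, here is a minimal sketch. It assumes a made-up CSV table, a placeholder namespace (http://example.org/) and hypothetical property names derived from the column headers; none of this comes from a real dataset. Each row gets its own URI, each cell becomes a triple in N-Triples syntax (four stars), and one row is linked to an external URI via owl:sameAs (five stars). The DBpedia URI is only an illustration of a link target.

```python
import csv
import io

# Hypothetical 3-star data: a small CSV table (values are made up for illustration).
CSV_TEXT = """id,name,co2_tonnes
galway,Galway,120
dublin,Dublin,340
"""

BASE = "http://example.org/"  # placeholder namespace, an assumption of this sketch


def csv_to_ntriples(csv_text, base=BASE):
    """Turn each CSV row into N-Triples lines: one URI per row, one triple per column."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base}place/{row['id']}>"
        for column, value in row.items():
            if column == "id":
                continue  # the id is already used to mint the subject URI
            predicate = f"<{base}def/{column}>"
            # Escape backslashes and quotes so the literal stays valid.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} {predicate} "{escaped}" .')
    return lines


def link_to(subject_id, target_uri, base=BASE):
    """Fifth star: link a local thing to somebody else's URI via owl:sameAs."""
    return (f"<{base}place/{subject_id}> "
            f"<http://www.w3.org/2002/07/owl#sameAs> <{target_uri}> .")


triples = csv_to_ntriples(CSV_TEXT)
triples.append(link_to("galway", "http://dbpedia.org/resource/Galway"))
print("\n".join(triples))
```

In practice you would probably reach for an RDF library and an agreed vocabulary rather than string formatting; the point here is only to show how little separates a clean CSV file from linked data.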
Stepwise migration from inaccessible, locked-down data sources to open, publicly available, structured and ‘pre-integrated’ data sources.
So, how do we really get there? How do we get the data out of the walled gardens?
People need to be able to come together (also virtually) and exchange thoughts, ideas, etc.
… they need to see what others do about it.
Where do we get the data from?
You might be lucky and find that some data is already available, for example, via http://ie.ckan.net/
… or local data catalogs …
… but typically you need to invest a bit into freeing the data ;)
And how do we make the data available?
You might need to think about which format to make your data available in … there are quite a few to choose from.
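As a small illustration of the format question, the sketch below writes the same records once as CSV and once as JSON, using only the Python standard library. The records and field names (station, ph) are made up for this example.

```python
import csv
import io
import json

# Made-up example records; the field names are assumptions of this sketch.
records = [
    {"station": "A1", "ph": 7.2},
    {"station": "B4", "ph": 6.8},
]


def to_csv(rows):
    """Serialize a list of flat dicts as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize the same rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)


print(to_csv(records))
print(to_json(records))
```

CSV is easy to open in a spreadsheet but loses the types (everything becomes text), while JSON keeps numbers as numbers and allows nesting; which one fits depends on who is going to consume the data.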
And of course you’ll need some tools for cleaning and publishing the data …
But there is another problem we’re facing … how to represent and exchange the terms we’re talking about …
No matter whether we’re looking at the public sector or at industry … we need to express the terms and the relationships between the terms
Schema.org, introduced in June 2011 by the ‘big three’ search engines (Google, Bing and Yahoo!), mainly for SEO …
A collection of terms (some 300 concepts and 200 properties or relations between the concepts)
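To give an idea of what using such shared terms looks like, here is a minimal sketch that describes a dataset with Schema.org terms and serializes it as JSON-LD. The choice of JSON-LD and of the Dataset type is an assumption of this example (the slides do not prescribe either), and the name, description and URL are placeholders.

```python
import json

# A dataset description using Schema.org terms.
# All values are placeholders made up for this sketch.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example water quality measurements",
    "description": "Monthly pH readings from two hypothetical stations.",
    "url": "http://example.org/data/water-quality",
    # A clear license belongs in the description of open data.
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

markup = json.dumps(dataset, indent=2)
print(markup)
```

The resulting text could be embedded in a web page inside a script element of type application/ld+json, so that consumers can pick up the concept (Dataset) and its properties (name, url, license) without guessing what the fields mean.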
Ah, and don’t forget … we’re talking about OPEN data, so a clear license does matter.