7. The Data Era
• The 2011 Digital Universe Study:
Extracting Value from Chaos (IDC)
– We have entered the Zettabyte era (a
trillion gigabytes or a billion terabytes)
– The rate of information growth appears
to be exceeding Moore’s Law
http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
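The unit equivalence quoted on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, where one gigabyte is 10^9 bytes; the constant and function names are illustrative only.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10**9
TERABYTE = 10**12
ZETTABYTE = 10**21


def units_per_zettabyte(unit_bytes: int) -> int:
    """Return how many units of the given size make up one zettabyte."""
    return ZETTABYTE // unit_bytes


gb_per_zb = units_per_zettabyte(GIGABYTE)  # a trillion gigabytes
tb_per_zb = units_per_zettabyte(TERABYTE)  # a billion terabytes

print(f"1 ZB = {gb_per_zb:,} GB = {tb_per_zb:,} TB")
```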
8. Big Value from Data
• Big Data: The next frontier for
innovation, competition and
productivity (McKinsey)
– $300 billion potential annual value to
US health care
– €250 billion potential annual value to
Europe’s public sector administration
http://www.mckinsey.com/mgi/publications/big_data/pdfs/MGI_big_data_full_report.pdf
9. IBM City Forward
The Smarter Cities Challenge is a competitive grant program
awarding $50 million worth of IBM expertise over the next three
years to 100 cities around the globe. It is designed to address the
wide range of challenges facing cities today.
10. Consumption
• We need to provide efficient ways to
consume data in order to extract its
value: the knowledge
– Syntactic approaches (visual analytics)
• The data is collected, centralized and analysed
• Visualizations for humans to extract knowledge
– Semantic approaches
• The information is distributed / interlinked
• Semantic structures are added to the data so
that machines can better understand it
11. Syntactic approaches
• Some examples
– Gapminder
– IBM Many Eyes
– Google Public Data Explorer
– Google Correlate
– Google Ngram Viewer
• What is the most popular hair
colour in the literature?
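A question like this one reduces to counting word pairs (bigrams) over a large text collection. The sketch below illustrates that counting idea only; it is not how any of the tools listed above are implemented, and the four-sentence corpus, the colour list, and the function name are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy corpus; a real study would use a large book collection.
CORPUS = [
    "She had long black hair and a quiet voice.",
    "His brown hair was streaked with grey.",
    "The girl with the red hair laughed.",
    "Her black hair shone in the lamplight.",
]

# Illustrative list of colour words to look for before "hair".
COLOURS = {"black", "brown", "red", "blond", "blonde", "grey", "white"}


def hair_colour_counts(sentences):
    """Count bigrams of the form '<colour> hair' across the sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # Pair every token with its successor to enumerate bigrams.
        for first, second in zip(tokens, tokens[1:]):
            if second == "hair" and first in COLOURS:
                counts[first] += 1
    return counts


counts = hair_colour_counts(CORPUS)
print(counts.most_common())
```

The approach is purely syntactic: the program matches strings and leaves the interpretation of the resulting counts to a human reader.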
13. Semantic approaches
• The Semantic Web is an extension of
the current web in which information
is given well-defined meaning,
better enabling computers and people
to work in cooperation
Tim Berners-Lee, James Hendler,
Ora Lassila, The Semantic Web,
Scientific American, May 2001
14. The SW vision
• Use semantic structures
(ontologies) to represent data.
Provide machines with the ability to
interpret and extract knowledge
15. Adding Structure
• Two paths towards the SW vision
– Metadata embedded in HTML
• Microformats
• RDFa
• Microdata
– Linked Data
• Putting the data online in a standard, web
enabled representation (RDF)
• Make the data Web addressable (URIs)
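The Linked Data path can be sketched as a set of subject-predicate-object statements (RDF triples) in which things are named by URIs. The example below is a minimal illustration using only the Python standard library: the two resource URIs are hypothetical, the FOAF and OWL property URIs are existing vocabulary terms, and the helper names are made up for this sketch. The output follows the line-based N-Triples syntax.

```python
# Existing vocabulary terms.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical resource URIs, chosen only for illustration.
KMI = "http://example.org/id/kmi"
KMI_ELSEWHERE = "http://example.com/org/knowledge-media-institute"


def uri(value: str) -> str:
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Each statement is a (subject, predicate, object) triple.
triples = [
    (uri(KMI), uri(FOAF_NAME), literal("Knowledge Media Institute")),
    # Link the resource to a description of the same thing in another dataset.
    (uri(KMI), uri(OWL_SAME_AS), uri(KMI_ELSEWHERE)),
]


def to_ntriples(statements) -> str:
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in statements)


print(to_ntriples(triples))
```

The second triple is what makes the data "linked": it points from one dataset to another, so a client that follows URIs can move between them.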
16. Metadata in HTML
• An example
Knowledge Media Institute
Walton Hall
Milton Keynes
MK7 6AA
<div class="vcard">
  <div class="fn org">Knowledge Media Institute</div>
  <div class="adr">
    <div class="street-address">Walton Hall</div>
    <div>
      <span class="locality">Milton Keynes</span>,
      <span class="postal-code">MK7 6AA</span>
    </div>
    <div class="country-name">United Kingdom</div>
  </div>
</div>
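To show what a machine gains from such markup, the sketch below reads the class names of the example back into a dictionary using Python's standard `html.parser` module. It is a simplified illustration, not a complete microformat parser: it assumes well-nested markup without void elements, and the `HCardParser` class and `PROPERTIES` set are names invented for this example.

```python
from html.parser import HTMLParser

HCARD = """
<div class="vcard">
  <div class="fn org">Knowledge Media Institute</div>
  <div class="adr">
    <div class="street-address">Walton Hall</div>
    <div>
      <span class="locality">Milton Keynes</span>,
      <span class="postal-code">MK7 6AA</span>
    </div>
    <div class="country-name">United Kingdom</div>
  </div>
</div>
"""

# Class names treated as leaf properties in this sketch.
PROPERTIES = {"fn", "org", "street-address", "locality",
              "postal-code", "country-name"}


class HCardParser(HTMLParser):
    """Collect the text of elements whose class names are known properties."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._stack = []  # one entry per open element: its property classes

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append([c for c in classes if c in PROPERTIES])

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            # Assign the text to the properties of the innermost open element.
            for prop in self._stack[-1]:
                self.card[prop] = self.card.get(prop, "") + text


parser = HCardParser()
parser.feed(HCARD)
card = parser.card
print(card)
```

Because the element carrying the organisation name has both `fn` and `org` classes, the same text is recorded under both keys.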
17. Metadata in HTML
• Schema.org
Semantically enhanced Information Retrieval:
an ontology-based approach
http://people.kmi.open.ac.uk/miriam/about/
22. BBC
• Programs
• Music
• Artist
• World Cup
Who won it? ;)
23. Open University
• Currently: OU public data sit in different systems, hard to
discover, obtain, integrate by users
– Data from Research Outputs (ORO)
– OpenLearn Content
– Archive of Course Material
– Library’s Catalogue of Digital Content
– A/V Material: Podcasts, iTunesU
• Exposed as linked data, our data interlink with each other and
the external world: become part of the “global data space” on the Web
– DBPedia, RAE, geonames, data.gov.uk, BBC, DBLP
25. The Value
• Recognized as a critical step forward
for the HE sector in the UK
– Favors transparency and reuse of data,
both externally and internally
– Reduces the cost of dealing with our own
public data
– Enables new kinds of applications and
makes the ones that are already
feasible more cost-effective
27. The Value
• Linking educational material across universities
http://smartproducts1.kmi.open.ac.uk/web-linkeduniversities/index.htm
30. Conclusions
• We have reached the Data Era
– Production: currently more than a
Zettabyte of information in the digital world,
and growing fast
– Consumption: syntactic and semantic
approaches have emerged to extract the
value (the knowledge) out of the data
– Challenges: Provide machines with the
capabilities to extract the knowledge for us!
31. Conclusions
• Many more challenges ahead…
– Different formats (text vs. multimedia)
– Different dynamics (time / location)
– Different provenance
– Different topics (heterogeneous)
– Distributed, massive, streaming data
– Varying quality
–…
32. THX!
• Any ideas to make me rich? ☺
• SlideShare: http://www.slideshare.net/miriamfs
• Website: http://people.kmi.open.ac.uk/miriam/about/
• Twitter: @miri_fs
Thanks to Fouad Zablith and Mathieu d'Aquin ☺ for sharing with me some of their slides and
for their valuable comments on this presentation