Mining Big Data and Open Knowledge Sources to develop transparent and serendipitous content-based adaptive systems
1. Mining Big Data and Open Knowledge Sources to develop transparent and serendipitous content-based adaptive systems
Cataldo Musto, Giovanni Semeraro, Fedelucio Narducci
2. state of the art.
C.Musto, G.Semeraro - Mining Big Data and Open Knowledge Sources to develop transparent and serendipitous
content-based adaptive systems - World Summit on Big Data and Organization Design, Paris, 16-17 May 2013
3. our research: personalization
4. Recommender Systems
Relevant items (movies, news, books, etc.) are pushed to the
user according to her preferences or her needs.
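A minimal sketch of the content-based idea, assuming items are described by short texts and the user profile is simply the merged bag of words of the items she liked. The catalog, the function names, and the choice of cosine similarity are illustrative assumptions, not the system presented in these slides.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase whitespace tokenization into term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(liked_texts, catalog, k=2):
    """Rank catalog items by similarity to a profile built from liked texts."""
    profile = Counter()
    for text in liked_texts:
        profile.update(bag_of_words(text))
    scored = [(cosine(profile, bag_of_words(desc)), title)
              for title, desc in catalog.items()]
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scored[:k] if score > 0]


# Hypothetical toy catalog, for illustration only.
catalog = {
    "MotoGP highlights": "motorcycle racing grand prix duels",
    "Cooking at home": "easy pasta recipes for dinner",
    "Formula 1 review": "grand prix racing season review",
}
print(recommend(["motorcycle grand prix racing"], catalog))
```

Items that share no terms with the profile score zero and are never pushed to the user, which is one way to see why purely term-based suggestions tend to be unsurprising.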
5. Amazon.com
Recommendations
6. current recommendation technologies share three
important drawbacks.
7. (1) training is a bottleneck.
8. need for explicit information about user interests.
9. (2) recsys are black boxes.
10. (3) suggestions are not surprising.
11. solution: exploiting big data to build a novel generation of content-based adaptive systems
12. current work.
13. near future work.
14. big data.
15. Information Overload
we can handle 126 bits of information
we deal with 393 bits of information
ratio: more than 3x (Source: Adrian C. Ott, The 24-Hour Customer)
consequence:
16. Information Overload
17. Big Data: obstacle or opportunity?
18. cornerstone 1
exploit social media to model user preferences.
19. social media are an opportunity
they provide information about user preferences
20. example
user preferences in music from Facebook
21. implicit preferences
example
22. Play.me
playlist
Most popular songs of the artists extracted from Last.fm (as well as
those added through the enrichment) are proposed to the user.
23. Myusic
recommendations
24. cornerstone 2
exploit entity linking algorithms to make user profiles more transparent and LOD-aware
25. MyFeeds
RSS recommendations
26. MyFeeds
transparent user preferences
extracted from Facebook.
27. MyFeeds
transparent user preferences
further processing
28. MyFeeds
entity linking algorithms
• They map free text to structured information
• Wikipedia pages or DBpedia nodes
• examples
• Tag.me, Wikipedia Miner, DBpedia Spotlight, etc.
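The mapping from free text to structured information can be sketched with a toy dictionary-based linker. This is only an illustration under strong assumptions: the gazetteer entries and the function name are hypothetical, and real tools such as those listed above also perform disambiguation, which this greedy longest-match lookup does not.

```python
# Toy gazetteer: surface form -> Wikipedia page title.
# Entries are illustrative assumptions, not output of any real service.
GAZETTEER = {
    "valentino rossi": "Valentino_Rossi",
    "motogp": "Grand_Prix_motorcycle_racing",
    "ferrari": "Scuderia_Ferrari",
}


def link_entities(text, gazetteer=GAZETTEER, max_len=3):
    """Greedy longest-match linking of token spans to gazetteer entries."""
    tokens = text.lower().split()
    links = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in gazetteer:
                links.append((span, gazetteer[span]))
                i += n
                break
        else:
            # No span starting at this token matched; move on.
            i += 1
    return links


print(link_entities("Valentino Rossi wins another MotoGP race"))
```

Because each mention resolves to a named page rather than an opaque keyword weight, a profile built from such links can be shown to the user in readable form.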
29. Tag.me
extracts the Wikipedia pages the content refers to.
30. Linked Open Data Cloud
Structured (RDF) representation of the information stored in Wikipedia.
31. Linked Open Data Cloud
Profiles based on Tag.me are LOD-aware
32. cornerstone 3
exploit open knowledge sources to make recommendation techniques more serendipitous.
33. ‘in vitro’ experiments
Watchmi plug-in
developed by Aprico.tv
34. From BOW to eBOW
Given a description of a TV show, we exploit ESA (Explicit Semantic Analysis) to obtain an enhanced representation.
The original set of features is enriched with the set of Wikipedia articles most related to the TV show.
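The enrichment step can be sketched as follows, assuming a precomputed index that maps each term to weighted Wikipedia concepts. The tiny index, its weights, and the function name are hypothetical placeholders; a real ESA index is derived from the full Wikipedia corpus, typically with TF-IDF weights.

```python
from collections import Counter

# Hypothetical miniature term -> {concept: weight} index (illustrative values).
TERM_TO_CONCEPTS = {
    "motogp": {"MotoGP": 0.9, "Valentino Rossi": 0.5},
    "duels": {"Valentino Rossi": 0.3, "Max Biaggi": 0.3},
    "best": {},
}


def enrich(bow, term_to_concepts=TERM_TO_CONCEPTS, k=2):
    """Return the original terms plus the k highest-scoring related concepts."""
    concept_scores = Counter()
    for term, count in bow.items():
        # Each term votes for its concepts, scaled by how often it occurs.
        for concept, weight in term_to_concepts.get(term, {}).items():
            concept_scores[concept] += count * weight
    ranked = sorted(concept_scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return {"terms": dict(bow), "concepts": [c for c, _ in ranked[:k]]}


bow = Counter("best duels motogp".split())
print(enrich(bow))
```

The enriched representation keeps the original words and adds concepts that never appear in the description itself, which is what allows matches with items outside the user's literal vocabulary.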
35. example: From BOW to eBOW
TV SHOW
Rad an Rad
Die besten Duelle der MotoGP
(Wheel to Wheel
The Best Duels in MotoGP)
Wikipedia Articles
großer preis von italien (motorrad)
großer preis von malaysia (motorrad)
großer preis von tschechien (motorrad)
scuderia ferrari
valentino rossi
motorrad-wm-saison 2005
motorrad-wm-saison 2006
max biaggi
großer preis der usa (motorrad)
motorrad-wm-saison 2008
rad (heraldik)
loris capirossi
shin’ya nakano
motogp
36. challenges.
issues.
recommendations.
37. Challenges and Issues
• Main challenge and issue:
• data representation and data filtering
• How to exploit these novel data silos?
• What information is relevant for personalization?
• What kind of processing do the data need?
• Which representation is best?
• Do reasoning techniques improve profile transparency and personalization accuracy?
• Do people accept the exploitation of these data?
• How to model the context?
38. Recommendations
• Cornerstones
• Social media-based user profiling
• LOD-aware user profiles
• Open Knowledge Sources for Serendipitous Encounters
• Recommendations
• Promote the LOD initiative: publish data in a structured form to enable reasoning on the information
• Make data silos interconnected
• Design applications able to properly model, manage and exploit the large amount of data coming from social media.