Improving the Search Experience in a Social Network with Cross Media Contents
Daniele Cenni, Paolo Nesi
University of Florence
Department of Systems and Informatics
Distributed Systems and Internet Technology Laboratory
Paolo.nesi@unifi.it
cenni@dsi.unifi.it , http://www.disit.dinfo.unifi.it
DMS2013, August 2013, UK, Paolo Nesi 1
ECLAP Social Network
• ECLAP is a Digital Library on Performing Arts connected with Europeana
• ECLAP is a Best Practice and Social Network (blogs, forums, comments, tagging, voting, …)
Goals/Requirements
• Develop an indexing/searching solution for the ECLAP Social Network that allows:
  - Indexing multilingual cross-media content metadata and data (e.g. documents)
  - Indexing portal blogs, forums, events, group pages, comments, etc.
  - Efficient multilingual search (keyword search and advanced search) supporting:
    - misspelled words (e.g. shespeare)
    - partial word search
  - Sorting and filtering search results
  - Re-indexing all the data without blocking the system
  - Logging and monitoring user activity
  - …
• Evaluate the indexing/searching service
ECLAP: Any Kind of Content
• Informative content
  - Video, audio, images, documents
  - 3D, animations, Braille
  - Slide, Video-Slide, courses
  - eBook, ePub, Mpeg21, intelligent
• Aggregated content:
  - Playlist, Collections
  - Annotations, Synchronization
• Support and networking content:
  - Blog, WebPage, Events, comments, forum, votes, messages, …
[Figure labels: comments, rating, relationships, technical, dynamic, recommend, …]
• Performance
• Master classes
• Scene Sketches
• Scenography
• Scenes
• Private lives of
artists
• Scores
• Braille
• BackStage Stills
• Choreography
• Morals
• Poster
• Booklets
• Magazines Music
• Audio ballets
ECLAP Semantic Model 1
[Figure: diagram of the ECLAP content model, not preserved in this transcript. Recoverable labels: Media Object, Video, Audio, Document, Image, Crossmedia, Archive, Event, epub, 3D, Braille Music Score, Group/Channel, Collection, Playlist, AVObject, Annotation, Main Annotation, Side Annotation, Forum, WebPage, Blog, Comment, Content, TaxonomyTerm, GeoName, Metadata, Performing Arts, Dublin Core, Technical, IPR. The associations carry multiplicities such as 0..n, 1..n, 1..2 and 1.]
ECLAP Semantic Model 2
[Figure: diagram of the relationships among users and content, not preserved in this transcript. Entities: User, Group/Channel, Content, Media Object, Comment, Annotation, TaxonomyTerm. Relationship labels: foaf:member, admin, isProvidedBy, isFavouriteOf, dc:creator, foaf:topic_interest, isFeaturedBy, foaf:knows.]
Indexing
• Indexing & search system
  - Based on Apache Solr
• Multilingual aspects
  - Translate the metadata or translate the query? Both:
    - Metadata translation
    - Query translation
• Indexing schema
  - Dublin Core + DCTerms (multi-language)
  - Performing Arts
  - Technical (provider, content type, GPS, IPR, duration, quality, …)
  - Group associations (multi-language)
  - Taxonomy associations (multi-language)
  - Comments & multi-language tags
  - Full text of the textual digital resources
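As a rough illustration of the multilingual indexing schema listed above, the sketch below flattens one metadata record into a Solr-style document with language-suffixed fields plus a catch-all text field. The field names (`title_en`, `text_all`, `content_type`) and the `build_index_document` helper are hypothetical; they are not taken from the actual ECLAP schema.

```python
from typing import Dict, List


def build_index_document(
    doc_id: str,
    multilingual: Dict[str, Dict[str, str]],
    technical: Dict[str, str],
) -> Dict[str, object]:
    """Flatten one record into a field -> value mapping.

    multilingual maps a field name (e.g. "title") to {language: text}.
    technical holds language-independent values (e.g. content type).
    All field names are illustrative assumptions.
    """
    document: Dict[str, object] = {"id": doc_id}
    catch_all: List[str] = []
    for field, translations in sorted(multilingual.items()):
        for lang, text in sorted(translations.items()):
            # One field per language, e.g. title_en, title_it.
            document[f"{field}_{lang}"] = text
            # Every translation is also copied into the catch-all field.
            catch_all.append(text)
    document.update(technical)
    document["text_all"] = catch_all
    return document


record = build_index_document(
    "doc-1",
    {"title": {"en": "The Tempest", "it": "La tempesta"}},
    {"content_type": "video"},
)
```

Keeping one field per language lets a query target a single language, while the catch-all field supports keyword search across all of them.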
Indexing
[Figure not preserved in this transcript.]
Metadata Schema Indexing
[Figure not preserved in this transcript.]
Search Facilities
• Full-text search
  - Uses the catch-all fields to search for keywords in the most important fields in all languages (title, description, text, body, subject, …)
• Fuzzy search
  - Allows matching mistyped words
• Deep search
  - Allows searching for partial words
• Faceted search
• Maximizing precision and recall:
  - Relevance & boosting terms
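The fuzzy and partial-word facilities can be illustrated with a small plain-Python sketch. It is only meant to show the idea: fuzzy matching accepts terms within a small edit distance of the query, and partial matching accepts terms that contain the typed fragment. It does not reproduce how Solr implements either feature, and the helper names and the threshold of 2 edits are assumptions.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query: str, vocabulary: List[str], max_edits: int = 2) -> List[str]:
    """Return the terms within max_edits of the (possibly mistyped) query."""
    q = query.lower()
    return [t for t in vocabulary if edit_distance(q, t.lower()) <= max_edits]


def partial_match(fragment: str, vocabulary: List[str]) -> List[str]:
    """Return the terms that contain the fragment anywhere."""
    f = fragment.lower()
    return [t for t in vocabulary if f in t.lower()]


terms = ["Shakespeare", "scenography", "choreography"]
fuzzy_hits = fuzzy_match("shespeare", terms)
partial_hits = partial_match("graph", terms)
```

Here the misspelled "shespeare" is two deletions away from "shakespeare", so it still matches that term.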
Search Facilities vs Information
[Figure not preserved in this transcript.]
Searching
• Faceted search
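Faceted search can be sketched as two steps: count how many results fall under each value of a facet field, then narrow the result list when the user picks a value. The sketch below works on an in-memory list of documents; the field names and both helper functions are hypothetical and only illustrate the idea.

```python
from collections import Counter
from typing import Dict, List


def facet_counts(results: List[dict], facet_fields: List[str]) -> Dict[str, Counter]:
    """Count the results per value for each facet field."""
    counts: Dict[str, Counter] = {field: Counter() for field in facet_fields}
    for doc in results:
        for field in facet_fields:
            value = doc.get(field)
            if value is not None:
                counts[field][value] += 1
    return counts


def apply_filter(results: List[dict], field: str, value: str) -> List[dict]:
    """Keep only the results whose field equals the selected facet value."""
    return [doc for doc in results if doc.get(field) == value]


results = [
    {"id": "1", "content_type": "video", "language": "en"},
    {"id": "2", "content_type": "video", "language": "it"},
    {"id": "3", "content_type": "document", "language": "en"},
]
facets = facet_counts(results, ["content_type", "language"])
videos = apply_filter(results, "content_type", "video")
```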
Weighted Query Model
[Formula not preserved in this transcript.]
• Where, for the query "q":
  - The weights are field boosts
  - Title is DC.Title, description is DC.Description, …
  - Body is the textual body, subject, …
  - Taxonomy is the full description of the taxonomy branch
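One way to express per-field weights in Solr is the `qf` parameter of the eDisMax query parser, which takes `field^boost` pairs. The sketch below builds such a parameter string. The slides do not say which query parser ECLAP uses, so treating it as eDisMax is an assumption, and the weight values and field names are placeholders rather than the tuned values from the talk.

```python
from typing import Dict
from urllib.parse import urlencode


def build_qf(weights: Dict[str, float]) -> str:
    """Render field weights as a 'field^boost' list."""
    return " ".join(f"{field}^{boost:g}" for field, boost in weights.items())


def build_query_params(q: str, weights: Dict[str, float]) -> str:
    """URL-encode the parameters of a weighted keyword query."""
    return urlencode({"defType": "edismax", "q": q, "qf": build_qf(weights)})


# Placeholder weights; not the values reported in the presentation.
weights = {"title": 3.0, "description": 2.0, "body": 1.0, "taxonomy": 0.5}
qf = build_qf(weights)
params = build_query_params("tempest", weights)
```

Raising or lowering a boost changes how much a match in that field contributes to the ranking, which is what makes the weights tunable parameters.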
Model Optimization
• Optimization of precision and recall to improve search quality
  - 50 reference queries
• Optimization methods
  - Simulated Annealing
  - Genetic Algorithms
  - 7 parameters
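A minimal simulated-annealing sketch for tuning a vector of 7 weights is shown below. It is a generic version of the technique, not the authors' implementation: the cooling schedule, the step size, and above all `toy_objective` are assumptions. In a real setting the objective would run the reference queries with the candidate weights and return a quality score such as MAP.

```python
import math
import random
from typing import Callable, List, Tuple


def simulated_annealing(
    objective: Callable[[List[float]], float],
    initial: List[float],
    steps: int = 2000,
    start_temp: float = 1.0,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Maximize objective by perturbing one weight at a time."""
    rng = random.Random(seed)
    current = list(initial)
    current_score = objective(current)
    best, best_score = list(current), current_score
    for step in range(steps):
        # Linear cooling; the small constant avoids division by zero.
        temperature = start_temp * (1.0 - step / steps) + 1e-9
        candidate = list(current)
        index = rng.randrange(len(candidate))
        candidate[index] = max(0.0, candidate[index] + rng.gauss(0.0, 0.1))
        candidate_score = objective(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = list(current), current_score
    return best, best_score


# Hypothetical stand-in for "evaluate the search quality with these weights".
TARGET = [3.0, 2.0, 1.0, 0.5, 0.5, 0.2, 0.1]


def toy_objective(weights: List[float]) -> float:
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


start = [1.0] * 7
best_weights, best_score = simulated_annealing(toy_objective, start)
```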
Monte Carlo Analysis
[Figure not preserved in this transcript. MAP: Mean Average Precision]
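Mean Average Precision (MAP) is the mean, over a set of queries, of the average precision of each ranked result list. The sketch below computes it with the standard definition; the sample rankings and relevance judgments are invented for illustration and are not data from the study.

```python
from typing import Iterable, List, Set, Tuple


def average_precision(ranked_ids: List[str], relevant_ids: Iterable[str]) -> float:
    """Average of the precision values at each rank where a relevant item appears."""
    relevant: Set[str] = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """Mean of the average precision over all (ranking, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Invented example: two queries with their rankings and relevant documents.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y", "z"], {"y"}),
]
map_score = mean_average_precision(runs)
```

For the first query the relevant items sit at ranks 1 and 3, giving (1/1 + 2/3) / 2 = 5/6; for the second the single relevant item is at rank 2, giving 1/2; their mean is 2/3.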
Some Weights' Trends
[Figure not preserved in this transcript.]
Comparative Results
[Figure not preserved in this transcript. MAP: Mean Average Precision]
Usage Results
• More than 500,000 visits
• 7.29 minutes of time spent on the portal
Assessment of Search Facility
• Distribution of performed clicks
[Figure not preserved in this transcript; surviving label: "First page"]
Conclusions
• An indexing solution for:
  - cross-media content, with multilingual metadata and texts
• Improved searching and filtering of results, and thus the quality of the user experience
  - Provides full-text search (with operators), advanced search, faceted search, etc.
• Precision and recall analysis made it possible to tune the search services
  - Simulated Annealing and Genetic Algorithms produced similar results
• User behavior assessment showed that appreciation of the search facility improved with respect to the earlier settings, which were grounded on common sense and classical metadata relevance
