
Semantically Capturing and Representing News Stories on the Web

1,030 views


A multimedia document taken in isolation provides little context, which hinders a proper understanding of the story being reported. International news items are a good example of this phenomenon. There is therefore a need to unveil other aspects of the story that, even if not explicitly present in the seed document, are crucial to fully capture the backstory. To deal with this problem, we propose an innovative conceptual model called the News Semantic Snapshot (NSS), designed to make explicit the wide context of a news event. Following a process called Named Entity Expansion, we query the Web to bring in other viewpoints on the event from the thousands of news articles and posts where the missing story details could potentially be found. We also propose an innovative Concentric-based approach that better spots those contextual entities by leveraging the duality between the so-called Core, which contains representative entities that are frequently mentioned in the related documents, and the Crust, formed by the entities that hold particular semantic relationships with the Core and take shape around it.

Published in: Data & Analytics

  1. 1. Semantically Capturing and Representing News Stories on the Web. José Luis Redondo García, jluisred.github.io, @peputo
  2. 2. Outline. Part I, Towards a Semantic Multimedia Web: (i) media annotation; (ii) a multimedia model; (iii) semantic media exploitation. Part II, Contextualizing News Stories (semantic annotation of news' context): (i) the News Semantic Snapshot (NSS); (ii) the multidimensional nature of the entity relevance; (iii) a concentric model for NSS generation; (iv) NSS in the consumption of news. Original artwork by Matt Might, http://matt.might.net/articles/phd-school-in-pictures/ (labels: Previous, PhD, Career, Future).
  3. 3. Outline Semantically Capturing and Representing News Stories on the Web 3  Part II: Semantic Annotation of News’ Context Multidimensional Relevancy NSS Generation Concentric Model NSS Gold Standard News Prototypes 2016/03/04
  4. 4. The Use Case: Contextualizing News. Video fragment: http://www.bbc.com/news/world-europe-23339199#t=34.1,39.8 (Media Fragment URI 1.0). Named entity over subtitles: Edward Snowden. Contextual entities shown: Sarah Harrison (WikiLeaks editor); Sheremetyevo (airport in Moscow).
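The `#t=34.1,39.8` suffix in the use-case URL is a temporal Media Fragment. As a minimal sketch of how such a fragment could be read, the function below extracts the start and end offsets in seconds. The function name is hypothetical, and the sketch only handles plain seconds with an optional `npt:` prefix, not the `hh:mm:ss` clock formats the specification also allows.

```python
from urllib.parse import urlsplit, parse_qs


def parse_temporal_fragment(uri):
    """Return (start, end) in seconds from a '#t=start,end' fragment, or None.

    Simplification: only plain-second values are supported here.
    """
    fragment = urlsplit(uri).fragment
    values = parse_qs(fragment).get("t")
    if not values:
        return None
    value = values[-1]  # if 't' is repeated, use the last occurrence
    if value.startswith("npt:"):
        value = value[len("npt:"):]
    start_text, _, end_text = value.partition(",")
    start = float(start_text) if start_text else 0.0
    end = float(end_text) if end_text else None  # None: play to the end
    return start, end


if __name__ == "__main__":
    uri = "http://www.bbc.com/news/world-europe-23339199#t=34.1,39.8"
    print(parse_temporal_fragment(uri))
```

Annotations such as the named entities on this slide can then be attached to the interval rather than to the whole video.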
  5. 5. The Use Case: Contextualizing News
  6. 6. Research Questions. Q1: How can multimedia content be semantically annotated and seamlessly connected with other resources on the Web? Q2: Can those semantic annotations and linked media resources bring value for the exploitation and consumption of multimedia content? Q3: Is it possible to automatically contextualize news stories with background information so they can be effectively interpreted by humans and machines?
  7. 7. Part 1: Towards a Semantic Multimedia Web (Q.1, Q.2)
  8. 8. Bringing Multimedia to the Web. Why? Make video a first-class citizen of the Web; make video universally accessible and shareable at different granularities (segments); benefit from the vast knowledge already present on the Web.
  9. 9. State of the Art & Related Work (Part 1). Semantic Annotation (named entity, multimodal, expansion): Alfonseca, E. and Manandhar. An unsupervised method for General Named Entity Recognition and Automated Concept Discovery; Mendes, P., Jakob, M., Garcia-Silva, A. and Bizer, C. DBpedia spotlight: shedding light on the web of documents; Shinyama, Y. and Sekine, S. Named entity discovery using comparable news articles; Chang, S-F., Manmatha, R. and Chua, T-S. Combining text and audio-visual features in video indexing; Wang, Richard C. and Cohen, William W. Iterative Set Expansion of Named Entities Using the Web; Talukdar, P-P., Brants, T., Liberman, M. and Pereira, F. A Context Pattern Induction Method for Named Entity Extraction. Multimedia Modeling: MPEG-7, http://mpeg.chiariglione.org/standards/mpeg-7/mpeg-7.htm; TV-Anytime, http://tech.ebu.ch/tvanytime; Synchronized Multimedia Integration Language, https://www.w3.org/TR/REC-smil/; Media Fragment URI 1.0 specification (W3C), http://www.w3.org/TR/media-frags (Synote: http://linkeddata.synote.org; Ninsuna: http://ninsuna.elis.ugent.be/); BBC Programmes Ontology, http://www.bbc.co.uk/ontologies/programmes/2009-09-07.shtml; Schema.org (SchemaDotOrgTV), http://www.w3.org/wiki/WebSchemas/; Ontology for Media Resources, https://www.w3.org/TR/mediaont-10/; Web Annotation, https://www.w3.org/TR/annotation-model/
  10. 10. Multimedia Annotations (1.a). Automatic annotation: 300 hours of YouTube video per minute. What is inside the video? A multimodal approach. Semantic annotations that leverage Web resources: more human-like operations.
  11. 11. Multimedia Annotation: Named Entity Recognition (Part 1.a). NERD: (1) ontology, http://nerd.eurecom.fr/ontology; (2) API, http://nerd.eurecom.fr/api/application.wadl; (3) UI, http://nerd.eurecom.fr. Example annotations on http://data.linkedtv.eu/media/e2899e7f#t=840,900: nerd:Product S-Bahn; nerd:Person Obama; nerd:Person Michelle; nerd:Location Berlin. ML: https://github.com/giusepperizzo/nerdml [Rizzo_LREC’14]
  12. 12. Multimedia Annotation: Named Entity Expansion (Part 1.a). (a) Entities from the seed document DS; (b) expanded entities, taken from other documents similar to DS. [Redondo_SNOW’14]
  13. 13. Multimedia Annotation: Expansion Pipeline (Part 1.a). Available @ http://linkedtv.eurecom.fr/entitycontext/api/ [Redondo_SNOW’14]
  14. 14. Multimedia Annotation: Multimodal Approach (Part 1.a). Text: keyword extraction; topic recognition; from textual visual cues to LSCOM concepts. Visual: visual concept detection (LSCOM); shot segmentation; scene segmentation; optical character recognition (OCR); automatic speech recognition (ASR); face detection and tracking; … → Multimedia Knowledge Model.
  15. 15. Multimedia Model (1.b). Explicitly represent video and its annotations, at the level of fragments, based on well-known vocabularies, flexible and extensible while being Linked Data compliant.
  16. 16. Multimedia Model: LinkedTV Model (Part 1.b). Vocabularies reused: BBC Ontology + SchemaDotOrgTV (broadcast data); Media Fragments URI 1.0 (W3C), Ontology for Media Resources (W3C) and LSCOM (analysis results, with support for segmentation); Web Annotations (W3C); NERD (entities from external datasets); Ontology for Provenance Management (provenance). Diagram labels: Programme, Brand, Series, Episode, Version, Broadcast Service, Broadcast Channel, Scene, Shot, MediaFragment, Face, Annotation, Concept, Keyword, Entity. Available @ http://data.linkedtv.eu/ontologies/core/
  17. 17. Multimedia Model: LinkedTV Model (Part 1.b). Diagram labels: MediaResource, Locator, MediaFragment, Annotation, Entity, URL (hyperlink), Type, OffsetBasedString.
  18. 18. Multimedia Model: TV2RDF Service (Part 1.b). Diagram: Content Publisher, Analysis, Metadata → TV2RDF (RDF conversion + NERD) → RDF triplestore. Available @ http://linkedtv.eurecom.fr/tv2rdf/
  19. 19. Exploiting Knowledge (1.c). Leverage the model and annotations for advanced mining tasks. Probe the value of the multimodal approach: evaluation on standard corpora.
  20. 20. Exploitation: Enriching (Part 1.c). Illustrate the seed video: oa:Annotation on rbbaktuell_20120809, nerd:Location Berlin. [Milicic_WWW'13]
  21. 21. Exploitation: Enriching Services & Prototypes (Part 1.c). Name, URL, published @: MediaCollector, http://linkedtv.eurecom.fr/api/mediacollector/search/, [Rizzo_SAM’12]; MediaFinder, http://mediafinder.eurecom.fr/, [Milicic_WWW’13]; Italian Elections 2013, http://mediafinder.eurecom.fr/story/elezioni2013, [Milicic_ESWC’13]; TVEnricher, http://linkedtv.eurecom.fr/tvenricher/api/, [LinkedTV_D2.6’14]; TVNewsEnricher, http://linkedtv.eurecom.fr/newsenricher/api/, [Redondo_ESWC’14]
  22. 22. Exploitation: Classifying videos (Part 1.c). [Figure: temporal distribution of entity types, one panel per channel (fun, tech, sport, news, creation, lifestyle, shortfilms, music, other). x-axis: the temporal positions of NEs; y-axis: the number of NEs. Entity types: Thing, Amount, Animal, Event, Function, Loc, Organization, Person, Product, Time.] Dailymotion dataset, 805 videos, 46.58% accuracy. [Li_LIME'13]
  23. 23. Exploitation: Promoting Media Fragments (Part 1.c). Available @ http://linkedtv.eurecom.fr/HyperTED [Redondo_ISWC’14]
  24. 24. Evaluation: Multimodal @ MediaEval 2013 (Part 1.c). Dataset: ~1697 h of BBC video data, 2323 videos; different TV shows (news, sports, politics…) from 2012; subtitles and ASR (English); output of some visual algorithms: shot and face detection. Search Task: query (T/V) → v1, v2, v3, …, vn. Hyperlinking Task: anchor va → v1, v2, v3, …, vn.
  25. 25. Evaluation: Multimodal @ MediaEval 2013 (Part 1.c). Annotations (processing time; type): Visual Concept Detection (151) (20 days on 100 cores; visual); ** Scene Segmentation (2 days on 6 cores; visual); OCR (1 day on 10 cores; visual); Keywords Extraction (5 hours; textual); ** Named Entities Extraction (4 days; textual); Face Detection and Tracking (4 days on 160 cores; visual). Approach: data indexing with Lucene & Solr; granularities: shots, scenes, sliding windows…; multimodality. Query formulation: Search: text + visual cues + visual concept mapping, MLSCOM; Hyperlink: subtitles, keywords, LSCOM concepts (MoreLikeThis).
  26. 26. Evaluation: MediaEval 2013 Results (Part 1.c). Search Task: 0.19 MRR (Mean Reciprocal Rank). Hyperlinking Task: 0.72 P10. [Sahuguet_MediaEval’13]
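For readers unfamiliar with the two figures reported for MediaEval 2013, the sketch below shows the textbook definitions of Mean Reciprocal Rank and precision at 10. It is not the official MediaEval evaluation tooling; the function names and toy data are illustrative only.

```python
def reciprocal_rank(ranked, relevant):
    """1/position of the first relevant item, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def precision_at(ranked, relevant, n=10):
    """Fraction of the top-n results that are relevant (P10 when n=10)."""
    return sum(1 for item in ranked[:n] if item in relevant) / n


if __name__ == "__main__":
    runs = [(["v3", "v1", "v7"], {"v1"}), (["v2", "v9"], {"v2"})]
    print(mean_reciprocal_rank(runs))
    print(precision_at(["v1", "v2", "v3", "v4"], {"v1", "v3"}, n=4))
```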
  27. 27. Evaluation: MediaEval 2014 Results (Part 1.c). Changes in the 2014 edition: new dataset from BBC (2686 hours and 3520 videos); no visual cues in search queries; new approach: 22% MAP improvement on the 2013 dataset. Scores shown in the Search Task and Hyperlinking Task panels: 0.71 P10, 0.67 P10. [Hoang_MediaEval’14]
  28. 28. Narrowing down… From Multimedia Content to News Items
  29. 29. Part 2: Semantically Contextualizing News Stories (Q.3)
  30. 30. The Use Case: Contextualizing News (Part 2). Wolfgang Schäuble: Finance Minister. Christian Democratic Union: ruling party in Germany.
  31. 31. State of the Art & Related Work (Part 2): Semantic News Annotation. Topics: named entities in news; graph; contextualizing news; relevancy of entities. N. Fernandez, J. A. Fisteus, L. Sanchez, and G. Lopez. Identityrank: Named entity disambiguation in the news domain; S. Chabra. Entity-centric summarization: Generating text summaries for graph snippets; A. Fuxman, P. Pantel, Y. Lv, A. Chandra, P. Chilakamarri, M. Gamon, D. Hamilton, B. Kohlmeier, D. Narayanan, E. Papalexakis, and B. Zhao. Contextual insights; N. Kanhabua, R. Blanco, and M. Matthews. Ranking related news predictions; N. K. Tran, A. Ceroni, N. Kanhabua, and C. Niederee. Back to the past: Supporting interpretations of forgotten stories by time-aware re-contextualization; N. K. Tran, A. Ceroni, N. Kanhabua, and C. Niederee. Time-travel translator: Automatically contextualizing news articles; T. Stajner, B. Thomee, A.-M. Popescu, M. Pennacchiotti, and A. Jaimes. Automatic selection of social media responses to news.
  32. 32. Semantic Snapshot of News (NSS) (2.a). Definition and motivation; a Gold Standard of news entities.
  33. 33. The News Semantic Snapshot (NSS) (Part 2.a). What is on top: entities explicitly appearing in the documents. Going deep down… it is always challenging. Entities shown: Laura Poitras, Anatoly Kucherena, Edward Snowden.
  34. 34. The News Semantic Snapshot (NSS) (Part 2.a). [Redondo_ICWE’15]
  35. 35. The News Semantic Snapshot: Gold Standard (Part 2.a). High level of detail, significant human intervention (experts in the news domain + users). Entities in 5 dimensions (visual & text): (1) video subtitles; (2) image in the video; (3) text in the video image; (4) suggestions of an expert; (5) related articles. User survey. Subtitle example: “We don't have any extradition treaty with Russia. Broadly speaking our policy remains the same: that we'd like him returned …” [Romero_TVX’14]
  36. 36. The News Semantic Snapshot: Gold Standard (Part 2.a). Play with the data and help us to extend it at: https://github.com/jluisred/NewsConceptExpansion/wiki/Golden-Standard-Creation
  37. 37. Automatically Generating the NSS (2.b). The selection problem; approaches: frequency-based, multidimensional, concentric; experiments and results.
  38. 38. Generating the NSS: General Method (Part 2.b). (a) Entities from the seed document DS; (b) expanded entities; (c) News Semantic Snapshot. [Redondo_SNOW’14]
  39. 39. Generating the NSS: Entity Expansion (Part 2.b). (a) Entities from the seed document DS; (b) expanded entities; (c) News Semantic Snapshot. [Redondo_SNOW’14]
  40. 40. Generating the NSS: Expansion’s Settings (Part 2.b). Parameters. Query: title; 5 W’s over subtitle entities. Web sites to be crawled: Google; L1: a set of 10 international English-language newspapers; L2: a set of 3 international newspapers used in the GS. Temporal window: 1W; 2W. Annotation filtering: Schema.org. [Redondo_ICWE’15]
  41. 41. Generating the NSS: Expansion’s Settings (Part 2.b). (a) Entities from DS; (b) expanded entities; (c) News Semantic Snapshot. Recall (NER on subtitles) = 0.42; Recall (entity expansion) = 0.91. [Redondo_SNOW’14]
  42. 42. Generating the NSS: Selection (Part 2.b). (a) Entities from DS; (b) expanded entities; (c) News Semantic Snapshot. [Redondo_SNOW’14]
  43. 43. Generating the NSS: The Selection problem (Part 2.b). The ideal function FIdeal(ei) places the NSS entities in the first N positions of the ranked expansion; which function FX(ei) = ? approximates it?
  44. 44. Generating the NSS: Measures (Part 2.b). (1) Precision / Recall @ N: popular; easy to interpret. (2) Mean Normalized Discounted Cumulative Gain (MNDCG) @ N: considers ranking; relevant documents at the top positions. (3) Compactness for Recall R: compromise between recall and NSS size.
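The first two measures on this slide have standard definitions, sketched below for a ranked entity list scored against a gold standard. The graded-gain dictionary and all names are assumptions made for illustration; the slides do not state which gain values or discount the evaluation actually used, so the common log2 discount is shown.

```python
import math


def recall_at(ranked, gold, n):
    """Fraction of gold entities found in the top-n of the ranking."""
    gold = set(gold)
    if not gold:
        return 0.0
    return len(set(ranked[:n]) & gold) / len(gold)


def dcg(gains):
    """Discounted cumulative gain with the usual log2(position + 1) discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at(ranked, gold_gains, n):
    """gold_gains maps entity -> graded relevance; missing entities score 0."""
    gains = [gold_gains.get(e, 0.0) for e in ranked[:n]]
    ideal = sorted(gold_gains.values(), reverse=True)[:n]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg_at(runs, n):
    """runs: list of (ranked_list, gold_gains) pairs, one per news item."""
    if not runs:
        return 0.0
    return sum(ndcg_at(r, g, n) for r, g in runs) / len(runs)


if __name__ == "__main__":
    gold = {"Edward Snowden": 3, "Sarah Harrison": 2, "Sheremetyevo": 1}
    ranking = ["Sheremetyevo", "Edward Snowden", "Moscow"]
    print(recall_at(ranking, gold, 3), ndcg_at(ranking, gold, 3))
```

Because of the discount, NDCG rewards placing highly relevant entities in the first positions, which recall at N ignores.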
  45. 45. Generating the NSS: Compactness Example (Part 2.b). Target recall: 22/33 ≈ 0.67. NSS sizes for approaches A, B and C: Sa = 27, Sb = 33, Sc = 54, so in terms of compactness A > B > C.
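One plausible reading of the compactness example is that each approach is judged by how many entities of its ranking must be kept before a target recall is reached, with smaller being better. The sketch below implements that reading only; the exact definition is in the cited papers, and the function name and data are hypothetical.

```python
def size_for_recall(ranked, gold, target_recall):
    """Smallest prefix length of `ranked` whose recall reaches target_recall.

    Returns None when the ranking never reaches the target.
    """
    gold = set(gold)
    if not gold:
        return None
    seen = set()
    for size, entity in enumerate(ranked, start=1):
        if entity in gold:
            seen.add(entity)
        if len(seen) / len(gold) >= target_recall:
            return size
    return None


if __name__ == "__main__":
    gold = {"a", "b", "c", "d"}
    ranking_a = ["a", "b", "x", "c"]       # reaches 75% recall with 4 entities
    ranking_b = ["x", "a", "y", "b", "c"]  # needs 5 entities for the same recall
    print(size_for_recall(ranking_a, gold, 0.75))
    print(size_for_recall(ranking_b, gold, 0.75))
```

Under this reading, the ranking that needs the smaller snapshot for the same recall is the more compact one.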
  46. 46. Generating the NSS: The Approaches (Part 2.b). (1) Frequency-Based Ranking [Redondo_SNOW’14]: leverages the biggest sample provided by the expansion; prioritizes representativeness. (2) Multidimensional Entity Relevance Ranking [Redondo_ICWE’15]: relevancy of entities is grounded on different dimensions. (3) Concentric-Based Approach [Redondo_KCAP’15A]: Core / Crust model; alleviates the problem of dealing with many dimensions.
  47. 47. Generating the NSS: (1) Frequency-Based (Part 2.b). [Redondo_SNOW’14]
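As a minimal sketch of frequency-based ranking, the snippet below counts in how many retrieved documents each entity appears and sorts by that count. Counting once per document (rather than once per mention) is an assumption, and the input format is hypothetical: the real pipeline works on entities produced by a named entity extractor.

```python
from collections import Counter


def rank_by_frequency(documents_entities):
    """documents_entities: one list of entity labels per retrieved document.

    Returns (entity, document_frequency) pairs, most frequent first.
    """
    counts = Counter()
    for entities in documents_entities:
        counts.update(set(entities))  # each entity counts once per document
    return counts.most_common()


if __name__ == "__main__":
    docs = [
        ["Edward Snowden", "Laura Poitras"],
        ["Edward Snowden", "Laura Poitras", "Glenn Greenwald"],
        ["Edward Snowden", "Edward Snowden"],
    ]
    for entity, freq in rank_by_frequency(docs):
        print(entity, freq)
```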
  48. 48. Generating the NSS: (2) Multidimensional (Part 2.b). [Redondo_ICWE’15]
  49. 49. Generating the NSS: (2) Multidimensional (Part 2.b). Popularity (FPOP): based on Google Trends; w = 2 months; µ + 2*σ (2.5%). Expert rules (FEXP), example: [Location, = 0.43]; [Person, = 0.78]; [Organization, = 0.95]; [< 2, = 0.0].
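The two dimensions on this slide can be sketched as follows, under several loudly stated assumptions: the trend series is a plain list of numbers covering the two-month window, a popularity peak is any value above µ + 2σ of that series, the `< 2` rule is read as "frequency below 2", and the type weights multiply the frequency. None of these combination details are given on the slide, and the helper names are hypothetical.

```python
from statistics import mean, pstdev

# Type weights copied from the slide's example rules.
TYPE_WEIGHTS = {"Location": 0.43, "Person": 0.78, "Organization": 0.95}


def is_popularity_peak(series, value):
    """True if `value` exceeds mean + 2 standard deviations of `series`."""
    return value > mean(series) + 2 * pstdev(series)


def expert_score(frequency, entity_type, min_frequency=2):
    """Assumed reading: rare entities score 0, others are weighted by type."""
    if frequency < min_frequency:
        return 0.0
    # Unknown types keep their frequency unchanged (an assumption).
    return frequency * TYPE_WEIGHTS.get(entity_type, 1.0)


if __name__ == "__main__":
    window = [10, 12, 11, 9, 10, 11]  # hypothetical trend values
    print(is_popularity_peak(window, 30))
    print(expert_score(4, "Organization"), expert_score(1, "Person"))
```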
  50. 50. Experiment 1: Frequency VS Multidimensional (Part 2.b). 20 x 4 x 4 = 320 formulas.
  51. 51. Experiment 1: Frequency VS Multidimensional (Part 2.b). News entity expansion & dimensions → generate NSS. Frequency-based score: 0.473 MNDCG @ 10. Best score: 0.698 MNDCG @ 10, with collection: CSE (Google + 2W + Schema.org) and ranking: expert rules, popularity. Multidimensional nature of the NSS.
  52. 52. Experiment 1: Frequency VS Multidimensional (Part 2.b). FREQ example: F(Laura Poitras) = 2; F(Glenn Greenwald) = 1.
  53. 53. Experiment 1: Frequency VS Multidimensional (Part 2.b). Expansion: FREQ + POP + EXP = (NSS).
  54. 54. Experiment 2: Multidimensional ++ (Part 2.b). Effect on MNDCG @ 10: (1) exploit Google relevance (+1.80%); (2) promote subtitle entities (+2.50%); (3) exploit the named entity extractor’s confidence (+0.20%); (4) interpret the popularity dimension (+1.40%); (5) perform clustering before filtering (-0.60%). No significant improvement.
  55. 55. Experiment 2: Multidimensional ++ (Part 2.b). Diagram labels: Original; Tune Function X (FREQ, POP, EXP); Re-Shuffle; (NSS).
  56. 56. Re-thinking the problem: measures (Part 2.b). MNDCG: too focused on success at the first positions (decay function); the NSS intends to be flexible, and ranking is application-dependent. Compactness: prioritizes coverage over ranking while minimizing NSS size.
  57. 57. Re-thinking the problem: dimensions (Part 2.b). Duality in the news entity spectrum. Representative entities: driving the plot of the story. Relevant entities: related to the former for specific reasons (suggested by an expert? informative? unexpected? interesting? explicative?); exploit the entity semantic relations.
  58. 58. Generating the NSS: (3) Concentric Approach (Part 2.b). Core: representative entities; spottable via frequency dimensions; high degree of cohesiveness. Crust: attached to the Core via semantic relations; agnostic to the nature of relevancy: informativeness, interestingness, etc. [Redondo_KCAP’15A]
  59. 59. Generating the NSS: (3) Core Creation (Part 2.b). (a) Spot representative entities: frequency dimension; (b) cohesiveness (DBpedia).
  60. 60. Generating the NSS: (3) Crust Creation (Part 2.b). Score of a candidate entity e: the number of Web documents talking simultaneously about e and the Core.
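The Crust score described on this slide counts Web documents that mention both a candidate entity and the Core. The sketch below approximates that over a local list of texts instead of a Web search engine, which is a deliberate substitution. Treating "the Core" as "at least one Core entity" and using case-insensitive substring matching are further assumptions, and all names are hypothetical.

```python
def web_cooccurrence(entity, core, documents):
    """Number of documents mentioning `entity` and at least one Core entity."""
    entity_l = entity.lower()
    core_l = [c.lower() for c in core]
    count = 0
    for text in documents:
        text_l = text.lower()
        if entity_l in text_l and any(c in text_l for c in core_l):
            count += 1
    return count


def build_crust(candidates, core, documents, min_support=1):
    """Keep candidates co-occurring with the Core, strongest first."""
    scored = [(e, web_cooccurrence(e, core, documents))
              for e in candidates if e not in core]
    kept = [(e, s) for e, s in scored if s >= min_support]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    core = ["Edward Snowden", "WikiLeaks"]
    docs = [
        "Sarah Harrison of WikiLeaks accompanied Edward Snowden.",
        "Edward Snowden stayed at Sheremetyevo airport with Sarah Harrison.",
        "Sheremetyevo is an airport in Moscow.",
    ]
    print(build_crust(["Sarah Harrison", "Sheremetyevo", "Berlin"], core, docs))
```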
  61. 61. Experiment 3: Multidimensional VS Concentric (Part 2.b). Concentric Core: (1) Entity frequency: Core1, Jaro-Winkler > 0.9; Core2, frequency based on exact string matching. (2) Cohesiveness: Everything is Connected Engine (https://github.com/mmlab/eice), Skb(e1, e2) > 0.125.
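The Core settings above combine fuzzy frequency counting with a cohesiveness threshold. The sketch below illustrates the idea with two substitutions that should be kept in mind: `difflib.SequenceMatcher` stands in for Jaro-Winkler (it is a different similarity measure), and `relatedness` is a hypothetical callable standing in for the Everything is Connected Engine score. The `min_frequency` cut-off is also an assumption; only the 0.9 and 0.125 thresholds come from the slide.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.9):
    """Stand-in for Jaro-Winkler: difflib's ratio, not the same measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() > threshold


def merge_surface_forms(mentions, threshold=0.9):
    """Greedily group near-identical mentions and count each group."""
    groups = []  # each item is [representative, count]
    for mention in mentions:
        for group in groups:
            if similar(mention, group[0], threshold):
                group[1] += 1
                break
        else:
            groups.append([mention, 1])
    return {rep: count for rep, count in groups}


def build_core(frequencies, relatedness, min_frequency=2, min_relatedness=0.125):
    """Frequent entities that are related to at least one other frequent entity."""
    frequent = [e for e, f in frequencies.items() if f >= min_frequency]
    return [e for e in frequent
            if any(relatedness(e, o) > min_relatedness
                   for o in frequent if o != e)]


if __name__ == "__main__":
    mentions = ["Edward Snowden", "Edward Snowden", "Edward Snowdon",
                "WikiLeaks", "WikiLeaks", "Moscow"]
    table = {frozenset(("Edward Snowden", "WikiLeaks")): 0.4}  # toy scores
    score = lambda a, b: table.get(frozenset((a, b)), 0.0)
    print(build_core(merge_surface_forms(mentions), score))
```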
  62. 62. Experiment 3: Multidimensional VS Concentric (Part 2.b). Concentric Crust: (1) Candidates for Crust generation: Ex1, 1st in ICWE 2015 by R*(50): L2+Google, F3 1W, Gauss + POP; Ex2, 2nd in ICWE 2015 by R*(50): L2+Google, F3 1W, Freq + POP. (2) Function for attaching entities to the Core: SWEB(ei, Core) over Google CSE, default configuration.
  63. 63. Experiment 3: Multidimensional VS Concentric (Part 2.b). Combining Core and Crust: Core+Crust, CrustOnly.
  64. 64. Experiment 3: Multidimensional VS Concentric (Part 2.b). (2*2*2 + 2) runs. Concentric is 36.9% more compact than Multidimensional (decrease in NSS size). IdealGT: size of the NSS according to the Gold Standard.
  65. 65. Experiment 3: Multidimensional VS Concentric (Part 2.b). NSS Gold Standard, Fukushima Disaster 2013, n = 22.
  66. 66. Experiment 3: Multidimensional VS Concentric (Part 2.b). Panels: Multidimensional, Concentric.
  67. 67. NSS: a suitable model for news applications? (Part 2.b)
  68. 68. Consuming the Concentric NSS (2.c). News consumption phases; the NSS for feeding news prototypes.
  69. 69. NSS Consumption: News Prototypes (Part 2.c). … short summaries, previews, hotspots …; … advanced graphs and diagrams, timelines, in-depth summaries …; … second screen apps, slideshows, info-boxes …
  70. 70. NSS Consumption: Consumption Phases (Part 2.c). The Before; The During; The After.
  71. 71. NSS Consumption: Phases VS Layers (Part 2.c). [Redondo_KCAP’15B]
  72. 72. Conclusions & Future Work. Publications; references.
  73. 73. Conclusions. Q1: How can multimedia content be semantically annotated and seamlessly connected with other resources on the Web? (a) Applied NER and NED as semantic annotation techniques in the multimedia domain; (b) developed other techniques such as Named Entity Expansion or Visual Concept Mapping; (c) LinkedTV model to harmonize annotations into the Linked Data Web. Q2: Can those semantic annotations and linked media resources bring value for the exploitation and consumption of multimedia content? (a) Exploiting multimedia semantic techniques: enriching, highlighting media fragments (hotspots), classifying videos…; (b) evaluation of multimodal approaches via MediaEval 2013/2014.
  74. 74. Conclusions. Q3: Is it possible to automatically contextualize news stories with background information so they can be effectively interpreted by humans and machines? (a) Proposed the NSS model and a Gold Standard; (b) the multidimensional nature of the entity relevance: Gaussian function, popularity, expert rules…; (c) the Concentric model better reproduces the NSS: better compactness, 36.9% over BAS01 (similar recall, smaller size), and Core/Crust brings up relevant entities without having to deal with fuzzy dimensions; (d) the NSS better supports the news consumption phases (Before, During, After).
  75. 75. Future Work. [S] Publish the generated NSS on the Web (Linked Data). [S] Extend the Gold Standard: from 5 to 23 videos, with the concentric-based model for candidate selection; submission to TOIS. [S] Avoid depending on “big players” for retrieving knowledge during the expansion phase (Terrier VS Google experiments).
  76. 76. Future Work. [M] Use the power of crowdsourcing in Gold Standard creation: increase the size of the Gold Standard without involving experts; consider different levels of entity relevancy. [M] Supervised techniques (Learning to Rank): features in entities (surface forms, URLs, types…); features in documents, sources, and other provenance information.
  77. 77. Future Work. [L] Spot not only the strength of the relationships between the Crust and the Core, but also the predicates (example: Editor in WikiLeaks). Generating explanations by analyzing the documents considered in Sweb.
  78. 78. Future Work. [L] Avoid relying on “big players” during Crust generation: continuous indexing; better curated white lists; fresher structured databases (DBpedia events). [L] Reuse the concentric model in context-related tasks: Named Entity Extraction/Disambiguation, as another feature similar to BagOfWords, Word2vec…; exploratory searches (diversity, serendipity…). [Steiner_ICWE’15]
  79. 79. José Luis Redondo García http://jluisred.github.io @peputo http://github.com/jluisred “my small dent in the vast ocean of knowledge…” Ph.D. questions?
  80. 80. Publications: Journals (2), Conferences (6), Workshops (5), Demo/Poster (7). Journals: Redondo Garcia J. L. and Adolfo Lozano-Tello: OntoTV: an Ontology Based System for the Management of Information about Television Content. International Journal of Semantic Computing, 6(01), 111-130, 2012. Conferences: Redondo Garcia J. L., Rizzo G., Troncy R. (2015) Capturing News Stories Once, Retelling a Thousand Ways. In: 8th International Conference on Knowledge Capture (K-CAP'15), Palisades, NY, USA; Redondo Garcia J. L., Rizzo G., Troncy R. (2015) The Concentric Nature of News Semantic Snapshots: Knowledge Extraction for Semantic Annotation of News Items. In: 8th International Conference on Knowledge Capture (K-CAP'15), Palisades, NY, USA. Best Paper Award; Redondo Garcia J. L., Rizzo G., Romero L. P., Hildebrand M., Troncy R. (2015) Generating Semantic Snapshots of Newscasts using Entity Expansion. In: 15th International Conference on Web Engineering (ICWE'15), Rotterdam, the Netherlands; Rizzo G., Steiner T., Troncy R., Verborgh R., Redondo Garcia J. L. and Van de Walle R. (2012) What Fresh Media Are You Looking For? Extracting Media Items from Multiple Social Networks. In: (ACM Multimedia) International Workshop on Socially-Aware Multimedia (SAM'12), Nara, Japan.
  81. 81. References. [Redondo_KCAP’15B] Capturing News Stories Once, Retelling a Thousand Ways; [Redondo_KCAP’15A] The Concentric Nature of News Semantic Snapshots; [Redondo_ICWE’15] Generating Semantic Snapshots of Newscasts using Entity Expansion; [Redondo_ISWC’14] Finding and sharing hot spots in Web Videos; [Redondo_ESWC’14] Augmenting TV Newscasts via Entity Expansion; [Redondo_SNOW’14] Describing and Contextualizing Events in TV News Show; [LinkedTV_D2.6’14] LinkedTV Framework for Generating Video Enrichments with Annotations; [Romero_TVX’14] LinkedTV News: A dual mode second screen companion for web-enriched news broadcasts; [Hoang_MediaEval’14] LinkedTV at MediaEval 2014 Search and Hyperlinking Task; [Rizzo_LREC’14] Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web; [Li_LIME'13] Enriching Media Fragments with Named Entities for Video Classification; [Milicic_WWW'13] Live Topic Generation from Event Streams; [Milicic_ESWC’13] Tracking and Analyzing The 2013 Italian Election; [Sahuguet_MediaEval’13] LinkedTV at MediaEval 2013 Search and Hyperlinking Task; [Rizzo_SAM’12] What Fresh Media Are You Looking For? Extracting Media Items from Multiple Social Networks
