Semantic annotation of text: techniques and applications
  Prof. Luis Sanchez-Fernandez
  Web Technologies Laboratory, University Carlos III of Madrid
  http://webtlab.it.uc3m.es
Outline
  Semantic Web
  Techniques for semantic annotation of text
  An approach to named entity disambiguation using Wikipedia
Short history of the Web
  1990: Creation of the World Wide Web infrastructure at CERN by Tim Berners-Lee (HTTP, HTML, first Web client, first Web server)
  1993: Mosaic, the first graphical Web client
  1994: Netscape Navigator
  1996: Commercial use of the WWW becomes widespread
  1999: Tim Berners-Lee proposes the Semantic Web
  2002: Weblogs and RSS → Web 2.0
  6 October 2009: at least 8 billion indexable Web pages
  23 September 2010: at least 15 billion indexable Web pages, according to http://www.worldwidewebsize.com/
The problem of information overload
  The great success of the Web has led to one of its current problems: information overload
  Finding and updating relevant information is difficult and time-consuming for people and companies
    Ex.: keeping an updated state of the art
  Company employees can spend up to 20% of their working time searching the Web (Outsell Inc., 2002)
The Semantic Web proposal
  The goal of the Semantic Web is to automate Web tasks by enriching current Web content with formal representations that enable better cooperation between humans and computers
Semantic Web Stack
RDF: "Resource Description Framework"
  Máster interuniversitario en Ingeniería Telemática
  Goal of RDF (alternative views):
    Language for describing resources on the Web
    Language for the formal representation of (parts of) the information available in a Web document (metadata)
  Formal => machine readable
  Vocabulary defined with ontologies
  What is a resource?
    Web content: Web pages, images, e-mails, files, ...
    Resources mentioned in Web content: persons, locations, organizations, ...
RDF basic principles
  We want to represent a piece of information available on the Web that describes a resource
  Each metadata item states a property that can be modelled as a (formal) statement, composed of:
    subject: the resource being described
    predicate: a property of the resource
    object: the value of the property for the resource being described
  Example: "http://www.example.org has a creator whose value is John Smith"
RDF Model
  An RDF model (a set of RDF statements) can be represented as a graph
  For each statement:
    the subject is a node
    the predicate is an arc
    the object is a node
  Subject and predicate are resources
  The object can be either a resource or a literal
Example
  "http://www.example.org has a creator whose value is John Smith"
Textual notation (triples)
  <http://www.example.org/index.html> <http://purl.org/dc/elements/1.1/creator> <http://www.example.org/staffid/85740> .
  <http://www.example.org/index.html> <http://www.example.org/terms/creation-date> "August 16, 1999" .
  <http://www.example.org/index.html> <http://www.example.org/terms/language> "English" .
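The subject/predicate/object structure above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the class names `IRI`, `Literal` and `Triple` and the helper `triple_to_text` are hypothetical, and the serialization covers just the simple cases shown in the triples notation (no language tags or datatypes).

```python
from typing import NamedTuple, Union


class IRI(NamedTuple):
    """A resource identified by an IRI (hypothetical helper type)."""
    value: str


class Literal(NamedTuple):
    """A plain literal value (hypothetical helper type)."""
    value: str


class Triple(NamedTuple):
    subject: IRI
    predicate: IRI
    obj: Union[IRI, Literal]  # the "object" of the statement


def term_to_text(term: Union[IRI, Literal]) -> str:
    """Write an IRI in angle brackets and a literal in double quotes."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    escaped = term.value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple_to_text(triple: Triple) -> str:
    """Serialize one statement as 'subject predicate object .'"""
    return " ".join(term_to_text(t) for t in triple) + " ."


PAGE = IRI("http://www.example.org/index.html")
EXAMPLE = [
    Triple(PAGE, IRI("http://purl.org/dc/elements/1.1/creator"),
           IRI("http://www.example.org/staffid/85740")),
    Triple(PAGE, IRI("http://www.example.org/terms/creation-date"),
           Literal("August 16, 1999")),
    Triple(PAGE, IRI("http://www.example.org/terms/language"),
           Literal("English")),
]

if __name__ == "__main__":
    for statement in EXAMPLE:
        print(triple_to_text(statement))
```

Keeping resources and literals as distinct types mirrors the rule that a subject or predicate must be a resource, while an object may be either.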
Ontologies: goal
  An ontology is a formal, explicit specification of a shared conceptualization
  An ontology defines the basic terms and relations comprising the vocabulary of a topic area, as well as the rules that such terms and relations should fulfil
RDF Schema
  An RDF vocabulary
  Definition and description of properties
  Definition and description of classes
  Can be used to define simple ontologies
Properties in RDF Schema
  rdfs:subPropertyOf
  rdfs:range
  rdfs:domain
  rdfs:subClassOf
Sample taxonomy
  Picture by Ian Ruotsala
OWL
  Ontology language
  More powerful than RDF Schema
  Examples:
    Existence/cardinality constraints: all instances of person have a mother that is also a person; persons have exactly 2 parents
    Transitive, inverse or symmetric properties: isPartOf is a transitive property; hasPart is the inverse of isPartOf; touches is symmetric
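To make the property characteristics concrete, the following Python sketch derives the facts implied by transitive, inverse and symmetric properties. It is a toy forward-chaining loop over plain tuples, not an OWL reasoner. The function `infer` and the individuals `spoke`, `wheel`, `car`, `a` and `b` are hypothetical. Only the property names come from the slide.

```python
def infer(facts, transitive=frozenset(), symmetric=frozenset(), inverse_of=None):
    """Return facts plus everything implied by the declared property traits.

    facts: iterable of (subject, property, object) tuples.
    inverse_of: dict mapping a property to its inverse (made bidirectional here).
    """
    inverses = dict(inverse_of or {})
    inverses.update({v: k for k, v in list(inverses.items())})

    closed = set(facts)
    changed = True
    while changed:  # repeat until no new fact is produced
        new = set()
        for s, p, o in closed:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverses:
                new.add((o, inverses[p], s))
            if p in transitive:
                for s2, p2, o2 in closed:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        changed = not new <= closed
        closed |= new
    return closed


FACTS = {
    ("spoke", "isPartOf", "wheel"),
    ("wheel", "isPartOf", "car"),
    ("a", "touches", "b"),
}

INFERRED = infer(
    FACTS,
    transitive={"isPartOf"},
    symmetric={"touches"},
    inverse_of={"isPartOf": "hasPart"},
)

if __name__ == "__main__":
    for fact in sorted(INFERRED - FACTS):
        print(fact)
```

Cardinality and existence constraints are not covered by this sketch, since they restrict what may be asserted rather than generate new facts.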
Semantic Web and Technology Enhanced Learning
Typical applications
  Modelling (ontologies):
    learning processes
    learning content
    learning output (competences)
    learning agents (students, teachers)
  Adding metadata (annotations) according to the models
  Using the models and the metadata in tools to make decisions
    Example: personalized, adaptive content and/or problems
Semantic annotation of text
Generalities
  Goal: extract semantic annotations from free text
  Natural language is complex and ambiguous
  Language dependent
  Domain-dependent applications:
    News
    Literature
    E-mail
    Transcriptions of spoken dialogues
  Some useful results can be achieved nowadays
Taxonomy of semantic annotations
  Content-based annotations:
    Document categorization
    Named entities
  Ontology-based domain annotations:
    Identification of concepts and instances
    Relation extraction
  Examples:
    Named entity: (Washington, location)
    Relation: isGovernor(GaryLocke, WST)
    Instances:
      <rdf:Description rdf:about='WST'>
        <rdf:type rdf:resource='State'/>
      </rdf:Description>
      <rdf:Description rdf:about='WDC'>
        <rdf:type rdf:resource='City'/>
      </rdf:Description>
Basic techniques (i)
  Symbolic NLP
  Based on the use of lexicons and grammar rules to process text
  Example: "Barack Obama Elected President"
  Lexical analysis:
    NP → Barack
    NP → Obama
    VBT → Elect
    VBT → VBT + 'ed'
    NN → President
  Parsing:
    S → NP NP* VBT NN
  Semantic analysis:
    S → NP NP*(X) VBT(Elect) NN(Y) gives hasFunction(X, Y)
  Result: hasFunction(BarackObama, President)
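The three stages of the symbolic example can be sketched as a tiny Python program. This is only an illustration under stated assumptions: the lexicon holds just the four words from the slide, the functions `tag` and `parse` are hypothetical names, and the entity identifier is built by simply concatenating the NP tokens.

```python
# Lexicon taken from the slide: word -> category.
LEXICON = {"Barack": "NP", "Obama": "NP", "Elect": "VBT", "President": "NN"}


def tag(token):
    """Lexical analysis: return (category, base form) for one token."""
    if token in LEXICON:
        return LEXICON[token], token
    # Morphological rule VBT -> VBT + 'ed'
    if token.endswith("ed") and LEXICON.get(token[:-2]) == "VBT":
        return "VBT", token[:-2]
    raise ValueError(f"unknown token: {token!r}")


def parse(sentence):
    """Parsing plus semantic analysis for the rule S -> NP NP* VBT NN.

    Returns ("hasFunction", X, Y) when the verb is 'Elect', otherwise None.
    """
    tagged = [tag(token) for token in sentence.split()]

    # Consume the leading run of NP tokens (at least one is required).
    n_np = 0
    while n_np < len(tagged) and tagged[n_np][0] == "NP":
        n_np += 1
    if n_np == 0 or len(tagged) != n_np + 2:
        return None
    if tagged[n_np][0] != "VBT" or tagged[n_np + 1][0] != "NN":
        return None

    x = "".join(base for _, base in tagged[:n_np])  # X: the NP sequence
    verb = tagged[n_np][1]
    y = tagged[n_np + 1][1]                         # Y: the NN
    if verb == "Elect":
        return ("hasFunction", x, y)
    return None


if __name__ == "__main__":
    print(parse("Barack Obama Elected President"))
```

A real symbolic system would use a far larger lexicon and grammar. The point here is only that every step is an explicit, hand-written rule.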
Basic techniques (ii)
  Statistical NLP
  Based on counting: finding frequent patterns that make the occurrence of a certain text feature likely
  Use of extensive corpora
  Example: "Washington" appearing in the same document as "Hollywood" is likely to represent (Denzel Washington, actor), while "Washington" appearing in the same document as "Obama" is likely to represent (Washington D.C., American capital)
  We can count the frequency of the different meanings of "Washington" when it appears in different contexts
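The counting idea can be sketched in Python as follows. The tiny labelled corpus is invented purely for illustration, and the voting scheme (sum the counts of each meaning over the context words) is an assumption rather than the method used in the talk.

```python
from collections import Counter, defaultdict


def train(labelled_docs):
    """Count how often each meaning is seen together with each context word.

    labelled_docs: iterable of (context_words, meaning) pairs.
    """
    counts = defaultdict(Counter)
    for context_words, meaning in labelled_docs:
        for word in set(context_words):
            counts[word][meaning] += 1
    return counts


def disambiguate(counts, context_words):
    """Pick the meaning with the highest total count over the context words."""
    votes = Counter()
    for word in set(context_words):
        votes.update(counts.get(word, {}))
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical training documents that all mention "Washington".
CORPUS = [
    (["Hollywood", "film"], "Denzel Washington"),
    (["Hollywood", "Oscar"], "Denzel Washington"),
    (["Obama", "Congress"], "Washington D.C."),
    (["Obama", "Hollywood"], "Washington D.C."),
]
COUNTS = train(CORPUS)

if __name__ == "__main__":
    print(disambiguate(COUNTS, ["Hollywood"]))
    print(disambiguate(COUNTS, ["Obama"]))
```

With a realistic corpus the counts would normally be smoothed and normalized into probabilities. Raw counts are enough to show the principle.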
An approach to named entity disambiguation with Wikipedia
Introduction
  Entity: text + type
  Instance: a particular person, location (GPE), organization, ...
Strategy I
Strategy II
  Approach:
    Find the entities in the document
    For each entity, identify the candidate instances that are compatible with the entity name
    Assign a ranking value to each candidate instance: 0 ≤ r ≤ 1
    Greater ranking values indicate greater likelihood of occurrence
Strategy III
  Semantic coherence (in terms of ranking)
  "An instance would have a high ranking value if the instances that typically co-occur with it also have high ranking values"
Strategy IV
  We can add a vector E that accounts for other context information
  The resulting equation is similar to Google PageRank
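Since the slide text only says that the ranking equation resembles PageRank, the Python sketch below shows one plausible form of such an iteration, not the equation from the talk. The damping factor, the normalization step, the matrix layout (`A[i][j]` is how strongly instance j supports instance i) and all example numbers are assumptions.

```python
def rank(A, E, damping=0.85, iterations=50):
    """Iterate r = damping * A r + (1 - damping) * E, renormalizing each step.

    A: square list of lists with non-negative support weights.
    E: non-negative context weights, one per candidate instance (sum > 0).
    Returns ranking values that lie in [0, 1] and sum to 1.
    """
    n = len(E)
    total = sum(E)
    e = [x / total for x in E]
    r = e[:]  # start from the context weights
    for _ in range(iterations):
        r = [
            damping * sum(A[i][j] * r[j] for j in range(n)) + (1 - damping) * e[i]
            for i in range(n)
        ]
        norm = sum(r)
        r = [x / norm for x in r]
    return r


# Hypothetical candidates for a document mentioning "Athens" and "Greece".
INSTANCES = ["Athens (Greece)", "Athens (Georgia)", "Greece"]
A = [
    [0.0, 0.0, 1.0],  # "Greece" supports "Athens (Greece)"
    [0.0, 0.0, 0.0],  # nothing in the document supports "Athens (Georgia)"
    [1.0, 0.0, 0.0],  # "Athens (Greece)" supports "Greece"
]
E = [0.5, 0.5, 1.0]
RANKING = rank(A, E)

if __name__ == "__main__":
    for name, value in zip(INSTANCES, RANKING):
        print(f"{name}: {value:.3f}")
```

The mutual reinforcement between the two Greek candidates is what the semantic coherence principle describes: each gains ranking because the other is also ranked highly.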
Instance finder & filter
  Alternative instance names are extracted by processing a Wikipedia dump
    Page titles, redirects, disambiguation pages, anchors
  Names are indexed with Lucene
  Candidate instances are obtained by querying Lucene
  Candidate instances are weighted by combining Lucene scores and PageRank values
  Filtering limits the maximum number of candidates
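The finder and filter steps can be illustrated with a small in-memory stand-in. This Python sketch does not use Lucene: a dictionary plays the role of the index, matching is exact and case-insensitive, and candidates are ordered only by a prior weight standing in for the PageRank value. The page data, the weights and the function names are all hypothetical.

```python
from collections import defaultdict


def build_alias_index(pages):
    """Map every known name (title or alternative name) to the page titles it may denote.

    pages: dict of page title -> list of alternative names
           (for example taken from redirects, disambiguation pages or anchors).
    """
    index = defaultdict(set)
    for title, aliases in pages.items():
        for name in {title, *aliases}:
            index[name.lower()].add(title)
    return index


def find_candidates(index, entity_name, prior, max_candidates=3):
    """Return the candidate instances for a name, best prior first, capped in number."""
    titles = index.get(entity_name.lower(), set())
    ranked = sorted(titles, key=lambda t: (-prior.get(t, 0.0), t))
    return ranked[:max_candidates]


# Hypothetical extract of alternative names and prior weights.
PAGES = {
    "Washington, D.C.": ["Washington"],
    "Denzel Washington": ["Washington"],
    "George Washington": ["Washington"],
    "Washington (state)": ["Washington"],
}
PRIOR = {
    "Washington, D.C.": 0.9,
    "Washington (state)": 0.8,
    "George Washington": 0.7,
    "Denzel Washington": 0.4,
}
INDEX = build_alias_index(PAGES)

if __name__ == "__main__":
    print(find_candidates(INDEX, "Washington", PRIOR, max_candidates=2))
```

Capping the list keeps the later ranking step small, at the cost of occasionally discarding the correct instance before it can be ranked.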
Instance ranker
  AL: based on direct links
  E: candidate instance weights passed by the instance filter
  AC: based on instance co-occurrence in Wikipedia pages
Results I
Results II
Conclusions
  Approach based on instance co-occurrence
  Text from Wikipedia restricted to: titles, anchors
  Results considered promising
  Should improve for GPE
Thank You! Questions?

2011-03-11 (UPM) eMadrid, L. Sanchez, UC3M: semantic annotation of text

  • 36. Future work
      Differentiate according to entity type
      Improve the selection of candidate instances
        Responsible by itself for errors in 12.7% of non-nil queries
  • 37. Instance ranker
      AC: based on instance co-occurrence in Wikipedia pages
      AL: based on direct links
      E: candidate instance weights passed by the instance filter
  • 38. AC computation
      α^C_ij ≈ P(I_i | I_j)
      Based on counting the co-occurrence of I_i and I_j in Wikipedia pages
      Ex.: P(PauGasol | Lakers) = (number of pages where both Pau Gasol and Los Angeles Lakers are mentioned) / (total number of pages where Los Angeles Lakers are mentioned)
  • 39. AL computation
      Based on direct links
      Ex.: the Wikipedia page of Pau Gasol links to the Wikipedia page of Los Angeles Lakers
      Initial idea: if I_j links many times to I_i and I_j is likely to occur (it has a high ranking), then I_i is also likely to occur
      The Lucene score is used to compute α^L_ij
  • 40. Instance filter
      Target entity: 200 or 30
      Other entities: 15
  • 41. Disambiguation Strategy I
      Semantic coherence principle
      "An instance typically co-occurs with other related instances"
      Ex.: (Pau Gasol, Los Angeles Lakers); (Athens, Georgia); (Athens, Greece); (Hillary Clinton, Barack Obama)
      Requires disambiguating all the entities in the document
      Slightly different from entity co-occurrence approaches
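The co-occurrence estimate P(I_i | I_j) from the AC computation slide can be sketched in Python by counting pages. Each page is represented here as the set of instances it mentions. The four example pages are invented for illustration, and returning 0.0 when I_j is never mentioned is an assumption of this sketch.

```python
def cooccurrence_probability(pages, i, j):
    """Estimate P(i | j): among pages mentioning j, the fraction that also mention i.

    pages: iterable of sets, each holding the instances mentioned on one page.
    """
    with_j = [mentions for mentions in pages if j in mentions]
    if not with_j:
        return 0.0  # j never occurs, so no estimate is available
    both = sum(1 for mentions in with_j if i in mentions)
    return both / len(with_j)


# Hypothetical pages, each reduced to the set of instances it mentions.
PAGES = [
    {"Pau Gasol", "Los Angeles Lakers"},
    {"Los Angeles Lakers", "Kobe Bryant"},
    {"Pau Gasol", "Memphis Grizzlies"},
    {"Los Angeles Lakers"},
]

if __name__ == "__main__":
    p = cooccurrence_probability(PAGES, "Pau Gasol", "Los Angeles Lakers")
    print(f"P(Pau Gasol | Los Angeles Lakers) = {p:.3f}")
```

The estimate is not symmetric: in this toy data P(Pau Gasol | Los Angeles Lakers) is 1/3 while P(Los Angeles Lakers | Pau Gasol) is 1/2, which is why the matrix of such values is indexed by an ordered pair of instances.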