The role of Thesauri and Standard Vocabularies in linking data: AGROVOC, UNBIS, EUROVOC
A proposal for collaboration between agencies
Dr. Johannes Keizer
FAO of the United Nations, Office of Knowledge Exchange, Research and Extension, Knowledge and Capacity for Development
The Development of the Internet
Open World Mindset
Data sources carefully controlled.
Data formats “custom-defined” for an application.
Linked data based on an “open world mindset”
Integrating data from the open Web
Systems designed to incorporate new information incrementally
By design, tolerance of incomplete information
The Linked Data Universe: http://www.linkeddata.org (July 2009)
The Linked Data Universe: http://www.linkeddata.org (July 2010)
Example: BBC Wildlife Finder
Humboldt Squid page, pulled together from a diversity of Linked Data sources BBC TV Documentary BBC News item Wikipedia Animal Diversity Web:Nocturnal  way of life
RDF: a grammar for the language of data
Describe resources using interrelated “statements” (“triples”).
Use URIs (unique, globally managed identifiers) as the “words” of statements.
[Diagram: ResourceA relatedTo ResourceB; ResourceA describedBy “Some text”]
RDF as a common format for merging data
Role of thesauri/concept schemes
Thesauri were based on “terms”, but terms already represented concepts in a non-explicit way
Hierarchical and associative relationships represented generic ontological domain knowledge
Candidate building blocks for the semantic web
...from thesaurus to ontologies...
AGROVOC today
600000 labels in around 20 languages.
One-stop shop for terminological knowledge related to agriculture in general
A knowledge base of related concepts organized in ontological relationships (hierarchical, associative, equivalence)
A concept/term/string-based system
Concepts may be organized in multiple categories.
Semantic Relationships
AGROVOC conceptual model in SKOS-XL
[Diagram: SKOS concepts (6211, 8171, 1474, 12332) are connected by skos:broader and attached to the AGROVOC concept scheme through skos:inScheme and skos:topConceptOf; other schemes in FAO attach through skos:inScheme. Concepts carry SKOS-XL labels through skosxl:prefLabel and skosxl:altLabel. Labels have a literal form (“maize”, “corn”) and are related to each other by has_synonym and has_translation (maïs, fr).]
http://www.w3.org/2004/02/skos/
SKOS-XL output (URI of AGROVOC concept)
<rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/agrovocScheme">
  <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#ConceptScheme"/>
</rdf:Description>
<rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/c_330829">
  <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
  <skos:inScheme rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
  <skos:topConceptOf rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
</rdf:Description>
<rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/xl_en_1278479064610">
  <literalForm xmlns="http://www.w3.org/2008/05/skos-xl#" xml:lang="en">subjects</literalForm>
  <rdf:type rdf:resource="http://www.w3.org/2008/05/skos-xl#Label"/>
</rdf:Description>
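As a rough illustration of how such output can be read by a program, the following Python sketch parses a fragment modelled on the slide and pulls out the type of each resource and the literal form of each label. The `rdf:RDF` wrapper, the namespace declarations and the function name `extract` are assumptions added so that the snippet parses on its own; they are not part of the slide.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Fragment modelled on the slide. The rdf:RDF wrapper and the xmlns
# declarations are assumptions added to make it a complete document.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/c_330829">
    <rdf:type rdf:resource="{SKOS}Concept"/>
    <skos:inScheme rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/xl_en_1278479064610">
    <literalForm xmlns="{SKOSXL}" xml:lang="en">subjects</literalForm>
    <rdf:type rdf:resource="{SKOSXL}Label"/>
  </rdf:Description>
</rdf:RDF>"""


def extract(doc):
    """Return ({uri: type_uri}, {uri: (literal, language)}) for an RDF/XML string."""
    root = ET.fromstring(doc)
    types, labels = {}, {}
    for desc in root.findall(f"{{{RDF}}}Description"):
        about = desc.get(f"{{{RDF}}}about")
        for rdf_type in desc.findall(f"{{{RDF}}}type"):
            types[about] = rdf_type.get(f"{{{RDF}}}resource")
        for literal in desc.findall(f"{{{SKOSXL}}}literalForm"):
            labels[about] = (literal.text, literal.get(XML_LANG))
    return types, labels


types, labels = extract(DOC)
for uri, (text, lang) in labels.items():
    print(uri, text, lang)
```

A dedicated RDF library would be the more usual choice for real data; the standard-library parser is used here only to keep the sketch self-contained.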
The concept scheme workbench
Linking vocabularies
http://agris.fao.org/agris-search/search/display.do?f=2004/ZA/ZA04002.xml;ZA2004000049
http://eurovoc.europa.eu/218754 http://aims.fao.org/aos/agrovoc/c_7825
Linking data through common URIs
[Diagram: the concept “Maize” appears in AGROVOC, EUROVOC and UNBIS, each with a skosxl:literalForm “Maize”; the concepts are connected by owl:sameAs/exactMatch, and each concept links to documents indexed with it.]
AGROVOC concept: http://aims.fao.org/aos/agrovoc/c_12332
EUROVOC concept: http://eurovoc.europa.eu/219871
AGRIS record: http://agris.fao.org/agris-search/search/display.do?f=1996/TR/TR96001.xml;TR9600026
UNBISnet record: http://unbisnet.un.org:8080/ipac20/ipac.jsp?session=128F308557F34.283092&profile=bib&uri=full=3100001~!685149~!1&ri=1&aspect=subtab124&menu=search&source=~!horizon
EUR-Lex document: http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2010:202:0011:0015:EN:PDF
http://aims.fao.org/aos/agrovoc/c_12332 owl:sameAs http://eurovoc.europa.eu/219871
What are we doing with unstructured data?

Ksim keizer 2010-10-19

Editor's notes

  1. This graph, elaborated by Nova Spivack of Radar Networks, is popular at the moment. The Y-axis shows the increase of information connections; the X-axis shows the increase of social connections. Whereas the Web Operating System in 2030 is still a brilliant guess about the future, the development of the Semantic Web, or Web 3.0, has now gained considerable momentum.
  2. One of the key developments in the Semantic Web is “Linked Open Data”. The Linked Open Data paradigm claims that existing structured data need to be released from the proprietary silos in which they sit at the moment. With RDF (Resource Description Framework), the semantic tools to do so exist. There is also technology to use RDF. More on this later.
  3. This is a snapshot one year later. The growth is enormous. A central point is DBpedia, “triplified” information from Wikipedia. The different colours represent the different information types; “life sciences” and “publications” are the most populated areas, with the area “government” growing strongly. Interesting newcomers in the last months are the two VIVO datasets from the United States describing expertise in science. VIVO is actually a project that started at the agricultural library of Cornell University.
  4. What does this mean in practice? I will show this with an example from the BBC. As far as I know, the biggest consumers (and producers) of LOD are the BBC and the New York Times (but now also the US government).
  5. During the Web 1.0 phase, web pages were composed by humans. Today most web pages are driven by databases that can be queried dynamically. Through RSS feeds they also contain data from other websites. This BBC web page is a big jump further. It has not been composed by humans, and it is not generated from one database. It is generated from different data sources that were present as linked open data, linked only through common URIs.
  6. The “technology” that makes linked open data possible is RDF. Everything in RDF is made of “triples”. A triple is a statement of the form “Subject-Predicate-Object”, as shown in this example. Ideally, all elements of a triple are represented by a URI, an unambiguous, machine-readable identifier of a concept, but the object of a triple can also be a simple character string.
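To make the triple idea concrete, here is a minimal Python sketch that stores the two statements from the RDF slide as (subject, predicate, object) tuples and looks them up. The `http://example.org/` namespace and the helper names `objects` and `is_uri` are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # always a URI in this sketch
    predicate: str  # always a URI in this sketch
    obj: str        # either a URI or a plain text literal


EX = "http://example.org/"  # hypothetical namespace, not from the slides

# The two statements shown on the RDF slide.
triples = [
    Triple(EX + "ResourceA", EX + "relatedTo", EX + "ResourceB"),
    Triple(EX + "ResourceA", EX + "describedBy", "Some text"),
]


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


def is_uri(value):
    """Crude check that separates URIs from plain literals in this sketch."""
    return value.startswith(("http://", "https://"))


print(objects(triples, EX + "ResourceA", EX + "relatedTo"))
print([is_uri(t.obj) for t in triples])
```

Because every statement has the same three-part shape, statements from different sources can be put into one list without any schema negotiation; only the URIs need to agree.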
  7. What is now the role of thesauri, and specifically the role of our thesauri, in this setup?
  8. Our team had the idea very early that thesauri would become important in the development of Web information management. Within the AOS (Agricultural Ontology Service) initiative we have gone down a long and winding road. The Google search shows our 2003 paper in JODI. But now AGROVOC has become a showcase for the use of thesauri to build concept schemes.
  9. Some self-appreciation.
  10. This is the AGROVOC SKOS model that was developed and decided in April 2010 in active collaboration with Tom Baker, who was a member of the W3C SKOS working group.
  11. SKOS-XL was published as a W3C standard one year ago. The initial versions of SKOS were not sufficient to express the complexities of multilingual thesauri. Margherita Sini from FAO was a member of the SKOS working group, and we are very satisfied that in the end a standard emerged that caters for our needs.
  12. You can see here the AGROVOC encoding in SKOS.
  13. During the discussions on the AGROVOC model, we also did some software engineering. The result is the concept scheme workbench. It is a web-based working environment for managing the AGROVOC Concept Server. It facilitates the collaborative editing of multilingual terminology and semantic concept information. It includes administration and group management features. It includes workflows for maintenance, validation and quality assurance of the data pool. The CS is freely accessible to everybody to facilitate collaborative editing. Already now, not only AGROVOC is on the workbench, but also the FAO Open Archive authority data. We can host any concept scheme.
  14. The table shows 3 descriptors that are in AGROVOC, EUROVOC and UNBIS. In AGROVOC and EUROVOC they are already encoded as URIs. We could easily establish relationships like owl:sameAs between the concepts or skos:exactMatch between labels.
  15. A bibliographic record contains much more hidden information than is displayed with the metadata. Many of the highly structured data link to other information on the web. In AGRIS we have now introduced something we call “naive linking”. An AGRIS record links automatically to Google Maps for the location of the center and to Google to retrieve the full text of the resource, citation lists or other publications by the authors. This often works, but clearly not always, as it is not controlled by semantics but only through identity of strings. For an uneducated machine, unfortunately, COW and C.O.W. are the same, whereas peanuts and groundnuts are something different.
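The difference between string identity and concept identity can be sketched in a few lines of Python. This is only an illustration of the failure mode described above, not the AGRIS implementation: the normalisation rule, the label table and the `ex:` concept identifiers are all hypothetical.

```python
import re


def normalize(text):
    """Lower-case and strip everything except letters and digits (assumed rule)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def naive_same(a, b):
    """String-based linking: two labels match if their normalised forms are equal."""
    return normalize(a) == normalize(b)


# Hypothetical label-to-concept table standing in for a real concept scheme.
LABEL_TO_CONCEPT = {
    "cow": "ex:concept/cattle",
    "c.o.w.": "ex:concept/some-acronym",
    "peanuts": "ex:concept/groundnut",
    "groundnuts": "ex:concept/groundnut",
}


def concept_same(a, b):
    """Concept-based linking: two labels match if they point to the same concept."""
    concept_a = LABEL_TO_CONCEPT.get(a.lower())
    concept_b = LABEL_TO_CONCEPT.get(b.lower())
    return concept_a is not None and concept_a == concept_b


for pair in [("COW", "C.O.W."), ("peanuts", "groundnuts")]:
    print(pair, "naive:", naive_same(*pair), "concept:", concept_same(*pair))
```

The naive comparison links the two unrelated strings and misses the two synonyms; the concept lookup gets both cases right, at the cost of needing a maintained vocabulary.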
  16. If resources are marked up with semantically defined and machine-readable concepts, they can be linked and mashed up precisely, as we have seen in the example from the BBC. In this example we start with an AGRIS record on hazardous waste, which is indexed with AGROVOC. Already now we can easily link to material indexed with EUROVOC, here an example from EUR-Lex. If the UNBIS thesaurus were restructured into a concept scheme and published as LOD, related UN documents could be attached automatically by the machine.
  17. How does this work? A resource is connected with each concept URI on the web. The concepts in the three vocabularies have the same literal and are connected by an owl:sameAs/exactMatch relationship. As we are speaking about thesauri and not ontologies, we purposely kept the choice of relation vague. The concepts could be matched with owl:sameAs, or the terms could be matched with skos:exactMatch. A lot of discussion on this is ongoing.
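A minimal Python sketch of this mechanism, under stated assumptions: the two concept URIs and the two document URLs are taken from the linking slide, but the pairing of each document with a concept is inferred from the slide layout, and the data structures and function names are hypothetical.

```python
from collections import defaultdict

AGROVOC_MAIZE = "http://aims.fao.org/aos/agrovoc/c_12332"
EUROVOC_MAIZE = "http://eurovoc.europa.eu/219871"

# Equivalence links between vocabularies (owl:sameAs or an exact match).
SAME_AS = [(AGROVOC_MAIZE, EUROVOC_MAIZE)]

# Document -> concept it is indexed with (pairing assumed from the slide).
INDEXED_WITH = {
    "http://agris.fao.org/agris-search/search/display.do?f=1996/TR/TR96001.xml;TR9600026": AGROVOC_MAIZE,
    "http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2010:202:0011:0015:EN:PDF": EUROVOC_MAIZE,
}


def equivalents(concept, links):
    """Return the concept plus everything reachable through equivalence links."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)  # links are treated as symmetric
        neighbours[b].add(a)
    seen, pending = {concept}, [concept]
    while pending:
        current = pending.pop()
        for other in neighbours[current] - seen:
            seen.add(other)
            pending.append(other)
    return seen


def resources_for(concept):
    """Return all documents indexed with the concept or any equivalent concept."""
    same = equivalents(concept, SAME_AS)
    return sorted(doc for doc, c in INDEXED_WITH.items() if c in same)


for doc in resources_for(AGROVOC_MAIZE):
    print(doc)
```

Starting from either concept URI yields both documents. Adding a third vocabulary would only require one more pair in `SAME_AS`, which is the point of linking through common URIs.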
  18. One of the groundbreaking enterprises in this area is Thomson Reuters' “Open Calais”. This is a web service that provides semantic markup for any unstructured text that you feed into it. The service is free of charge. Why? I will show you later.
  19. My team, in collaboration with the Indian Institute of Technology in Kanpur, is developing a similar service for our subject area.
  20. We have here a text from 1964 about a plant protection issue, without a bibliographic record at hand.
  21. Open Calais is very good in those areas in which they have their own elaborated concept scheme against which the texts are analyzed: “Places”, “Persons”, “Business Processes”, “Industry Terms”. But it is weak in the specific topic analysis, which they call “social tags”.
  22. AgroTagger still lacks many of the sophisticated features of “Open Calais”, but is much, much better in the subject analysis of the text.
  23. We will now try a live demo.