data.cnr.it and the
           Semantic Scout
      CNR Semantic Technology Lab
              ISTC - SI
Aldo Gangemi, Alberto Salvati, Enrico Daga, Gianluca Troiani
Thanks to Claudio Baldassarre (UN-FAO) and Alfio Gliozzo (IBM-Watson)
                       http://stlab.istc.cnr.it
                          http://data.cnr.it
                    http://bit.ly/semanticscout
                                                                        1
data.cnr.it




              2
Enhanced SPARQL endpoint




                      3
Ontologies




             4
Sample class from ontology




                         5
The Semantic Scout
• A framework for search, presentation, and analysis of entities and
  their associated knowledge
• Employs SW, LOD, NLP, IR
• Scientific work goes back to 2006, first presented at ISWC2007
• An evolving prototype for requirements of the EU IP IKS: semantic
  search, hybrid IR/SW identity management, automatic document
  classification (against DBpedia)
• 2009 requirements from the technology transfer office of CNR for the
  NetwOrK initiative




                                                              6
The CNR

• CNR is the largest research institution in Italy
 – about 8000 permanent researchers (+14000)
 – 7 departments focused on the main scientific
   research areas
 – 108 institutes spread all over Italy
   • Subdivided into research units, labs, etc.




                                                  7
The CNR data sources
• Organizational data
 – DB: Departments
 – DB: Institutes, Central admin, Publications
• Activity-related data
 – DB: Frameworks, Programmes, Workpackages
 – File System: Administration documentation
• Personnel-related data
 – DB: Curricula
 – DB: Permanent employees
 – DB: Other research employees, Externally funded projects
• Financial data
 – DB: Accounting, Contracts, Invoicing
• Only partly as open data!
                                                                                                           8
The CNR tasks
• Strategic objective: matching the research
  demand to the research supply
• Requirements
 – Semantic interoperability between heterogeneous
   data sources
 – Expert finding based on competence
 – Monitoring funding and evolution of different
   research areas and units
 – Browsing and reporting capabilities


                                              9
Architecture




               10
11
Methods for data conversion, extraction, inference,
  integration, linking, publishing, and searching




                                              12
Figures
• CNR Ontology
 – 28 modules
 – 120 classes
 – 300 relations
 – 1200 axioms
• CNR Data
 – >200K entities
 – ≈3M facts (about 2M inferred or extracted)
 – ≈240 datasets


                                             13
Sources and lifting
• The situation is usually not as clean as having a
  single CMS for most organizational tasks
• DB (e.g. SQL Server) + a lot of textual
  records + HTML Web Site + textual corpus +
  linked open data
• DB + interaction schemata (XML templates
  and HTML scraping, needed because of
  schemata degradation and user perspective
  evolution)

                                      14
Ontology design
• Starting from XML templates as module/pattern drafts
• Reengineering XML and scraped templates
• Reengineering DB schemata (system engineer
  involved)
• Obtained modular, pattern-based, task-based ontology
• Textual DB records with identity: precondition for
  hybridizing IR and SW (see later)
• Alignments to FOAF, SIOC, SKOS, WordNet ontologies
• Used patterns: situation, place, transitive reduction
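
As a rough illustration of the idea behind the transitive reduction pattern, the sketch below removes links that are already implied by a longer chain in an acyclic hierarchy. It is a generic graph operation written for this note, not the axioms of the CNR ontology; the node names ("unit", "institute", "cnr") and the part-of reading are assumptions.

```python
def transitive_reduction(edges):
    """Drop every edge (a, b) that is implied by a longer path from a to b.

    Assumes the relation is acyclic (e.g. a part-of hierarchy).
    """
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)

    def reachable(start):
        """All nodes reachable from start in one or more steps."""
        found, stack = set(), [start]
        while stack:
            for nxt in successors.get(stack.pop(), ()):
                if nxt not in found:
                    found.add(nxt)
                    stack.append(nxt)
        return found

    reduced = set()
    for a, b in edges:
        # (a, b) is redundant if b can be reached through another successor of a.
        via_other = any(b in reachable(mid) for mid in successors[a] if mid != b)
        if not via_other:
            reduced.add((a, b))
    return reduced


# Hypothetical part-of links: the direct unit -> cnr link is implied by the chain.
PART_OF = [("unit", "institute"), ("institute", "cnr"), ("unit", "cnr")]
DIRECT_PART_OF = transitive_reduction(PART_OF)
```

Only the immediate links survive, which is what a "direct" relation alongside a transitive one is meant to capture.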


                                                15
The CNR ontology




           16
Data design
• Triplifiers based on SQL rules (automatic
  scripting on JDBC drivers not enough because
  of legacy degradation of physical schemata)
 – Cf. also: Semion reengineering tool
• Inferences: OWL (Pellet, HermiT), SPARQL
  CONSTRUCT
• Extraction tool: Semiosearch, categorizer over
  Wikipedia categories
 – Next: deep parsing approach (facts, relations, entities)


                                                   17
Publishing and hybridizing
• Publishing OWL-RDF datasets
  – linked data approach (persistent URIs, triple stores for RDF dataset management,
    linking to common vocabularies: FOAF, DBpedia, Geonames, Bibo, ...)
  – OWL ontologies for dataset generation, querying, inference (new enriched
    datasets)
• Subgraph extraction through SNA
• Virtual semantic corpus
  – IRW to distinguish information and non-information resources
  – SPARQL rules to generate virtual texts associated with entities
• Indexing
  – Lucene+LSA indexing of semantic corpus
  – “Semantic” Lucene extension to produce tight coupling of virtual texts with
    entities
  – Multilinguality
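
The virtual-corpus idea can be sketched as follows: literals attached to an entity are concatenated into one "virtual text", and a keyword index maps tokens back to entity identifiers so that a text hit is also an entity hit. Plain Python stands in for the SPARQL rules and for Lucene+LSA here; the triples, prefixes, and scoring are assumptions made for the example.

```python
from collections import defaultdict
import re

# Hypothetical triples: (entity identifier, property, literal value).
TRIPLES = [
    ("ex:project/1", "ex:title", "Social knowledge for e-governance"),
    ("ex:project/1", "ex:keyword", "reputation"),
    ("ex:person/7", "ex:competence", "Semantic Web"),
    ("ex:person/7", "ex:competence", "knowledge representation"),
]


def virtual_texts(triples):
    """Concatenate the literals attached to each entity into one text."""
    parts = defaultdict(list)
    for entity, _prop, value in triples:
        parts[entity].append(value)
    return {entity: " ".join(values) for entity, values in parts.items()}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(texts):
    """Map each token to the set of entities whose virtual text contains it."""
    index = defaultdict(set)
    for entity, text in texts.items():
        for token in tokenize(text):
            index[token].add(entity)
    return index


def search(index, query):
    """Rank entities by how many distinct query tokens they match."""
    scores = defaultdict(int)
    for token in set(tokenize(query)):
        for entity in index.get(token, ()):
            scores[entity] += 1
    return sorted(scores, key=lambda e: (-scores[e], e))


INDEX = build_index(virtual_texts(TRIPLES))
```

With this toy data, `search(INDEX, "knowledge reputation")` ranks `ex:project/1` ahead of `ex:person/7`, since the project matches both terms.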

                                                                          18
Consuming
• SPARQL endpoint, with interface enhancement
• Keyword-based search
  – Semantic browsing with SPARQL-based AJAX DHTML, RDF
    relation browser, or XML-based relation browser
• Category-based search
  – Keyword-based result focusing
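
For readers who want to try the endpoint programmatically, a query can be sent as a SPARQL Protocol GET request. The sketch below only builds the request. The `/sparql` path is a guess (the slides name only the host data.cnr.it), and the use of `rdfs:label` is an assumption about how the data is labelled.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint path; the slides only give the host data.cnr.it.
ENDPOINT = "http://data.cnr.it/sparql"

# Assumes entities carry rdfs:label; adjust to the actual vocabulary.
QUERY = """
SELECT ?s ?label WHERE {
  ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
  FILTER regex(str(?label), "reputation", "i")
} LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
# urllib.request.urlopen(request) would send it; that step is left out because
# the endpoint path is a guess and the service may not be reachable.
```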




                                                19
20
21
http://bit.ly/semanticscout




                          22
Expert finding: Task-based testing
• It is based on the ability to materialize on
  demand a contextual network of relevant
  information.
• It is performed with a combination of tools in the
  toolkit to:
 – Identify the main topics of research
 – Recursively search the CNR data cloud




                                              23
Identifying the main topics of research: project description
• “Reputation is a social knowledge, on which a number of social decisions are
  accomplished. Regulating society from the morning of mankind becomes more
  crucial with the pace of development of ICT technologies, dramatically
  enlarging the range of interaction and generating new types of aggregation.
  Despite its critical role, reputation generation, transmission and use are
  unclear. The project aims to an interdisciplinary theory of reputation and to
  modeling the interplay between direct evaluations and meta-evaluations in
  three types of decisions, epistemic (whether to form a given evaluation),
  strategic (whether and how interact with target), and memetic (whether and
  which evaluation to transmit).”
  – Project About: Social Knowledge for e-Governance.
  – Topics can be manually annotated, or automatically induced,
    e.g.: ethics, sociology, collaboration, social network,
    reputation



                                                                     24
Identifying the main topics of research: text categorization
• Query: “ethics, sociology, collaboration, social network, reputation”




                                                               25
Search the CNR data cloud: identify an entry point
• “Commessa” (programme): “Il Circuito dell’Integrazione: Mente, Relazioni
  e Reti Sociali. Simulazione Sociale e Strumenti di Governance”




                                                                26
Search the CNR data cloud: identify key people
• Ing. Jordi Sabater: Cognitive Science;
• Dott. Mario Paolucci: Sociology, Psychology;
• Gennaro di Tosto: Artificial Intelligence;
• Walter Quattrociocchi: Interdisciplinary Fields;
• Giuseppe Castaldi: Ethics;
• Aldo Gangemi: Semantic Web, Knowledge representation.
                                                          27
Expert Finding: Results
• The description of “eRep project” was adopted as a
  gold standard to evaluate the results when testing the
  Semantic Scout.
• 6 out of 10 CNR researchers were correctly retrieved,
  along with a project member affiliated with another
  institution.
 – Project Coordinator: Dott. Mario Paolucci
 – External Member: Jordi Sabater Mir




                                                28
Functional evaluation of Semantic Scout (example)
• Expert finding accuracy
 – All 6 retrieved people ranked among the first 10 results
   returned by the search engine.
• Benefit of integrated data cloud
 – The user judged an “activity” to be relevant to his goal and
   used it as an entry point to the CNR network of resources.




                                                        29
Functional evaluation of Semantic Scout
• Accessibility and Interaction
  – Multiple user interfaces give users a level of interaction
    adapted to each specific type of required information
• Completeness of retrieval
  – 4 people were not included in our result set.
  – Antonietta Di Salvatore scored below the first 10 people in the
    list. (+1)
  – Giulia Andrighetto was not listed among the people relevant to
    the query, but belongs to the social network of Dr. Rosaria
    Conte. (+1)
  – Marco Capenni and Stefano Picascia have a technician profile,
    hence they are neither reported among the people relevant to
    the search query, nor do they belong to the network of any of
    the other researchers.
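
The completeness figures above amount to a recall measurement against the gold standard: 6 of the 10 CNR project members appeared in the first 10 results. A small sketch of that arithmetic, using placeholder identifiers rather than the real result list (whose other entries are not given in the slides):

```python
def recall(retrieved, relevant):
    """Fraction of relevant items that appear in the retrieved list."""
    relevant = set(relevant)
    return len(relevant.intersection(retrieved)) / len(relevant)


# Placeholder identifiers: 10 gold-standard CNR researchers, 6 of them retrieved.
gold = [f"researcher-{i}" for i in range(1, 11)]
top10 = gold[:6] + ["other-1", "other-2", "other-3", "other-4"]

print(recall(top10, gold))  # 0.6
```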


                                                           30
Ongoing work
• More data linking (e.g. DBLP,
  Georeferencing)
• Synchronization with data sources
• More interaction paradigms
• Privacy issues interlaced with hierarchical
  and idiosyncratic practices




                                          31
Conclusions
• Hybridizing several semantic and retrieval
  technologies provides added value to a
  research organization
• Scalability works for CNR figures
• Interaction is a core selling point
• Try it at http://bit.ly/semanticscout
• @data_cnr_it, @semanticscout,
  @aldogangemi

                                         32

Linked Open data: CNR

  • 1. data.cnr.it and the Semantic Scout • CNR Semantic Technology Lab, ISTC - SI • Aldo Gangemi, Alberto Salvati, Enrico Daga, Gianluca Troiani • Thanks to Claudio Baldassarre (UN-FAO) and Alfio Gliozzo (IBM-Watson) • http://stlab.istc.cnr.it • http://data.cnr.it • http://bit.ly/semanticscout
  • 5. Sample class from ontology
  • 6. The Semantic Scout • A framework for search, presentation, and analysis of entities and their associated knowledge • Employs Semantic Web (SW), linked open data (LOD), natural language processing (NLP), and information retrieval (IR) technologies • Scientific work goes back to 2006, first presented at ISWC 2007 • An evolving prototype for requirements of the EU IP IKS: semantic search, hybrid IR/SW identity management, automatic document classification (against DBpedia) • 2009 requirements from the technology transfer office of CNR for the NetwOrK initiative
  • 7. The CNR • CNR is the largest research institution in Italy – about 8000 permanent researchers (+14000) – 7 departments focused on the main scientific research areas – 108 institutes spread all over Italy • Subdivided into research units, labs, etc.
  • 8. The CNR data sources • [Diagram: several databases (DB) and a file system, grouped into organizational data, activity-related data, personnel-related data, and financial data. Labels: Departments; Institutes; Central admin; Administration documentation; Frameworks; Programmes; Workpackages; Publications; Curricula; Permanent employees; Other research employees; Accounting; Contracts; Invoicing; Externally funded projects] • Only partly as open data!
  • 9. The CNR tasks • Strategic objective: matching the research demand to the research supply • Requirements – Semantic interoperability between heterogeneous data sources – Expert finding based on competence – Monitoring funding and evolution of different research areas and units – Browsing and reporting capabilities
  • 12. Methods for data conversion, extraction, inference, integration, linking, publishing, and searching
  • 13. Figures • CNR Ontology: 28 modules, 120 classes, 300 relations, 1200 axioms • CNR Data: >200K entities, ≈3M facts (about 2M inferred or extracted), ≈240 datasets
  • 14. Sources and lifting • The situation is usually not as clean as having a single CMS for most organizational tasks • DB (e.g. SQL Server) + many textual records + HTML Web site + textual corpus + linked open data • DB + interaction schemata (XML templates and HTML scraping, needed because of schemata degradation and user perspective evolution)
  • 15. Ontology design • Starting from XML templates as module/pattern drafts • Reengineering XML and scraped templates • Reengineering DB schemata (system engineer involved) • Result: a modular, pattern-based, task-based ontology • Textual DB records with identity: precondition for hybridizing IR and SW (see later) • Alignments to FOAF, SIOC, SKOS, WordNet ontologies • Used patterns: situation, place, transitive reduction
  • 17. Data design • Triplifiers based on SQL rules (automatic scripting on JDBC drivers is not enough because of legacy degradation of physical schemata) – Cf. also: Semion reengineering tool • Inferences: OWL (Pellet, HermiT), SPARQL CONSTRUCT • Extraction tool: Semiosearch, categorizer over Wikipedia categories – Next: deep parsing approach (facts, relations, entities)
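The rule-based triplifier idea on slide 17 can be illustrated with a minimal sketch. It is not the project's actual tool: the table layout, the base URI, and the `worksFor` predicate are all hypothetical, and SQLite stands in for the production database. Each rule pairs a hand-written SQL query with the predicate its rows should produce, which is what lets an editor work around a degraded physical schema instead of relying on automatic schema mapping.

```python
import sqlite3

# Hypothetical base URI and vocabulary; not the actual data.cnr.it scheme.
BASE = "http://example.org/cnr/"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
WORKS_FOR = BASE + "ontology/worksFor"

# One hand-written rule per kind of fact: (SQL query, predicate URI, object kind).
# Every query must return (subject_id, object_value) rows.
RULES = [
    ("SELECT id, name FROM researcher", FOAF_NAME, "literal"),
    ("SELECT id, institute_id FROM researcher", WORKS_FOR, "institute"),
]


def escape_literal(value):
    """Escape a value for use inside an N-Triples string literal."""
    return (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def triplify(conn):
    """Yield one N-Triples line per row matched by each SQL rule."""
    for sql, predicate, kind in RULES:
        for subject_id, value in conn.execute(sql):
            if value is None:
                continue  # skip missing values rather than emit empty facts
            subject = f"<{BASE}researcher/{subject_id}>"
            if kind == "literal":
                obj = f'"{escape_literal(value)}"'
            else:
                obj = f"<{BASE}{kind}/{value}>"
            yield f"{subject} <{predicate}> {obj} ."


def demo():
    """Build a one-row toy database and return the triples it yields."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE researcher (id INTEGER, name TEXT, institute_id INTEGER)")
    conn.execute("INSERT INTO researcher VALUES (1, 'Ada Example', 7)")
    return list(triplify(conn))
```

Calling `demo()` returns two N-Triples lines for the single toy row, one for the name and one for the affiliation. Adding a new kind of fact means adding one more entry to `RULES`.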
  • 18. Publishing and hybridizing • Publishing OWL-RDF datasets – linked data approach (persistent URIs, triple stores for RDF dataset management, linking to common vocabularies: FOAF, DBpedia, Geonames, Bibo, ...) – OWL ontologies for dataset generation, querying, inference (new enriched datasets) • Subgraph extraction through SNA • Virtual semantic corpus – IRW to distinguish information and non-information resources – SPARQL rules to generate virtual texts associated with entities • Indexing – Lucene+LSA indexing of semantic corpus – “Semantic” Lucene extension to produce tight coupling of virtual texts with entities – Multilinguality
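The "virtual text" idea on slide 18 can be sketched as follows. This is an illustration only: plain Python functions stand in for the SPARQL rules, a small inverted index stands in for Lucene+LSA, and all identifiers in the toy graph are made up. The point it shows is the coupling: each index entry points back to an entity, not to a document.

```python
from collections import defaultdict

LABEL = "rdfs:label"

# Toy triples (subject, predicate, object); all identifiers are made up.
TRIPLES = [
    ("ex:person1", LABEL, "Ada Example"),
    ("ex:person1", "ex:memberOf", "ex:project1"),
    ("ex:project1", LABEL, "Reputation and social networks"),
    ("ex:project1", "ex:description", "An interdisciplinary theory of reputation"),
]


def virtual_text(entity, triples):
    """Concatenate an entity's own literals with the labels of its neighbours."""
    subjects = {s for s, _, _ in triples}
    parts = []
    for s, _, o in triples:
        if s != entity:
            continue
        if o in subjects:
            # The object is another entity: borrow its label(s).
            parts.extend(lbl for s2, p2, lbl in triples
                         if s2 == o and p2 == LABEL)
        else:
            # The object is a literal: use it directly.
            parts.append(o)
    return " ".join(parts)


def build_index(triples):
    """Map each lowercased token to the entities whose virtual text contains it."""
    index = defaultdict(set)
    for entity in {s for s, _, _ in triples}:
        for token in virtual_text(entity, triples).lower().split():
            index[token].add(entity)
    return index


def search(index, keyword):
    """Return the entities matching a single keyword, in sorted order."""
    return sorted(index.get(keyword.lower(), set()))
```

In this toy graph a keyword search for "reputation" returns both the project and the person, because the person's virtual text inherits the label of the project they belong to.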
  • 19. Consuming • SPARQL endpoint, with interface enhancement • Keyword-based search – Semantic browsing with SPARQL-based AJAX DHTML, RDF relation browser, or XML-based relation browser • Category-based search – Keyword-based result focusing
  • 23. Expert finding: Task-based testing • Expert finding relies on the ability to materialize on demand a contextual network of relevant information. • It is performed with a combination of tools from the toolkit to: – Identify the main topics of research – Recursively search the CNR data cloud
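The recursive search of the data cloud from slide 23 might look like the sketch below. It is a hedged illustration, not the Semantic Scout's algorithm: an iterative breadth-first expansion with a depth limit stands in for the recursive search, and the link structure and identifiers are invented.

```python
from collections import deque

# Toy directed links between resources; all identifiers are invented.
LINKS = {
    "programme:A": ["project:P1", "person:X"],
    "project:P1": ["person:Y", "publication:D1"],
    "publication:D1": ["person:Z"],
    "person:X": [],
    "person:Y": [],
    "person:Z": [],
}


def contextual_network(entry_point, links, max_depth):
    """Expand breadth-first from an entry point, up to max_depth hops.

    Returns a dict mapping each reached resource to its hop distance.
    """
    depth = {entry_point: 0}
    queue = deque([entry_point])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in links.get(node, []):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


def people_in(network):
    """List the people reached, nearest first, ties broken alphabetically."""
    people = [n for n in network if n.startswith("person:")]
    return sorted(people, key=lambda n: (network[n], n))
```

With a limit of two hops from `programme:A`, the sketch reaches `person:X` and `person:Y`; raising the limit to three also reaches `person:Z` through the publication.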
  • 24. Identifying the main topics of research: project description • “Reputation is a social knowledge, on which a number of social decisions are accomplished. Regulating society from the morning of mankind becomes more crucial with the pace of development of ICT technologies, dramatically enlarging the range of interaction and generating new types of aggregation. Despite its critical role, reputation generation, transmission and use are unclear. The project aims to an interdisciplinary theory of reputation and to modeling the interplay between direct evaluations and meta-evaluations in three types of decisions, epistemic (whether to form a given evaluation), strategic (whether and how interact with target), and memetic (whether and which evaluation to transmit).” – Project About: Social Knowledge for e-Governance. – Topics can be manually annotated, or automatically induced, e.g.: ethics, sociology, collaboration, social network, reputation
  • 25. Identifying the main topics of research: text categorization • Query: “ethics, sociology, collaboration, social network, reputation”
  • 26. Search the CNR data cloud: identify an entry point • “Commessa” (programme): “Il Circuito dell’Integrazione: Mente, Relazioni e Reti Sociali. Simulazione Sociale e Strumenti di Governance”
  • 27. Search the CNR data cloud: identify key people • Ing. Jordi Sabater: Cognitive Science; • Dott. Mario Paolucci: Sociology, Psychology; • Gennaro di Tosto: Artificial Intelligence; • Walter Quattrociocchi: Interdisciplinary Fields; • Giuseppe Castaldi: Ethics; • Aldo Gangemi: Semantic Web, Knowledge representation.
  • 28. Expert Finding: Results • The description of the “eRep project” was adopted as a gold standard to evaluate the results when testing the Semantic Scout. • 6 out of 10 CNR researchers were correctly retrieved, along with a project member affiliated with another institution. – Project Coordinator: Dott. Mario Paolucci – External Member: Jordi Sabater Mir
  • 29. Functional evaluation of Semantic Scout (example) • Expert finding accuracy – All 6 retrieved people ranked among the first 10 results from the search engine. • Benefit of integrated data cloud – The user judged an “activity” to be relevant to his goal and used it as an entry point to the CNR network of resources.
  • 30. Functional evaluation of Semantic Scout • Accessibility and Interaction – Multiple user interfaces give users a level of interaction adapted to each specific type of required information • Completeness of retrieval – 4 people were not included in our result set. – Antonietta Di Salvatore scored below the first 10 people in the list. (+1) – Giulia Andrighetto was not listed among the people relevant to the query, but belongs to the social network of Dr. Rosaria Conte. (+1) – Marco Capenni and Stefano Picascia have a technician profile, hence they are neither reported among the people relevant to the search query, nor do they belong to the network of any of the other researchers.
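The completeness figures on slides 28 and 30 amount to a recall computation, worked through below. The counts (10 gold-standard CNR researchers, 6 retrieved, 4 missed) come from the slides. Reading the two "(+1)" marks as people who would be recovered by looking further down the ranking or by following a social-network link is an assumption of this sketch, not something the slides state.

```python
def recall(retrieved_relevant, total_relevant):
    """Fraction of the gold-standard people that the search returned."""
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    return retrieved_relevant / total_relevant


GOLD_CNR_RESEARCHERS = 10  # CNR researchers in the gold standard (slide 28)
RETRIEVED_IN_TOP_10 = 6    # correctly retrieved (slide 28)
MISSED = GOLD_CNR_RESEARCHERS - RETRIEVED_IN_TOP_10  # 4 people (slide 30)

# Assumption: each "(+1)" on slide 30 marks a person who is still reachable,
# either further down the list or through a colleague's social network.
REACHABLE_WITH_EXTRA_STEP = 2

strict_recall = recall(RETRIEVED_IN_TOP_10, GOLD_CNR_RESEARCHERS)
lenient_recall = recall(RETRIEVED_IN_TOP_10 + REACHABLE_WITH_EXTRA_STEP,
                        GOLD_CNR_RESEARCHERS)
```

Under these assumptions the strict recall is 6/10 = 0.6 and the lenient recall is 8/10 = 0.8; the two people with a technician profile remain out of reach in both readings.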
  • 31. Ongoing work • More data linking (e.g. DBLP, Georeferencing) • Synchronization with data sources • More interaction paradigms • Privacy issues interlaced with hierarchical and idiosyncratic practices
  • 32. Conclusions • Hybridizing several semantic and retrieval technologies provides added value to a research organization • The approach scales to CNR figures • Interaction is a core selling point • Try it at http://bit.ly/semanticscout • @data_cnr_it, @semanticscout, @aldogangemi