Automatically indexing science using natural-language processing, RDF and SPARQL

Andrew Walkingshaw, Nick Day, Peter Corbett, Jim Downing, Joe Townsend, Peter Murray-Rust

February 16, 2008

Outline
• Gathering data
• Extracting (meta)data
• Using the data
• Thanks

Data sources
• Supplemental and experimental data
• Journals
• Self-archived papers (e.g. arXiv)
• Mainstream journalism
• Blogs

Supplemental data: CrystalEye
• http://wwmm.ch.cam.ac.uk/crystaleye/
• Repository for crystallographic data

Journals and arXiv
• “Traditional” journal articles
• Titles and abstracts...

Journalism and blogs
• Unstructured text with little semantics;
• ... hence Google Scholar, Web of Science, etc.

Semi-structured data: Golem
• We’ve got a lot of chemical data as CML
• http://en.wikipedia.org/wiki/Chemical_Markup_Language
• ... but we still need to get data out of that and into a more useful form
• hence Golem: http://www.lexical.org.uk/science/golem/
• GRDDLish strategy for extracting data from CML files: identify dialect-specific concepts with XPath expressions and XSLT stylesheets
• upshot: we can extract JSON objects from CML files.

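To make the XPath-to-JSON idea concrete, here is a minimal sketch using only the Python standard library. It is not Golem's actual API: the sample CML document, the `ex:` dictionary references, the `CONCEPTS` table and the `extract_concepts` function are all invented for illustration, and real CML dialects will need their own expressions.

```python
import json
import xml.etree.ElementTree as ET

# Namespace used by CML documents; bound to the "cml" prefix for XPath lookups.
NS = {"cml": "http://www.xml-cml.org/schema"}

# Invented sample document (not taken from CrystalEye).
SAMPLE_CML = """\
<cml xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <property dictRef="ex:cellVolume">
      <scalar units="units:ang3">512.3</scalar>
    </property>
    <property dictRef="ex:temperature">
      <scalar units="units:k">150</scalar>
    </property>
  </molecule>
</cml>
"""

# Hypothetical "dialect" table: concept name -> XPath expression locating it.
CONCEPTS = {
    "cell_volume": ".//cml:property[@dictRef='ex:cellVolume']/cml:scalar",
    "temperature": ".//cml:property[@dictRef='ex:temperature']/cml:scalar",
}


def extract_concepts(cml_text, concepts=CONCEPTS):
    """Return a JSON-serialisable dict of the concepts found in a CML string."""
    root = ET.fromstring(cml_text)
    result = {}
    for name, xpath in concepts.items():
        node = root.find(xpath, NS)
        if node is not None:  # silently skip concepts this document lacks
            result[name] = {"value": float(node.text), "units": node.get("units")}
    return result


print(json.dumps(extract_concepts(SAMPLE_CML), indent=2, sort_keys=True))
```

Keeping the expressions in a table means that supporting another dialect is a matter of adding entries rather than writing new code, which is roughly the appeal of the GRDDL-style approach.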
Free text: OSCAR3
• http://oscar3-chem.sourceforge.net/
• Natural-language parser for documents about chemistry
• Dark magic: don’t ask me how it works!
• ... but it can be run as a Jetty webservice, so as long as it does, I’m happy
• Author’s blog: http://wwmm.ch.cam.ac.uk/blogs/corbett/

Getting the data in
• Everything (more or less) talks RSS nowadays...
• RSS 0.91, RSS 1.0 (which one?), Atom, etc etc etc.
• Thankfully: feedparser (http://feedparser.org/)

Serializing metadata
• RDF – using:
  • Dublin Core terms
  • A homebrew ontology based on the IUCr’s CIF data format
  • and another homebrew ontology for OSCAR annotations
  • (it’d be good to standardise these, but to be honest, not many people are doing this sort of thing)

The process
• For each feed in a list of feeds:
  • If it’s supplying CML data, set Golem on each entry, get the observables out, and turn them into triples; run OSCAR3 over the title and/or abstract
  • If it’s not, extract the free text from each entry, send it to the OSCAR web service, and assign triples based on the chemical entities OSCAR finds
• Upload the RDF to your triple store
  • (I’m using the Talis platform, so that’s just curl)
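
The loop described above can be sketched in plain Python. Everything here is schematic: `run_golem` and `run_oscar` are stand-ins passed in as callables (the real tools are not called), entries are simple dictionaries, and the `ex:` predicate names are hypothetical placeholders for the homebrew ontologies. The upload step is left out.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

MENTIONS = "ex:mentions"  # hypothetical predicate for OSCAR annotations


def index_entry(entry: Dict, run_golem: Callable, run_oscar: Callable) -> List[Triple]:
    """Turn one feed entry into (subject, predicate, object) triples."""
    subject = entry["link"]
    triples: List[Triple] = []
    if entry.get("cml") is not None:
        # CML entry: structured observables first, then text mining on
        # the title and/or abstract.
        for name, value in run_golem(entry["cml"]).items():
            triples.append((subject, "ex:" + name, str(value)))
        text = " ".join(filter(None, [entry.get("title"), entry.get("abstract")]))
    else:
        # Anything else: only free text is available.
        text = entry.get("text", "")
    for entity in run_oscar(text):
        triples.append((subject, MENTIONS, entity))
    return triples


def index_feeds(feeds: Iterable[Iterable[Dict]], run_golem, run_oscar) -> List[Triple]:
    """Apply index_entry to every entry of every feed, in order."""
    triples: List[Triple] = []
    for feed in feeds:
        for entry in feed:
            triples.extend(index_entry(entry, run_golem, run_oscar))
    return triples


# Toy stand-ins so that the sketch runs without Golem or OSCAR.
def fake_golem(cml):
    return {"cellVolume": 512.3}


def fake_oscar(text):
    return [word for word in ("benzene", "ethanol") if word in text.lower()]


feeds = [
    [{"link": "http://example.org/1", "cml": "<cml/>",
      "title": "Benzene structure", "abstract": ""}],
    [{"link": "http://example.org/blog/2", "text": "Ethanol in the news"}],
]
for triple in index_feeds(feeds, fake_golem, fake_oscar):
    print(triple)
```

Passing the extractors in as arguments keeps the loop itself testable without a running OSCAR web service.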
• And...

SPARQL is great.
Just post queries at a SPARQL endpoint:

authortemplate = '''
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX ce: <http://wwmm.ch.cam.ac.uk/crystaleye/dictionary#>
DESCRIBE ?file WHERE { ?file dc:contributor "some author" . }
'''

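For illustration, here is one way a template like the one above could be filled in and sent to an endpoint using only the standard library. The endpoint URL, the function names and the escaping helper are assumptions for this sketch. The only protocol detail relied on is that the SPARQL Protocol accepts the query text in a `query` parameter on a GET request.

```python
import urllib.parse
import urllib.request

# Same shape as the talk's template, with a named slot for the author.
# Doubled braces are literal braces for str.format.
AUTHOR_TEMPLATE = """
PREFIX dc: <http://purl.org/dc/terms/>
DESCRIBE ?file WHERE {{ ?file dc:contributor {author} . }}
"""


def sparql_literal(text):
    """Quote a Python string as a SPARQL string literal (basic escaping only)."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def build_query_url(endpoint, author):
    """Return the GET URL that asks `endpoint` to describe an author's files."""
    query = AUTHOR_TEMPLATE.format(author=sparql_literal(author))
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint, author, accept="application/rdf+xml"):
    """Send the query and return the response body as text (needs network access)."""
    request = urllib.request.Request(
        build_query_url(endpoint, author), headers={"Accept": accept}
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# http://example.org/sparql is a placeholder, not a real endpoint.
url = build_query_url("http://example.org/sparql", 'A. N. "Other" Author')
print(url)
```

Escaping the author name before substitution matters once the value comes from a web form rather than from the programmer.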
SPARQL isn’t (entirely) great.
• Scientists shouldn’t have to know this stuff.
• So we need to build a front end which your average senior academic might be able to use...
• (i.e. it’s got to look like a website.)

What queries do we want?
• What experimental data is an author responsible for?
• What chemical entities are in some data?
• Where is a given chemical entity talked about?
• So we can build a web app around these queries.
• Django + rdflib + SPARQL + Talis Platform

Demo!
And here it is.

Thanks to...
• Talis (http://n2.talis.com/) for access to their platform
• and to the RSC and IUCr for their support of CrystalEye.

SemanticCampLondon, 16th February 2008
Scott Edmunds: Revolutionizing Data Dissemination: GigaScienceScott Edmunds: Revolutionizing Data Dissemination: GigaScience
Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
 
Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture
Why Data Science Matters - 2014 WDS Data Stewardship Award LectureWhy Data Science Matters - 2014 WDS Data Stewardship Award Lecture
Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"
 
Dark Data In the Long Tail of Science:   Examples in Biology
Dark Data In the Long Tail of Science:  Examples in BiologyDark Data In the Long Tail of Science:  Examples in Biology
Dark Data In the Long Tail of Science:   Examples in Biology
 
NaturalMSEQueries_presICWI2023.pdf
NaturalMSEQueries_presICWI2023.pdfNaturalMSEQueries_presICWI2023.pdf
NaturalMSEQueries_presICWI2023.pdf
 
Sharing data
Sharing dataSharing data
Sharing data
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
 
2016 07 12_purdue_bigdatainomics_seandavis
2016 07 12_purdue_bigdatainomics_seandavis2016 07 12_purdue_bigdatainomics_seandavis
2016 07 12_purdue_bigdatainomics_seandavis
 
DNA Storage at AGBT 2018
DNA Storage at AGBT 2018DNA Storage at AGBT 2018
DNA Storage at AGBT 2018
 

Último

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Último (20)

How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 

SemanticCampLondon, 16th February 2008

  • 1. Automatically indexing science using natural-language processing, RDF and SPARQL. Andrew Walkingshaw, Nick Day, Peter Corbett, Jim Downing, Joe Townsend, Peter Murray-Rust. February 16, 2008. Outline: Gathering data; Extracting (meta)data; Using the data; Thanks.
  • 2-6. Data sources
      • Supplemental and experimental data
      • Journals
      • Self-archived papers (e.g. arXiv)
      • Mainstream journalism
      • Blogs
  • 7-8. Supplemental data: CrystalEye
      • http://wwmm.ch.cam.ac.uk/crystaleye/
      • Repository for crystallographic data
  • 9-10. Journals and arXiv
      • “Traditional” journal articles
      • Titles and abstracts...
  • 11-12. Journalism and blogs
      • Unstructured text with little semantics;
      • ... hence Google Scholar, Web of Science, etc.
  • 13-18. Semi-structured data: Golem
      • We’ve got a lot of chemical data as CML
      • http://en.wikipedia.org/wiki/Chemical_Markup_Language
      • ... but we still need to get data out of that and into a more useful form
      • hence Golem: http://www.lexical.org.uk/science/golem/
      • GRDDLish strategy for extracting data from CML files: identify dialect-specific concepts with XPath expressions and XSLT stylesheets
      • upshot: we can extract JSON objects from CML files.
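The idea of pulling named concepts out of CML with XPath and emitting JSON can be sketched as follows. This is not Golem's code or API: it is a minimal stand-in using only the Python standard library, and the sample document, the `ex:` dictRef values and the concept names are all made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# The CML schema namespace; the prefix "cml" is only a local alias.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Hypothetical CML fragment; the dictRef values are invented for this sketch.
SAMPLE_CML = """
<cml xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <property dictRef="ex:cellVolume">
      <scalar units="units:ang3">1234.5</scalar>
    </property>
    <property dictRef="ex:spaceGroup">
      <scalar>P 21/c</scalar>
    </property>
  </molecule>
</cml>
"""

# Concept name -> XPath that locates it in this (hypothetical) dialect.
CONCEPTS = {
    "cell_volume": ".//cml:property[@dictRef='ex:cellVolume']/cml:scalar",
    "space_group": ".//cml:property[@dictRef='ex:spaceGroup']/cml:scalar",
}


def extract_concepts(cml_text, concepts=CONCEPTS):
    """Return a JSON-serialisable dict of the concepts found in cml_text."""
    root = ET.fromstring(cml_text)
    found = {}
    for name, xpath in concepts.items():
        node = root.find(xpath, CML_NS)
        if node is not None and node.text:
            found[name] = {"value": node.text.strip(), "units": node.get("units")}
    return found


if __name__ == "__main__":
    print(json.dumps(extract_concepts(SAMPLE_CML), indent=2))
```

Keeping the XPath expressions in a per-dialect table means a new CML dialect only needs a new table, not new extraction code.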
  • 19-23. Free text: OSCAR3
      • http://oscar3-chem.sourceforge.net/
      • Natural-language parser for documents about chemistry
      • Dark magic: don’t ask me how it works!
      • ... but it can be run as a Jetty webservice so as long as it does, I’m happy
      • Author’s blog: http://wwmm.ch.cam.ac.uk/blogs/corbett/
  • 24-26. Getting the data in
      • Everything (more or less) talks RSS nowadays...
      • RSS 0.91, RSS 1.0 (which one?), Atom, etc etc etc.
      • Thankfully: feedparser (http://feedparser.org/)
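To show the kind of format differences a library such as feedparser hides, here is a deliberately small standard-library sketch that maps two feed flavours onto one entry shape. It is not feedparser and is far less robust: it only covers un-namespaced RSS (0.9x/2.0 style) and Atom, and does not handle RSS 1.0. The sample feeds and the `normalise` function are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample feeds, one per flavour.
RSS_SAMPLE = """<rss version="0.91"><channel><title>Example</title>
<item><title>Structure of compound A</title><link>http://example.org/a</link>
<description>Abstract text for A.</description></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title>
<entry><title>Structure of compound B</title>
<link href="http://example.org/b"/><summary>Abstract text for B.</summary></entry>
</feed>"""


def normalise(feed_xml):
    """Return a list of {'title', 'link', 'summary'} dicts for either flavour."""
    root = ET.fromstring(feed_xml)
    entries = []
    if root.tag == ATOM + "feed":
        # Atom: namespaced elements, link lives in an href attribute.
        for entry in root.findall(ATOM + "entry"):
            link = entry.find(ATOM + "link")
            entries.append({
                "title": entry.findtext(ATOM + "title", default=""),
                "link": link.get("href", "") if link is not None else "",
                "summary": entry.findtext(ATOM + "summary", default=""),
            })
    else:
        # Un-namespaced RSS: link is element text, summary is <description>.
        for item in root.iter("item"):
            entries.append({
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "summary": item.findtext("description", default=""),
            })
    return entries
```

Once every feed yields the same entry shape, the rest of the pipeline does not need to know which format it came from.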
  • 27-31. Serializing metadata
      • RDF, using:
          • Dublin Core terms
          • A homebrew ontology based on the IUCr’s CIF data format
          • and another homebrew ontology for OSCAR annotations
          • (it’d be good to standardise these, but to be honest, not many people are doing this sort of thing)
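As a rough illustration of what serialized metadata for one entry might look like, the sketch below writes N-Triples by hand using Dublin Core terms plus the CrystalEye dictionary namespace that appears in the query slide. The property name `cell_volume`, the use of plain literals for contributors, and the helper functions are assumptions; the talk's real ontologies are not shown in the slides, and a library such as rdflib would normally do the serializing.

```python
DC = "http://purl.org/dc/terms/"
CE = "http://wwmm.ch.cam.ac.uk/crystaleye/dictionary#"


def literal(value):
    """Format a value as a plain N-Triples string literal."""
    text = str(value)
    # Backslash must be escaped first so later escapes are not doubled.
    for raw, escaped in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(raw, escaped)
    return '"' + text + '"'


def triple(subject, predicate, obj):
    """Format one N-Triples statement; subject and predicate are IRIs."""
    return "<%s> <%s> %s ." % (subject, predicate, obj)


def entry_to_ntriples(uri, title, contributors, observables):
    """Serialize one entry; observables maps hypothetical ce: names to values."""
    lines = [triple(uri, DC + "title", literal(title))]
    for name in contributors:
        lines.append(triple(uri, DC + "contributor", literal(name)))
    for key, value in sorted(observables.items()):
        lines.append(triple(uri, CE + key, literal(value)))
    return "\n".join(lines)


if __name__ == "__main__":
    print(entry_to_ntriples(
        "http://example.org/a", "Structure A", ["A. Author"], {"cell_volume": 1234.5}
    ))
```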
  • 32-37. The process
      • For each feed in a list of feeds:
          • If it’s supplying CML data, set Golem on each entry, get the observables out, and turn them into triples; run OSCAR3 over the title and/or abstract
          • If it’s not, extract the free text from each entry, send it to the OSCAR web service, and assign triples based on the chemical entities OSCAR finds
      • Upload the RDF to your triple store
          • (I’m using the Talis platform, so that’s just curl)
      • And...
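The loop described on this slide can be sketched as below. Golem, OSCAR3 and the triple-store upload are passed in as plain callables, so nothing here depends on their real interfaces; the feed and entry dictionary keys (`has_cml`, `cml`, `text`, and so on) and the predicate name `mentions` are hypothetical.

```python
def index_feeds(feeds, run_golem, run_oscar, upload):
    """Walk every feed, turn each entry into triples, and upload per feed.

    feeds     -- list of {'has_cml': bool, 'entries': [dict, ...]}
    run_golem -- callable: CML text -> {observable_name: value}
    run_oscar -- callable: free text -> iterable of chemical entity names
    upload    -- callable: list of (subject, predicate, object) tuples -> None
    """
    for feed in feeds:
        triples = []
        for entry in feed["entries"]:
            subject = entry["link"]
            if feed["has_cml"]:
                # Structured route: observables come straight from the CML.
                for name, value in run_golem(entry["cml"]).items():
                    triples.append((subject, name, value))
                text = entry.get("title", "") + " " + entry.get("abstract", "")
            else:
                # Unstructured route: only free text is available.
                text = entry["text"]
            # Either way, the text is annotated with chemical entities.
            for entity in run_oscar(text):
                triples.append((subject, "mentions", entity))
        upload(triples)
```

Injecting the three services as functions keeps the control flow testable with stubs, for example a `run_oscar` that just matches a fixed word list.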
  • 38. SPARQL is great. Just post queries at a SPARQL endpoint:
        authortemplate='''
        PREFIX dc: <http://purl.org/dc/terms/>
        PREFIX ce: <http://wwmm.ch.cam.ac.uk/crystaleye/dictionary#>
        DESCRIBE ?file WHERE { ?file dc:contributor some author . }
        '''
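One way to fill in such a template and prepare the HTTP request is sketched below. It assumes the contributor is stored as a plain string literal (the slides do not say), uses a `$author` placeholder of my own choosing, and only builds the request object rather than sending it; the endpoint URL in the usage note is a placeholder, not the Talis endpoint.

```python
from string import Template
from urllib.parse import urlencode
from urllib.request import Request

# Same shape as the slide's template, with an explicit $author placeholder.
AUTHOR_TEMPLATE = Template("""
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX ce: <http://wwmm.ch.cam.ac.uk/crystaleye/dictionary#>
DESCRIBE ?file WHERE { ?file dc:contributor "$author" . }
""")


def escape_literal(text):
    """Escape characters that would break out of a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def build_query(author):
    """Return the DESCRIBE query for one author name."""
    return AUTHOR_TEMPLATE.substitute(author=escape_literal(author))


def build_request(endpoint, query):
    """Build (but do not send) a form-encoded POST carrying the query."""
    data = urlencode({"query": query}).encode("utf-8")
    headers = {
        "Accept": "application/rdf+xml",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(endpoint, data=data, headers=headers)
```

A caller would pass the result of `build_request("http://example.org/sparql", build_query("A. N. Other"))` to `urllib.request.urlopen` and parse the RDF that comes back. Escaping the author name matters because it is user input spliced into a query.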
  • 39-41. SPARQL isn’t (entirely) great.
      • Scientists shouldn’t have to know this stuff.
      • So we need to build a front end which your average senior academic might be able to use...
          • (i.e. it’s got to look like a website.)
  • 42-46. What queries do we want?
      • What experimental data is an author responsible for?
      • What chemical entities are in some data?
      • Where is a given chemical entity talked about?
      • So we can build a web app around these queries.
      • django + rdflib + sparql + Talis Platform
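The three questions are all simple triple patterns, which the toy sketch below makes explicit over an in-memory list of triples. It stands in for what the SPARQL endpoint would do and is not the talk's Django application; the predicate names `contributor` and `mentions` and the function names are hypothetical.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def data_by_author(triples, author):
    """What experimental data is an author responsible for?"""
    return sorted(t[0] for t in match(triples, p="contributor", o=author))


def entities_in(triples, document):
    """What chemical entities are in some data?"""
    return sorted(t[2] for t in match(triples, s=document, p="mentions"))


def where_mentioned(triples, entity):
    """Where is a given chemical entity talked about?"""
    return sorted(t[0] for t in match(triples, p="mentions", o=entity))
```

Each web page in such an app would then correspond to one of these functions, with the pattern sent to the triple store as a SPARQL query instead of being evaluated in Python.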
  • 47. Demo! And here it is.
  • 48-49. Thanks to...
      • Talis (http://n2.talis.com/) for access to their platform
      • and to the RSC and IUCr for their support of CrystalEye.