PaperMaker: Validation of biomedical scientific
publications


January 19th, 2011

Workshop: „BeyondThePdf“
Dietrich Rebholz-Schuhmann, MD, PhD
Group Leader Rebholz Group
European Bioinformatics Institute
Publishing is about …

    • ... Agreeing / disagreeing about current science
           • Only peer review can judge current science
    • ... Bringing new results
           • Conceptual results are more difficult than new data
    • ... Gaining new knowledge
           • New data and new results can imply new knowledge of which
             even the author is still unaware
    • ... Rewarding the scientist
           • Count whatever you can count that could have an impact.
           • Validating the scientist’s claim is the key reward.
           • Any scientist can fool any system, but (hopefully) only in the short term



2   20.01.2011                       Literature and Text Mining
                               BioCreative III, Rebholz
Future of biomedical text mining

    Working towards ...

    • ... Literature integration
           • to make it a fully fledged part of bioinformatics data resources
    • ... Cross-domain support
           • to deliver the content to different scientific communities.
    • ... Provenance
           • to carry credit of findings into analytical biomedical research
    • ... Inference & Reasoning
           • to make use of the full semantic support in the scientific literature




Literature content in the Semantic Web




Terminologies vs. Ontologies




    Database type resource building
           • Terminologies, collection of terms
           • Automatic generation
           • Exploitation of terminological features
           • Standardisation of TM solutions
           • Interoperability with database resources
    Ontological resources
           • Explicit semantics
           • Manual generation
           • Consistency, inference, reasoning
           • Interoperability with all semantic resources
           • Working towards a reasoning infrastructure

Efforts in the Rebholz group towards
    interoperability of literature with bioinformatics
    •    Whatizit infrastructure
           •     Biomedical NER as a public, large-scale service
    •    LexEBI / BioLexicon (collab. w. NaCTeM, Pisa-U)
           •     Biomedical terminological resource, standardisation of semantics
    •    IeXML (BioLink SIG 2006, Brazil)
           •     Put the annotations into the document (inline annotations)
    •    CALBC project
           •     Collaborative annotation of a large-scale biomedical corpus
    •    UKPMC: UK PubMed Central (collab. w. NaCTeM, BL)
           •     Use of Whatizit, BioLexicon, IeXML, CALBC alignments for the delivery of quality
                 annotation services to the public
    •    SESL project
           •     Joint project with pharma & publishers, literature content in a triple store
    •    PaperMaker
           •     Validation of the scientific literature against the above


1
                 Whatizit
Integrating biomedical literature and data
    Rebholz-Schuhmann, D., et al. Text Processing through Web Services: Calling Whatizit. Bioinformatics 24, no. 2 (2008): 296-98.




2
                 BioLexicon
                   LexEBI
LexEBI: content
Group         Resource          # Labels  # Variants       Total   Total /    # Unique  Uniq. T. /
                                                                  # Labels       terms    # Labels
Prot. / Gene  GP 7.0             516,113   4,005,040   4,521,153      8.76   1,726,853        3.35
              GP 6.0             488,577   3,389,316   3,877,893      7.94   1,564,436        3.20
Chemicals     Jochem             278,578   1,691,980   1,970,558      7.07   1,527,752        5.48
              ChEBI               19,645      94,748     114,393      5.82     101,307        5.16
              ChEBI (all)        549,838   1,187,322   1,737,160      3.16
Other         Enzymes              4,905       8,082      12,987      2.65      12,377        2.52
              Species            643,280     199,130     842,410      1.31     838,135        1.30
              Interpro            20,671           0      20,671      1.00      20,671        1.00
UMLS          Antineuro., Neo      4,718       6,488      11,206      2.38
              Bio. Act.           54,148      87,209     141,357      2.61
              Enzymes             26,065      56,332      82,397      3.16
              Lipid, Carb.        11,518       9,770      21,288      1.85
              Pharm. Act.        104,201     123,840     228,041      2.19
              Vit., Horm.          6,877      10,258      17,135      2.49
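
The derived columns of the table follow from the counts: Total is labels plus variants, and both ratios divide by the number of labels. A small sketch that recomputes them for three rows copied from the table (the function name is illustrative only):

```python
# Rows copied from the LexEBI table: (labels, variants, unique terms).
ROWS = {
    "GP 7.0": (516_113, 4_005_040, 1_726_853),
    "Jochem": (278_578, 1_691_980, 1_527_752),
    "Species": (643_280, 199_130, 838_135),
}


def derived_columns(labels, variants, unique_terms):
    """Recompute Total, Total / # Labels and Uniq. T. / # Labels."""
    total = labels + variants
    return total, round(total / labels, 2), round(unique_terms / labels, 2)


for name, row in ROWS.items():
    print(name, derived_columns(*row))
```

A high Total / # Labels value means many variants per label, as for the gene and protein resources.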

3
                  IeXML
IeXML: Annotating entities in text


     • Inline annotations can be attached to any part of the
       document
     • No hassle with character or byte counts or layout
       modifications to the document
     • “Alignment” of annotated documents to
           • Compare annotations
           • Validate annotations
           • Harmonise annotations (SESL project)
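
A minimal sketch of how two inline-annotated versions of the same sentence might be aligned and compared. The `<e>` element, its `id` attribute and the identifiers are assumptions made for this illustration; they are not taken from the IeXML specification.

```python
import xml.etree.ElementTree as ET


def spans(xml_text, tag="e"):
    """Return (plain_text, {(start, end, id), ...}) for one annotated sentence."""
    root = ET.fromstring(xml_text)
    plain = []
    found = set()
    pos = 0

    def walk(node):
        nonlocal pos
        start = pos
        if node.text:
            plain.append(node.text)
            pos += len(node.text)
        for child in node:
            walk(child)
            if child.tail:
                plain.append(child.tail)
                pos += len(child.tail)
        if node.tag == tag:
            # Offsets are derived from the text itself, not stored in the markup.
            found.add((start, pos, node.get("id")))

    walk(root)
    return "".join(plain), found


def align(doc_a, doc_b):
    """Compare two annotated copies of the same underlying text."""
    text_a, ann_a = spans(doc_a)
    text_b, ann_b = spans(doc_b)
    if text_a != text_b:
        raise ValueError("documents differ in their plain text; cannot align")
    return {"shared": ann_a & ann_b, "only_a": ann_a - ann_b, "only_b": ann_b - ann_a}


# Hypothetical annotations from two systems.
DOC_A = '<s>The <e id="x:1">p53</e> protein binds <e id="x:2">DNA</e>.</s>'
DOC_B = '<s>The <e id="x:1">p53</e> protein binds DNA.</s>'
result = align(DOC_A, DOC_B)
print(result)
```

Because the annotations live inside the text, offsets can be recomputed at comparison time, so layout changes to the document do not invalidate them.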




4
                  CALBC
The challenge
     • 150,000 documents or more ...
     • Test set for all systems
     • Assessment, benchmarking


CALBC Challenge II


(1) 75,000 documents as training data
(2) 175,000 documents as test data
(3) An additional 700,000 documents as test data

•    September 13th 2010:
     Second harmonized corpus available for CALBC
     Challenge II
•    December 15th, 2010: Challenge II closes
•    March 2011: CALBC Workshop II
•    June 30th, 2011:
     Final harmonized corpus available

5
     UKPMC/ELIXIR
UKPMC




                  ~10% of the size of PubMed
6
                  SESL
SESL Project: from publisher to pharma
[Figure: SESL architecture, from content suppliers to multiple consumers]
    • Multiple consumers: Disease Dossier, Knowledge Applications
    • Common service broker, built on open standards:
           • Service Layer (RDF, Web 2.0)
           • Assertions, SPARQL, Triple Store
           • Integration, Inference, Reasoning
           • Sharing of data
    • Std Public Vocabularies, Business Rules
    • Content suppliers
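
To illustrate what "assertions in a triple store" means, here is a minimal in-memory sketch with wildcard pattern matching. It is not the SESL implementation; the predicates (`mentions`, `associatedWith`) and all identifiers are placeholders invented for this example, and a real deployment would use an RDF store queried through SPARQL.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """A toy store of (subject, predicate, object) assertions."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add(Triple(s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)
        )


def documents_for_disease(store, disease):
    """Join two patterns: genes linked to the disease, then documents mentioning them."""
    genes = {t.subject for t in store.match(p="associatedWith", o=disease)}
    return sorted({t.subject for g in genes for t in store.match(p="mentions", o=g)})


# Placeholder assertions, not real data.
store = TripleStore()
store.add("doc:A", "mentions", "gene:G1")
store.add("doc:B", "mentions", "gene:G2")
store.add("gene:G1", "associatedWith", "disease:D1")
print(documents_for_disease(store, "disease:D1"))
```

The join in `documents_for_disease` is the kind of question a disease dossier would ask of integrated literature and database content.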




Literature content in the Semantic Web




7
      PaperMaker
PaperMaker - Overview

• PaperMaker - a tool to support authors writing biomedical
  papers:
       • Interactive feedback on the contents of papers (related
         work and concept annotations)
       • Formal consistency criteria checking (spelling,
         terminology, acronyms, references)




30.03.2009               Literature and Text Mining
                   BioCreative III, Rebholz
Consistency parameters

Domain-independent


•    General spelling and grammar
•    General readability
•    Appropriate use of references
•    Finding and acknowledging related work




Consistency parameters

Domain-specific

• The use of terminology:

       • Should be consistent with domain-specific naming guidelines
       • Should not be ambiguous
       • Should conform to the conventional usage (possible clashes
         between naming guidelines and common-sense convention)
       • Useful to resolve terminology to reference databases (e.g.
         UniProt for protein names, ChEBI for chemical entities, etc.)
       • The special case of acronyms
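
As an illustration of the acronym case, the sketch below flags acronyms that are used before, or without, a parenthesised definition such as "named entity recognition (NER)". This is a simple heuristic written for this example, not PaperMaker's actual check; the regular expressions and the function name are assumptions.

```python
import re

# An acronym here is a word that starts with a capital letter and ends with a
# capital letter or digit, e.g. NER, GO, MeSH, ChEBI.
ACRONYM = re.compile(r"\b[A-Z][A-Za-z0-9]*[A-Z0-9]\b")
# A definition is the same shape inside parentheses, e.g. "(NER)".
DEFINITION = re.compile(r"\(([A-Z][A-Za-z0-9]*[A-Z0-9])\)")


def undefined_acronyms(text):
    """Return acronyms used before (or without) a parenthesised definition."""
    defined_at = {}
    for m in DEFINITION.finditer(text):
        defined_at.setdefault(m.group(1), m.start(1))

    problems = []
    for m in ACRONYM.finditer(text):
        first_def = defined_at.get(m.group(0))
        if first_def is None or m.start() < first_def:
            if m.group(0) not in problems:
                problems.append(m.group(0))
    return problems


sample = "Named entity recognition (NER) is applied first. NER output feeds GO annotation."
print(undefined_acronyms(sample))
```

A domain-aware version would additionally look the flagged strings up in a terminological resource, since many gene and protein names have the same shape as acronyms.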




Content feedback

• Resolving the contents to literature repositories
       • Finding related work (document retrieval)
       • Finding related ideas (passage retrieval)

• Resolving the contents to ontological reference
  databases
       • MeSH descriptors have been demonstrated to improve
         biomedical information retrieval. Can we suggest MeSH terms
         directly to the authors?
       • Gene Ontology (GO) terms are increasingly used in information
         extraction systems.
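
Finding related work can be framed as ranking repository documents by their similarity to the draft. The sketch below uses a generic TF-IDF cosine ranking in plain Python; it is an assumption-laden illustration, not the retrieval method used by PaperMaker, and the tiny corpus is invented.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_related(draft, corpus):
    """Rank corpus documents (id -> text) by TF-IDF cosine similarity to the draft."""
    docs = {doc_id: Counter(tokens(text)) for doc_id, text in corpus.items()}
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for counts in docs.values() for term in counts)

    def weigh(counts):
        # Smoothed IDF so that terms unseen in the corpus keep a finite weight.
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    query = weigh(Counter(tokens(draft)))
    scored = [(cosine(query, weigh(counts)), doc_id) for doc_id, counts in docs.items()]
    # Highest score first; ties broken by document id for a stable order.
    return [doc_id for score, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Invented example corpus.
corpus = {
    "a": "protein kinase signalling in yeast",
    "b": "climate data centre strategy",
    "c": "kinase inhibitors",
}
ranking = rank_related("yeast protein kinase", corpus)
print(ranking)
```

Passage retrieval works the same way if the corpus entries are paragraphs or sentences instead of whole documents.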




PaperMaker workflow




Conclusions

• PaperMaker can help the author conform to the formal
  requirements of paper writing with special emphasis on
  the domain

• It also provides feedback on the contents by relating them to
  reference resources and literature repositories

• It may improve the indexing of a paper in literature
  repositories (less ambiguous terminology)

• http://www.ebi.ac.uk/Rebholz-srv/PaperMaker
  Work in progress 



8
                  Summary

Efforts in the Rebholz group towards
     interoperability of literature with bioinformatics
     •    Whatizit infrastructure
            •     Biomedical NER as a public, large-scale service
     •    LexEBI / BioLexicon (collab. w. NaCTeM, Pisa-U)
            •     Biomedical terminological resource, standardisation of semantics
     •    IeXML (BioLink SIG 2006, Brazil)
            •     Put the annotations into the document (inline annotations)
     •    CALBC project
            •     Collaborative annotation of a large-scale biomedical corpus
     •    UKPMC: UK PubMed Central (collab. w. NaCTeM, BL)
            •     Use of Whatizit, BioLexicon, IeXML, CALBC alignments for the delivery of quality
                  annotation services to the public
     •    SESL project
            •     Joint project with pharma & publishers, literature content in a triple store
     •    PaperMaker
            •     Validation of the scientific literature against the above


Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

PaperMaker, BeyondThePdf, RebholzSchuhmann, 19Jan2011

  • 1. PaperMaker: Validation of biomedical scientific publications January 19th, 2011 Workshop: „BeyondThePdf“ Dietrich Rebholz-Schuhmann, MD, PhD Group Leader Rebholz Group European Bioinformatics Institute
  • 2. Publishing is about …
      • ... Agreeing / disagreeing about current science
          • Only peer review can judge current science
      • ... Bringing new results
          • Conceptual results are more difficult than new data
      • ... Gaining new knowledge
          • New data and new results can imply new knowledge of which even the author is still unaware
      • ... Rewarding the scientist
          • Count whatever you can count that could have an impact.
          • Validating the scientist’s claim is the key reward.
          • Any scientist can fool any system, but (hopefully) only short-term
      20.01.2011 Literature and Text Mining BioCreative III, Rebholz
  • 3. Future of biomedical text mining
      Working towards ...
      • ... Literature integration
          • to have it fully fledged as part of bioinformatics data resources
      • ... Cross-domain support
          • to deliver the content to different scientific communities
      • ... Provenance
          • to carry credit of findings into analytical biomedical research
      • ... Inference & Reasoning
          • to make use of the full semantic support in the scientific literature
  • 4. Literature content in the Semantic Web
  • 5. Terminologies vs. Ontologies
      Resource building
      • Database type
          • Terminologies, collection of terms
          • Automatic generation
          • Exploitation of terminological features
          • Standardisation of TM solutions
          • Interoperability with database resources
      • Ontological resources
          • Explicit semantics
          • Manual generation
          • Consistency, inference, reasoning
          • Interoperability with all semantic resources
          • Working towards a reasoning infrastructure
  • 6. Efforts in the Rebholz group towards interoperability of literature with bioinformatics
      • Whatizit infrastructure
          • Biomedical NER as a public, large-scale service
      • LexEBI / BioLexicon (collab. w. NaCTeM, Pisa-U)
          • Biomedical terminological resource, standardisation of semantics
      • IeXML (BioLink SIG 2006, Brazil)
          • Put the annotations into the document (inline annotations)
      • CALBC project
          • Collaborative annotation of a large-scale biomedical corpus
      • UKPMC: U.K. PubMed Central (collab. w. NaCTeM, BL)
          • Use of Whatizit, BioLexicon, IeXML, CALBC alignments for the delivery of quality annotation services to the public
      • SESL project
          • Joint project with pharma & publishers, literature content in a triple store
      • PaperMaker
          • Validation of the scientific literature against the above
  • 7. 1 Whatizit
  • 8. Integrating biomedical literature and data
      Rebholz-Schuhmann, D., et al. Text Processing through Web Services: Calling Whatizit. Bioinformatics 24, no. 2 (2008): 296-98.
  • 9. 2 BioLexicon LexEBI
  • 10. LexEBI: content

      Group         Resource          # Labels   # Variants       Total   Total / Labels   # Unique terms   Uniq. T. / Labels
      Prot. / Gene  GP 7.0             516,113    4,005,040   4,521,153             8.76        1,726,853                3.35
                    GP 6.0             488,577    3,389,316   3,877,893             7.94        1,564,436                3.20
      Chemicals     Jochem             278,578    1,691,980   1,970,558             7.07        1,527,752                5.48
                    ChEBI               19,645       94,748     114,393             5.82          101,307                5.16
                    ChEBI (all)        549,838    1,187,322   1,737,160             3.16                -                   -
      Other         Enzymes              4,905        8,082      12,987             2.65           12,377                2.52
                    Species            643,280      199,130     842,410             1.31          838,135                1.30
                    Interpro            20,671            0      20,671             1.00           20,671                1.00
      UMLS          Antineuro., Neo      4,718        6,488      11,206             2.38                -                   -
                    Bio. Act.           54,148       87,209     141,357             2.61                -                   -
                    Enzymes             26,065       56,332      82,397             3.16                -                   -
                    Lipid, Carb.        11,518        9,770      21,288             1.85                -                   -
                    Pharm. Act.        104,201      123,840     228,041             2.19                -                   -
                    Vit., Horm.          6,877       10,258      17,135             2.49                -                   -
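
The derived columns of the LexEBI table can be checked with a few lines of arithmetic. The sketch below assumes that "Total" is the sum of labels and variants and that "Total / Labels" is their ratio rounded to two decimals; this reading is inferred from the figures on the slide rather than stated there, and the names `rows` and `totals` are illustrative only.

```python
# Label and variant counts copied from the LexEBI slide (subset of rows).
rows = {
    "GP 7.0": (516_113, 4_005_040),
    "GP 6.0": (488_577, 3_389_316),
    "Jochem": (278_578, 1_691_980),
    "ChEBI": (19_645, 94_748),
}


def totals(labels, variants):
    """Return (total terms, total terms per label) for one resource.

    Assumption: Total = labels + variants; the ratio is rounded to 2 decimals.
    """
    total = labels + variants
    return total, round(total / labels, 2)


summary = {name: totals(*counts) for name, counts in rows.items()}

for name, (total, per_label) in summary.items():
    print(f"{name:8s} total={total:>9,d} total/labels={per_label}")
```

Under this reading, a higher ratio means that a resource carries more synonyms and spelling variants per entry.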
  • 11. 3 IeXML
  • 12. IeXML: Annotating entities in text
      • Inline annotations to any part of the document with the annotations
      • No hassle with character or byte counts or layout modifications to the document
      • “Alignment” of annotated documents to
          • Compare annotations
          • Validate annotations
          • Harmonise annotations (SESL project)
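
To illustrate why inline annotation avoids bookkeeping with character counts, here is a minimal sketch. The element name `e`, the wrapper `s`, the `id` attribute layout and the example identifiers are assumptions chosen for illustration and are not claimed to match the actual IeXML schema.

```python
import xml.etree.ElementTree as ET


def annotate_inline(text, entities):
    """Wrap each (start, end, identifier) span of `text` in an <e> element.

    Spans are assumed to be non-overlapping character offsets.
    """
    root = ET.Element("s")
    cursor = 0
    last = None
    for start, end, ident in sorted(entities):
        chunk = text[cursor:start]
        if last is None:
            root.text = chunk      # text before the first entity
        else:
            last.tail = chunk      # text between two entities
        last = ET.SubElement(root, "e", id=ident)
        last.text = text[start:end]
        cursor = end
    rest = text[cursor:]
    if last is None:
        root.text = rest
    else:
        last.tail = rest
    return ET.tostring(root, encoding="unicode")


def read_annotations(xml_string):
    """Return (identifier, surface form) pairs; no offsets are needed."""
    root = ET.fromstring(xml_string)
    return [(e.get("id"), e.text) for e in root.iter("e")]


sentence = "BRCA1 interacts with aspirin."
inline = annotate_inline(
    sentence,
    [(0, 5, "UniProt:P38398"), (21, 28, "CHEBI:15365")],  # illustrative ids
)

# Editing the document shifts every character offset, but the inline
# annotations travel with the text and can still be read back unchanged.
edited = inline.replace("<s>", "<s>Human ", 1)
print(read_annotations(edited))
```

Stripping the tags with `"".join(ET.fromstring(inline).itertext())` recovers the original sentence, which is what makes two differently annotated copies of the same document easy to align and compare.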
  • 13. 4 CALBC
  • 14. The challenge 150,000 documents or more ... Test set for all systems Assessment, benchmarking
  • 15. CALBC Challenge II
      (1) 75,000 documents training data
      (2) 175,000 testing data
      (3) Additional 700,000 testing data
      • September 13th, 2010: Second harmonized corpus available for CALBC Challenge II
      • December 15th, 2010: Challenge II closes
      • March 2011: CALBC Workshop II
      • June 30th, 2011: Final harmonized corpus available
  • 16. 5 Ukpmc/Elixir
  • 18. UKPMC ~ 10 % the size of PubMed
  • 19. 6 sesl
  • 20. SESL Project: from publisher to pharma
      [Architecture diagram, top to bottom: Multiple Consumers; Disease Knowledge Dossier Applications; Service Layer (RDF, Web 2.0); Assertions, SPARQL, Triple Store; Integration, Inference, Reasoning; Sharing of data; Content Suppliers. Side labels: Standards, Vocabularies, Service Broker, Business Rules.]
  • 21. Literature content in the Semantic Web
  • 22. 7 Papermaker
  • 23. PaperMaker - Overview
      • PaperMaker - a tool to support authors writing biomedical papers:
          • Interactive feedback on the contents of papers (related work and concept annotations)
          • Formal consistency criteria checking (spelling, terminology, acronyms, references)
      30.03.2009 Literature and Text Mining BioCreative III, Rebholz
  • 24. Consistency parameters
      Domain-independent
      • General spelling and grammar
      • General readability
      • Appropriate use of references
      • Finding and acknowledging related work
  • 25. Consistency parameters
      Domain-specific
      • The use of terminology:
          • Should be consistent with domain-specific naming guidelines
          • Should not be ambiguous
          • Should conform to the conventional usage (possible clashes between naming guidelines and common-sense convention)
      • Useful to resolve terminology to reference databases (e.g. UniProt for protein names, ChEBI for chemical entities, etc.)
      • The special case of acronyms
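
The acronym case can be made concrete with a small checker. This is a minimal sketch under simple assumptions (an acronym is introduced as "long form (ACRONYM)" and its letters are the initials of the preceding words); it is not PaperMaker's implementation, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# An all-caps acronym in parentheses, e.g. "(NER)".
PAREN = re.compile(r"\(([A-Z]{2,10})\)")


def find_definitions(text):
    """Map each acronym to the set of long forms that introduce it."""
    definitions = defaultdict(set)
    for match in PAREN.finditer(text):
        acronym = match.group(1)
        words = re.findall(r"[A-Za-z]+", text[:match.start()])
        candidate = words[-len(acronym):]
        initials = "".join(w[0].upper() for w in candidate)
        if initials == acronym:  # accept only if the initials spell the acronym
            definitions[acronym].add(" ".join(candidate).lower())
    return definitions


def check_acronyms(text):
    """Return (acronyms with several long forms, acronyms never defined)."""
    definitions = find_definitions(text)
    used = set(re.findall(r"\b[A-Z]{2,10}\b", text))
    ambiguous = {a: sorted(f) for a, f in definitions.items() if len(f) > 1}
    undefined = sorted(used - set(definitions))
    return ambiguous, undefined


draft = (
    "Named entity recognition (NER) is hard. "
    "The nucleotide excision repair (NER) pathway and GO terms matter."
)
ambiguous, undefined = check_acronyms(draft)
print(ambiguous)
print(undefined)
```

On this draft the checker reports NER as ambiguous, because two different long forms introduce it, and GO as used without a definition. A real tool would additionally resolve the long forms against reference databases.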
  • 26. Content feedback
      • Resolving the contents to literature repositories
          • Finding related work (document retrieval)
          • Finding related ideas (passage retrieval)
      • Resolving the contents to ontological reference databases
          • MeSH descriptors have been demonstrated to improve biomedical information retrieval. Can we suggest MeSH terms directly to the authors?
          • Gene Ontology (GO) terms are increasingly used in information extraction systems.
  • 27. PaperMaker workflow
  • 32. Conclusions
      • PaperMaker can help the author conform to the formal requirements of paper writing with special emphasis on the domain
      • It also provides feedback on the contents by relating it to reference resources and literature repositories
      • It may improve the indexing of a paper in literature repositories (less ambiguous terminology)
      • http://www.ebi.ac.uk/Rebholz-srv/PaperMaker
      Work in progress
  • 33. 8 Summary
  • 34. Efforts in the Rebholz group towards interoperability of literature with bioinformatics
      • Whatizit infrastructure
          • Biomedical NER as a public, large-scale service
      • LexEBI / BioLexicon (collab. w. NaCTeM, Pisa-U)
          • Biomedical terminological resource, standardisation of semantics
      • IeXML (BioLink SIG 2006, Brazil)
          • Put the annotations into the document (inline annotations)
      • CALBC project
          • Collaborative annotation of a large-scale biomedical corpus
      • UKPMC: U.K. PubMed Central (collab. w. NaCTeM, BL)
          • Use of Whatizit, BioLexicon, IeXML, CALBC alignments for the delivery of quality annotation services to the public
      • SESL project
          • Joint project with pharma & publishers, literature content in a triple store
      • PaperMaker
          • Validation of the scientific literature against the above