Wikidata as a platform for biocuration
Benjamin Good
The Scripps Research Institute
@bgood
Organization, more information:
http://tinyurl.com/biocuration-wikidata
• Part 1: Getting to know Wikidata
• A platform for biocuration (Ben Good)
• Wikipedia and Wikidata for research (Daniel Mietchen)
• Editing Wikidata (Sebastian Burgstaller-Muelbacher)
• Coffee Break
• Part 2: Getting your hands on Wikidata: flash biocuration jamboree!
The dominant paradigm for open biocuration
[Diagram: four separate databases, each labeled "Your Database", exposed through APIs and flat files and connected by xrefs]
Pain points
• API or flatfile parsing
• Ambiguous or non-existent xrefs
• Persistence of funding
• Too much information to curate
[Diagram: My Research Grants ($), My Database Curators, Biomedical knowledge, My Database, My Web Application]
A new paradigm for open biocuration?
[Diagram: My Research Grants ($), Our Database Curators and our community, Biomedical knowledge, Our Database?, Our Applications (×3)]
Reducing the pain
• Reduces API/parser proliferation
• Forces up-front integration
• Facilitates coordination
• Ensures that if funding is lost, data is not
• Invites community input
A new platform for open biocuration?
[Diagram: My Research Grants ($), Our Database Curators and our community, Biomedical knowledge, Our Applications (×3)]
Wikidata is to data
as Wikipedia is to text
“Giving more people more access to more knowledge”
A free and open repository of knowledge
• Initiated by Wikimedia Germany
• In transition to the Wikimedia Foundation
• Not a ‘project’… as stable as Wikipedia
It’s a knowledge base!
• Anyone can edit (human or robot)
• Anyone can use (CC0)
Elements of the knowledge base are called ‘items’
• Labels and descriptions in many languages
Items are unique concepts, used to link different language Wikipedias together
Q146
Af:Kat
En:cat
Als:Hauskatze
Ang:Catte
Av:Keto
Items are described by “statements” that link together to form the language-independent Wikidata knowledge graph
[Diagram: Cat -subclass of-> Domesticated Animal -subclass of-> Animal; Animal: taxon name Animalia, taxon rank Kingdom]
Item: Q84
Genomic position for Reelin gene
Item: Q414043 (RELN)
https://www.wikidata.org/wiki/Q414043
Statement
• Claim: Genomic start (property) = 103471784 (numeric value)
• Qualifiers: GenLoc assembly: GRCh38
• References: Stated in: Ensembl Release 83; Retrieved: 19 January 2016
Linking the Reelin gene to a protein it encodes
Item: Q414043 (RELN)
https://www.wikidata.org/wiki/Q414043
Statement
• Claim: Encodes (property) = Reelin (protein) (item value)
• Qualifiers: (none listed)
• References: Stated in: NCBI Homo sapiens annotation release 107; Retrieved: 19 January 2016
Gene ontology annotation for Reelin protein, with evidence codes modeled as qualifiers
Item: Q13561329 (Reelin)
https://www.wikidata.org/wiki/Q13561329
Statement
• Claim: Cell component (property) = dendrite (item value)
• Qualifiers: Determination method: ISS (Sequence or structural Similarity), IEA (Electronic annotation)
• References: Stated in: UniProt; Retrieved: 21 March 2016
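The claim, qualifier and reference parts of a statement can be sketched as a small data structure. This is an illustrative model only, not Wikidata's actual JSON format: the names `Statement` and `has_evidence` are hypothetical, and the values are the ones shown on the slide for Reelin.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Statement:
    """One Wikidata-style statement: a claim plus qualifiers and references."""
    subject: str                      # item the statement is about
    prop: str                         # property label, e.g. "cell component"
    value: str                        # claim value (an item or a literal)
    qualifiers: Dict[str, List[str]] = field(default_factory=dict)
    references: Dict[str, str] = field(default_factory=dict)


# Values copied from the slide for item Q13561329 (Reelin).
reelin_dendrite = Statement(
    subject="Q13561329",
    prop="cell component",
    value="dendrite",
    qualifiers={"determination method": ["ISS", "IEA"]},
    references={"stated in": "UniProt", "retrieved": "21 March 2016"},
)


def has_evidence(statement: Statement, code: str) -> bool:
    """Return True if the statement carries the given evidence code as a qualifier."""
    return code in statement.qualifiers.get("determination method", [])
```

Keeping evidence codes as qualifiers means one claim (Reelin, cell component, dendrite) can carry several codes without being duplicated.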
Graphical view
[Diagram: RELN -encodes-> Reelin -cellular component-> dendrite (claim); qualifiers: determination method ISS, IEA; references: stated in UniProt, retrieved 21 March 2016]
Questions about wikidata structure?
Inter-item links form a giant knowledge graph
Everything is connected
Reelin, heart disease, Barack Obama, everything...
https://query.wikidata.org
SPARQL endpoint for Wikidata
“GO cellular localization annotations for Reelin with evidence code ISS” from http://query.wikidata.org
http://tinyurl.com/biowiki-sparql
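A query of this kind might look like the sketch below. It is not the query behind the short link, and several parts are assumptions to verify on wikidata.org before use: that P681 is "cell component", that P459 is "determination method", and that the evidence-code item carries "ISS" as a label or alias. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def build_go_component_query(protein_qid: str, evidence_code: str) -> str:
    """Build a SPARQL query for cell-component annotations with a given evidence code.

    Assumed property IDs: P681 = cell component, P459 = determination method.
    """
    return f"""
SELECT DISTINCT ?component ?componentLabel WHERE {{
  wd:{protein_qid} p:P681 ?statement .
  ?statement ps:P681 ?component ;
             pq:P459 ?method .
  ?method rdfs:label|skos:altLabel ?name .
  FILTER(STR(?name) = "{evidence_code}")
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
""".strip()


def run_query(query: str) -> dict:
    """Send the query to the endpoint and return parsed JSON (needs network access)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": "biocuration-example/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def component_labels(result: dict) -> list:
    """Extract the component labels from a SPARQL JSON result."""
    return [row["componentLabel"]["value"] for row in result["results"]["bindings"]]
```

Calling `component_labels(run_query(build_go_component_query("Q13561329", "ISS")))` would send the request. The `p:`/`ps:`/`pq:` prefixes are what let the query reach the qualifier on the statement rather than only the claim value.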
ISS GO cellular localization annotations for Reelin
[Diagram: SPARQL graph pattern over the statement: RELN -encodes-> Reelin -cellular component-> dendrite (claim); qualifiers: determination method ISS, IEA; references: stated in UniProt, retrieved 21 March 2016]
http://tinyurl.com/biowiki-sparql
“GWAS-based disease associations for Reelin”
A new platform for open biocuration?
[Diagram: My Research Grants ($), Our Database Curators and our community, Biomedical knowledge, Our Applications (×3)]
• SPARQL = a common API for accessing content
• 1 endpoint to maintain…
• It’s working
Wikidata and the Semantic Web
• Hub for linked open data
• Much of the initial content is identifier links (e.g. we link drug items to 18 different schemes)
• e.g. see Vemurafenib
• Supports federated queries
– e.g. one query can span Wikidata content and UniProt RDF content
On its way to replacing DBpedia as the central node
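A federated query spanning Wikidata and UniProt might be sketched as below. The specifics are assumptions rather than anything stated on the slides: that P352 is the UniProt protein ID property, that UniProt entries use URIs of the form `http://purl.uniprot.org/uniprot/<accession>`, that `https://sparql.uniprot.org/sparql` is reachable from the Wikidata query service, and that `up:mnemonic` is a convenient field to fetch. The function name is hypothetical.

```python
def build_federated_query(protein_qid: str) -> str:
    """Build a SPARQL query that joins a Wikidata protein item to UniProt RDF.

    Assumptions: P352 = UniProt protein ID; the UniProt endpoint URL and
    URI pattern below; up:mnemonic as the UniProt field to retrieve.
    """
    return f"""
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?accession ?mnemonic WHERE {{
  wd:{protein_qid} wdt:P352 ?accession .
  BIND(IRI(CONCAT("http://purl.uniprot.org/uniprot/", ?accession)) AS ?uniprot)
  SERVICE <https://sparql.uniprot.org/sparql> {{
    ?uniprot up:mnemonic ?mnemonic .
  }}
}}
""".strip()


# Example: the Reelin protein item from the slides.
reelin_query = build_federated_query("Q13561329")
```

The identifier stored in Wikidata is what makes the join possible: it is turned into a UniProt URI, and the `SERVICE` block then evaluates the rest of the pattern on the remote endpoint.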
Questions about Wikidata and the Semantic Web?
Social controls
• Anyone can
• Add or edit labels, descriptions, statements, references etc. on existing items
• Create new items
• Link items to Wikipedia articles
• Query using https://query.wikidata.org
• Read data and make small numbers of edits with https://www.wikidata.org/w/api.php
• Propose a new property
• Request a bot account for high-volume automated editing
Here be dragons..
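Reading through the API listed above can be sketched with the `wbgetentities` action, which returns labels and other data for an item. The helper names are hypothetical, and the sample response in the usage note is a hand-written stand-in for a real reply, using labels from the Q146 slide.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def build_entity_url(qid: str, languages=("en",)) -> str:
    """Build a wbgetentities request URL for one item's labels and descriptions."""
    params = {
        "action": "wbgetentities",
        "ids": qid,
        "props": "labels|descriptions",
        "languages": "|".join(languages),
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def labels_from_response(response: dict, qid: str) -> dict:
    """Map language code to label text from a parsed wbgetentities response."""
    labels = response["entities"][qid]["labels"]
    return {lang: entry["value"] for lang, entry in labels.items()}
```

For example, `build_entity_url("Q146", ("en", "af"))` gives a URL that can be fetched with any HTTP client, and a parsed reply shaped like `{"entities": {"Q146": {"labels": {"en": {"language": "en", "value": "cat"}}}}}` would yield `{"en": "cat"}`. Writing through the API additionally needs authentication and an edit token, which this sketch leaves out.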
Properties (as of April 10, 2016)
• 2196 active properties
• 114 new properties that have been proposed but not yet approved
Proposal
https://www.wikidata.org/wiki/Wikidata:Property_proposal
After proposal, community discussion
• Each property is left open for discussion by anyone until an administrator, or another person with the required rights, either creates it or decides not to create it based on the discussion
• People who enjoy ontology arguments are needed here!
Lengthy (cut-off) discussion of proposal for ‘extinct’ property
https://www.wikidata.org/wiki/Wikidata:Property_proposal/
Property proposal on wikidata
Proposal
Community discussion
Bot accounts
• https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot
• Same basic process.
Proposal discussions
• Cannot be avoided
• The discussions are long and tiring but important
• Many of the people involved are quite experienced
• All are trying to make something great
• Persistence and patience required
Questions on community processes?
• (more to come)
The first application built on Wikidata: Wikipedia
[Diagram: Su, Schriml, Pavlidis R01 Grant… ($), Our Database Curators and our community, Biomedical knowledge, Our Applications (×2)]
Deeply integrated (incredible SEO)
Application #1
Burgstaller et al (2016)
Impact of wikidata on Wikipedia
Gene Wiki
Version 1.
{{GNF_Protein_box | Name = Reelin| image = |
image_source = | PDB = {{PDB2|4AD9}} | HGNCid = 18512 |
MGIid = | Symbol = LACTB2 | AltSymbols =; CGI-83 |
IUPHAR = | ChEMBL = | OMIM = None | ECnumber = |
Homologene = 9349 | GeneAtlas_image1 = |
GeneAtlas_image2 = | GeneAtlas_image3 = |
Protein_domain_image = | Function =
{{GNF_GO|id=GO:0005515 |text = protein binding}}
{{GNF_GO|id=GO:0016787 |text = hydrolase activity}}
{{GNF_GO|id=GO:0046872 |text = metal ion binding}} |
Component = {{GNF_GO|id=GO:0005739 |text =
mitochondrion}} | Process = {{GNF_GO|id=GO:0008152
|text = metabolic process}} | Hs_EntrezGene = 51110 |
Hs_Ensembl = ENSG00000147592 | Hs_RefseqmRNA =
NM_016027 | Hs_RefseqProtein = NP_057111 |
Hs_GenLoc_db = hg38 | Hs_GenLoc_chr = 8 |
Hs_GenLoc_start = 70635318 | Hs_GenLoc_end = 70669174
| Hs_Uniprot = Q53H82 | Mm_EntrezGene = 212442 |
Mm_Ensembl = ENSMUSG00000025937 |
Mm_RefseqmRNA = NM_145381 | Mm_RefseqProtein =
NP_663356 | Mm_GenLoc_db = mm10 | Mm_GenLoc_chr =
1 | Mm_GenLoc_start = 13623330 | Mm_GenLoc_end =
13660546 | Mm_Uniprot = Q99KR3 | path = PBB/51110}}
Gene Wiki
Version 2.
{{Infobox gene}}
• All data in Wikidata
• 1 Lua script works for all genes
(1 of these for every gene)
Wikidata use increasing on Wikipedia
• https://en.wikipedia.org/wiki/Category:Templates_using_data_from_Wikidata
• 81 templates indicate that they use it
Questions about Wikidata and Wikipedia?
Application #2 Web Apollo Genome Browser
Putman et al (2016)
The next application built on wikidata, yours?
[Diagram: My Research Grants ($), Our Database Curators and our community, Biomedical knowledge, ???? ????]
It’s your data as much as anyone else’s!
Current state: seeding nodes for the graph
• All human and mouse genes and proteins (Swiss-Prot)
• All Gene Ontology terms
• All FDA approved drugs
• All Human Disease Ontology terms
• 109 reference microbial genomes
Burgstaller-Muelbacher et al (2016) Database
Mitraka et al (2015) Semantic Web Applications for the Life Sciences
Putman et al (2016) Database
Next data step: connecting the nodes
• Our group
• Human: Gene-disease (PhenoCARTA team)
• Human: Drug-disease (ChEMBL, NDF-RT)
• Human: Gene-drug (ChEMBL, NDF-RT)
• Expanding microbial information (Putman, Koehurst, Knight lab)
• Your group
• ?
• Today
• A knowledge base for understanding Zika ?
Acknowledgements
Gene Wikidata Team
Andra Waagmeester (Micelio)
* Sebastian Burgstaller (Scripps)
* Tim Putman (Scripps)
* Elvira Mitraka (U Maryland)
Julia Turner (Scripps)
Justin Leong (UBC)
Lynn Schriml (U Maryland)
Paul Pavlidis (UBC)
Andrew Su (Scripps)
Ginger Tsueng (Scripps)
Contact
bgood@scripps.edu
@bgood on Twitter
* First author on manuscript cited in this presentation
Su Laboratory at TSRI
The 16,950 other active editors of Wikidata, and especially the 693 that joined last month, the 809 that joined the month before that, and the 721 that joined the month before that.

Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Unit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oUnit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 o
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptx
 

Wikidata workshop for ISB Biocuration 2016

  • 1. Wikidata as a platform for biocuration Benjamin Good The Scripps Research Institute @bgood
  • 2. Organization, more information http://tinyurl.com/biocuration-wikidata • Part 1: getting to know wikidata • A platform for biocuration (Ben Good) • Wikipedia and Wikidata for research (Daniel Mietchen) • Editing Wikidata (Sebastian Burgstaller-Muelbacher) • Coffee Break • Part 2: getting your hands on wikidata: flash biocuration jamboree!
  • 3. API Flatfiles The dominant paradigm for open biocuration API Flatfiles Your Database Your Database Your Database xrefs Your Database Pain points • API or flatfile parsing • Ambiguous or non-existent xrefs • Persistence of funding • Too much information to curate My Web Application My Database My Database Curators My Research Grants $ Biomedical knowledge
  • 4. A new paradigm for open biocuration? Our Applications Our Database? Our Database Curators And our community Biomedical knowledge Our Applications Our Applications My Research Grants $ Reducing the pain • Reduces API/parser proliferation • Forces up-front integration • Facilitates coordination • Ensures that if funding is lost, data is not • Invites community input
  • 5. A new platform for open biocuration? Our Applications Our Database Curators And our community Biomedical knowledge Our Applications Our Applications My Research Grants $
  • 6. Is to data as Wikipedia is to text “Giving more people more access to more knowledge” A free and open repository of knowledge • Initiated by Wikimedia Germany • In transition to the Wikimedia Foundation • Not a ‘project’… as stable as Wikipedia
  • 7. It’s a knowledge base! • Anyone can edit (human or robot) • Anyone can use (CC0)
  • 8. Elements of the kb are called ‘items’ • Labels and descriptions in many languages
  • 9. Items are unique concepts, used to link different language Wikipedias together Q146 Af:Kat En:cat Als:Hauskatze Ang:Catte Av:Keto
  • 10. Items are described by “statements” that link together to form the language-independent wikidata knowledge graph Cat Domesticated Animal Animal Subclass Of Subclass Of Animalia Taxon name Kingdom Taxon rank
  • 12. Item: Q414043 RELN Genomic start: 103471784 GenLoc assembly: GRCh38 Stated in: Ensembl Release 83 Retrieved: 19 January 2016 Value (numeric) Property Claim Qualifiers References https://www.wikidata.org/wiki/Q414043 Statement Genomic position for Reelin gene
  • 13. Item: Q414043 RELN Encodes: Reelin (protein) Stated in: NCBI homo sapiens annotation release 107 Retrieved: 19 January 2016 Value (item) Property Claim Qualifiers References https://www.wikidata.org/wiki/Q414043 Statement Linking the Reelin gene to a protein it encodes
  • 14. Item: Q13561329 Reelin Cell component: dendrite Determination method: • ISS (Sequence or structural Similarity) • IEA (Electronic annotation) Stated in: UniProt Retrieved: 21 March 2016 Value (item) Property Claim Qualifiers References https://www.wikidata.org/wiki/Q13561329 Statement Gene ontology annotation for Reelin protein with evidence codes modeled as qualifiers
  • 15. graphical view RELN Reelin encodes dendrite cellular component claim ISS IEA Determination method: qualifiers UniProt stated in retrieved 21 March 2016 References Statement
  • 17. Inter-item links form a giant knowledge graph Everything is connected Reelin, Heart disease, Barack Obama, everything.. https://query.wikidata.org SPARQL endpoint for Wikidata
  • 18. “GO cellular localization annotations for Reelin with evidence code ISS” from http://query.wikidata.org http://tinyurl.com/biowiki-sparql
  • 19. ISS GO cellular localization annotations for Reelin
  • 20. RELN Reelin encodes dendrite cellular component claim ISS IEA Determination method: qualifiers UniProt stated in retrieved 21 March 2016 References Statement SPARQL graph.. http://tinyurl.com/biowiki-sparql
  • 21. “GWAS-based disease associations for Reelin” http://tinyurl.com/biowiki-sparql
  • 23. A new platform for open biocuration? Our Applications Our Database Curators And our community Biomedical knowledge Our Applications Our Applications My Research Grants $ • SPARQL = a common API for accessing content • 1 endpoint to maintain… • It's working
  • 24. Wikidata and the Semantic Web • Hub for linked open data • Much of the initial content is identifier links (e.g. we link drug items to 18 different schemes) • e.g. see Vemurafenib • Supports federated queries – e.g. you can do one query that spans Wikidata content and UniProt RDF content On its way to replacing DBpedia as the central node
  • 25. Questions about wikidata and the semantic web ?
  • 26. Social controls • Anyone can • Add or edit labels, descriptions, statements, references etc. on existing items • Create new items • Link items to Wikipedia articles • Query using https://query.wikidata.org • Read and write small numbers of edits with https://www.wikidata.org/w/api.php • Propose a new property • Request a bot account for high-volume automated editing Here be dragons..
  • 27. Properties (as of April 10, 2016) • 2196 active properties • 114 new properties that have been proposed but not yet approved Proposal https://www.wikidata.org/wiki/Wikidata:Property_proposal
  • 28. After proposal, community discussion • Each property is left open for discussion by anyone until • An administrator or other person blessed with the power either creates it or decides not to create it based on the discussion • People that enjoy ontology arguments needed here! Lengthy (cut-off) discussion of proposal for ‘extinct’ property
  • 31. Proposal discussions • Cannot be avoided • The discussions are long and tiring but important • Many of the people involved are quite experienced • All are trying to make something great • Persistence and patience required
  • 32. Questions on community processes? • (more to come)
  • 33. The first application built on wikidata, Wikipedia Our Database Curators And our community Biomedical knowledge Our Applications Our Applications Su, Schriml, Pavlidis R01 Grant… $
  • 36. Impact of wikidata on Wikipedia Gene Wiki Version 1. {{GNF_Protein_box | Name = Reelin| image = | image_source = | PDB = {{PDB2|4AD9}} | HGNCid = 18512 | MGIid = | Symbol = LACTB2 | AltSymbols =; CGI-83 | IUPHAR = | ChEMBL = | OMIM = None | ECnumber = | Homologene = 9349 | GeneAtlas_image1 = | GeneAtlas_image2 = | GeneAtlas_image3 = | Protein_domain_image = | Function = {{GNF_GO|id=GO:0005515 |text = protein binding}} {{GNF_GO|id=GO:0016787 |text = hydrolase activity}} {{GNF_GO|id=GO:0046872 |text = metal ion binding}} | Component = {{GNF_GO|id=GO:0005739 |text = mitochondrion}} | Process = {{GNF_GO|id=GO:0008152 |text = metabolic process}} | Hs_EntrezGene = 51110 | Hs_Ensembl = ENSG00000147592 | Hs_RefseqmRNA = NM_016027 | Hs_RefseqProtein = NP_057111 | Hs_GenLoc_db = hg38 | Hs_GenLoc_chr = 8 | Hs_GenLoc_start = 70635318 | Hs_GenLoc_end = 70669174 | Hs_Uniprot = Q53H82 | Mm_EntrezGene = 212442 | Mm_Ensembl = ENSMUSG00000025937 | Mm_RefseqmRNA = NM_145381 | Mm_RefseqProtein = NP_663356 | Mm_GenLoc_db = mm10 | Mm_GenLoc_chr = 1 | Mm_GenLoc_start = 13623330 | Mm_GenLoc_end = 13660546 | Mm_Uniprot = Q99KR3 | path = PBB/51110}} = Gene Wiki Version 2. {{Infobox gene}} • All data in Wikidata • 1 Lua script works for all genes = (1 of these for every gene)
  • 37. Wikidata use increasing on Wikipedia • https://en.wikipedia.org/wiki/Category: Templates_using_data_from_Wikidata • 81 templates indicate that they use it
  • 38. Questions about Wikidata and Wikipedia?
  • 39. Application #2 Web Apollo Genome Browser Putman et al (2016)
  • 40. The next application built on wikidata, yours? Our Database Curators And our community Biomedical knowledge ???? ???? $ My Research Grants It's your data as much as anyone else’s!
  • 41. Current state: seeding nodes for the graph • All human, mouse genes and proteins (swissprot) • All Gene Ontology terms • All FDA approved drugs • All Human Disease Ontology terms • 109 reference microbial genomes Burgstaller-Muelbacher et al (2016) Database Mitraka et al (2015) Semantic Web Applications for the Life Sciences Putman et al (2016) Database
  • 42. Next data step: connecting the nodes • Our group • Human: Gene-disease (PhenoCARTA team) • Human: Drug-disease (CHEMBL, NDF-RT) • Human: Gene-drug (CHEMBL, NDF-RT) • Expanding microbial information (Putman, Koehurst, Knight lab) • Your group • ? • Today • A knowledge base for understanding Zika ?
  • 43. Acknowledgements Gene Wikidata Team Andra Waagmeester (Micelio) * Sebastian Burgstaller (Scripps) * Tim Putman (Scripps) * Elvira Mitraka (U Maryland) Julia Turner (Scripps) Justin Leong (UBC) Lynn Schriml (U Maryland) Paul Pavlidis (UBC) Andrew Su (Scripps) Ginger Tsueng (Scripps) Contact bgood@scripps.edu, @bgood on twitter. * First author on manuscript cited in this presentation. Adapted logo Su Laboratory at TSRI The 16,950 other active editors of Wikidata and especially the 693 that joined last month and the 809 that joined the month before that and the 721 that joined the month before that..
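
The query described on slides 18 to 20 ("GO cellular localization annotations for Reelin with evidence code ISS") can be sketched roughly as follows. This is a minimal illustration, not the query behind the tinyurl link: it assumes the current Wikidata RDF prefixes (p:, ps:, pq:), that "cell component" is property P681, and that "determination method" is property P459. The function names are hypothetical, and the item ID for the ISS evidence code is not given in the slides, so it is left as an optional argument that the caller must look up.

```python
import re
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"  # endpoint named on slide 17
_QID = re.compile(r"^Q[1-9]\d*$")


def _check_qid(qid):
    """Reject anything that is not a plain Wikidata item ID such as Q13561329."""
    if not isinstance(qid, str) or not _QID.match(qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return qid


def build_go_component_query(protein_qid, method_qid=None):
    """Build a SPARQL query for the cell component statements of a protein.

    P681 (cell component) and P459 (determination method) are assumptions
    about the property IDs; verify them on wikidata.org before relying on them.
    If method_qid is given, only statements qualified with that method remain.
    """
    _check_qid(protein_qid)
    method_filter = ""
    if method_qid is not None:
        method_filter = f"  ?statement pq:P459 wd:{_check_qid(method_qid)} .\n"
    return (
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        "PREFIX p: <http://www.wikidata.org/prop/>\n"
        "PREFIX ps: <http://www.wikidata.org/prop/statement/>\n"
        "PREFIX pq: <http://www.wikidata.org/prop/qualifier/>\n"
        "SELECT ?component ?method WHERE {\n"
        f"  wd:{protein_qid} p:P681 ?statement .\n"  # claim: item -> statement node
        "  ?statement ps:P681 ?component .\n"         # value of the claim
        "  OPTIONAL { ?statement pq:P459 ?method . }\n"  # qualifier, if any
        f"{method_filter}"
        "}"
    )


def build_request_url(query):
    """Return a GET URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    # Q13561329 is the Reelin protein item shown on slide 14.
    print(build_request_url(build_go_component_query("Q13561329")))
```

The statement node (`?statement`) is what makes the qualifier reachable: the claim value and its determination method hang off the same node, which mirrors the claim / qualifiers / references layout shown on slide 15.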

Editor's notes

  1. Wikidata: Advancing science through semantic integration of genes, diseases, and drugs
  2. This is the central point I want to make. Wikidata can be used to build knowledge-based applications, lowering the barrier to entry for building apps and reducing the challenges of downstream data integration. Before coming back to this, I will explain why.
  3. This is the central point I want to make. Wikidata can be used to build knowledge-based applications, lowering the barrier to entry for building apps and reducing the challenges of downstream data integration. May
  4. This is the central point I want to make. Wikidata can be used to build knowledge-based applications, lowering the barrier to entry for building apps and reducing the challenges of downstream data integration. Before coming back to this, I will explain why.
  5. By mixing the data into wikidata, we reduce API proliferation, easing application formation. Over 1 billion triples. Fast. Stable since around September 2015.
  6. Successor to DBpedia. The simplest and first use case is identifier mapping. Supports linking of distributed linked data sets.
  7. By mixing the data into wikidata, we reduce API proliferation, easing application formation. Over 1 billion triples. Fast. Stable since around September 2015.
  8. This is the first application of the work that we have done
  9. By mixing the data into wikidata, we reduce API proliferation, easing application formation. Over 1 billion triples. Fast. Stable since around September 2015.
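
The identifier-mapping use case mentioned in note 6 might look something like the sketch below: given a value in one identifier scheme, ask for the same item's value in another. The helper name is hypothetical, and the property IDs in the usage line (P351 for Entrez Gene ID, P594 for Ensembl gene ID) are assumptions to verify on wikidata.org. The Entrez ID 51110 is the one shown in the Gene Wiki infobox on slide 36.

```python
import re

_PID = re.compile(r"^P[1-9]\d*$")


def build_xref_query(source_pid, source_value, target_pid):
    """Build a SPARQL query mapping one external identifier to another.

    source_pid / target_pid are Wikidata property IDs for the two identifier
    schemes; source_value is the known identifier, passed as a string.
    """
    for pid in (source_pid, target_pid):
        if not isinstance(pid, str) or not _PID.match(pid):
            raise ValueError(f"not a Wikidata property ID: {pid!r}")
    # Escape backslashes and double quotes so the value stays one string literal.
    escaped = source_value.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT ?item ?target WHERE {\n"
        f'  ?item wdt:{source_pid} "{escaped}" .\n'  # item carrying the known ID
        f"  ?item wdt:{target_pid} ?target .\n"      # same item, other scheme
        "}"
    )


if __name__ == "__main__":
    # Entrez Gene ID 51110 -> Ensembl gene ID (property IDs are assumptions).
    print(build_xref_query("P351", "51110", "P594"))
```

Because every scheme is just another property on the same item, one query shape covers any pair of identifier schemes, which is the sense in which a single SPARQL endpoint replaces many per-database cross-reference parsers.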