THE WORLD OF BIOCURATION
OPTIMIZING ITS IMPACT
April 7, 2014—Seventh International Biocuration Conference
SOMEONE WHO IS RESPONSIBLE FOR THE CARE AND SUPERVISION OF BIOLOGICAL KNOWLEDGE RESOURCES AND THEIR USE
WHAT IS A BIOCURATOR?
WHAT DO BIOCURATORS DO TODAY?
• Credits to Kaveh Bazargan ᔥ
• @kaveh1000
FRUIT IN FOOD PROCESSOR
SMOOTHIE
RESEARCH IN WORD PROCESSOR
PDF
FRUIT??
RESEARCH??
YOU, THE BIOCURATOR
BIOCURATORS OF THE WORLD UNITE!
• You have nothing to lose but your PDF files
OUR ROLE IN THE RESEARCH LIFE CYCLE
THE WORLD OF BIOCURATION
http://www.nbcnews.com/id/49258816/ns/technology_and_science-science/t/live-concert-microbial-data-turned-song-lab/#.UzSB9ceT4_E
DESIGNING EXPERIMENTS
http://www.langdonbiology.org/AP/labs/Notebook/AP_notebook.htm
COLLECTING DATA
Thomas Nast - http://www.victorianweb.org/art/illustration/nast/51.jpg
WRITING UP RESULTS
http://rrresearch.fieldofscience.com/2012_02_01_archive.html
REVIEWING CONCLUSIONS
CAPTURING KNOWLEDGE
ISB
DESIGNING EXPERIMENTS
COLLECTING DATA
WRITING UP RESULTS
REVIEWING CONCLUSIONS
~300 BIOCURATORS
BIOCURATION INVERSION
DESIGNING EXPERIMENTS
COLLECTING DATA
WRITING UP RESULTS
REVIEWING CONCLUSIONS
CAPTURING KNOWLEDGE
http://www.nsf.gov/statistics/nsf13331/pdf/nsf13331.pdf
HUNDREDS OF THOUSANDS OF GRAD STUDENTS
POST-DOCS
LABORATORIES
JOURNALS
IN THE LAB
EARLY INTERVENTION: SUPPORTING STANDARDS
• Promote community-accepted identifiers, ontologies, & formats
SUPPORT STANDARDS, THEY’RE OUR FRIEND
• November, 1999
• 45 biologists
• 14 days
• 140 megabases of Drosophila genome
• Published in March 2000
GENE ONTOLOGY, ET AL.
QUEST FOR ORTHOLOGS
• 30 phylogenomic databases
• Vary in number of species, taxonomic range, sampling density, and methodology
• Joint benchmarking effort
• Only possible through the use of shared reference proteomes and formats
questfororthologs.org/ and www.ebi.ac.uk/reference_proteomes
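Why shared reference proteomes matter for benchmarking can be shown with a minimal sketch. It assumes two hypothetical ortholog predictors that both report pairs of protein identifiers drawn from the same reference proteomes, so their calls can be compared directly as sets. The method names and identifiers are invented for illustration and are not taken from the Quest for Orthologs benchmark.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Pair = FrozenSet[str]


def normalize(pairs: Iterable[Tuple[str, str]]) -> Set[Pair]:
    """Treat (a, b) and (b, a) as the same orthology call."""
    return {frozenset(p) for p in pairs}


def agreement(a: Iterable[Tuple[str, str]], b: Iterable[Tuple[str, str]]) -> float:
    """Jaccard agreement between two sets of predicted ortholog pairs."""
    a_set, b_set = normalize(a), normalize(b)
    union = a_set | b_set
    return len(a_set & b_set) / len(union) if union else 1.0


# Hypothetical predictions; identifiers are placeholders that only make
# sense because both methods use the same shared identifier space.
method_a = [("HUMAN_P1", "MOUSE_Q1"), ("HUMAN_P2", "MOUSE_Q2"), ("HUMAN_P3", "MOUSE_Q3")]
method_b = [("MOUSE_Q1", "HUMAN_P1"), ("HUMAN_P2", "MOUSE_Q2"), ("HUMAN_P4", "MOUSE_Q4")]

score = agreement(method_a, method_b)
print(f"agreement = {score:.2f}")
```

If each database used its own proteome release or identifier scheme, the set intersection would be meaningless without an error-prone mapping step, which is the practical argument for common references and formats.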
EARLY INTERVENTION: SUPPORTING STANDARDS
• Promote community-accepted identifiers, ontologies, & formats
• Develop and follow guidelines (paper and web-based)
• e.g. Gaudet, P., et al. Towards BioDBcore: a community-defined information specification for biological databases. Database 2011. PMCID: PMC3017395
• Resource Identification Initiative
• www.force11.org/Resource_identification_initiative
• Vasilevsky NA, et al. On the reproducibility of science: unique identification of research resources in the biomedical literature. PeerJ. 2013 Sep 5;1:e148. doi: 10.7717/peerj.148. PubMed PMID: 24032093; PubMed Central PMCID: PMC3771067.
EARLY INTERVENTION: SUPPORTING STANDARDS
• Promote community-accepted identifiers, ontologies, & formats
• Embed community-accepted standards in the lab environment
KNOCKOUT MOUSE PROJECT 2
• Broad standardized phenotyping of knockout mice on a standard genetic background
• Data collection from many centres
• www.mousephenotype.org
Cindy Smith
PROTOCOLS ARE STANDARDIZED
REQUIRE USE OF PARTICULAR ONTOLOGY TERMS TO DESCRIBE PHENOTYPE
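Requiring particular ontology terms is something software can enforce at data entry. The sketch below is one possible shape for such a check, assuming phenotype identifiers of the form `MP:` followed by seven digits and a per-protocol list of permitted terms. The identifiers, labels, and the allowed list are made up for illustration and do not come from the Knockout Mouse Project.

```python
import re
from typing import List

# Assumed identifier shape: "MP:" plus seven digits.
MP_ID = re.compile(r"MP:\d{7}")

# Hypothetical terms permitted for one hypothetical protocol.
ALLOWED_TERMS = {
    "MP:9999991": "placeholder phenotype A",
    "MP:9999992": "placeholder phenotype B",
}


def validate_annotation(term_id: str) -> List[str]:
    """Return a list of problems; an empty list means the term is accepted."""
    if not MP_ID.fullmatch(term_id):
        return [f"{term_id!r} is not a well-formed MP identifier"]
    if term_id not in ALLOWED_TERMS:
        return [f"{term_id} is not permitted for this protocol"]
    return []


for candidate in ["MP:9999991", "MP:1234567", "obese"]:
    problems = validate_annotation(candidate)
    print(candidate, "->", problems or "ok")
```

Rejecting free text such as "obese" at the point of capture is what keeps data from many centres comparable without later clean-up.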
EARLY INTERVENTION: SUPPORTING STANDARDS
• Promote community-accepted identifiers, ontologies, & formats
• Embed community-accepted standards in the lab environment
• Work with labs to embed standards into their data generation pipeline
STANDARDS THROUGH UTILITY: APOLLO
CSIRO VIDEO; DEMO AT GENOMEARCHITECT.ORG
TOOLS FOR THE COMMUNITY
• Web-based so researchers anywhere have access
• Concurrent access supports real-time collaboration
• Built-in support for standards (transparently compliant)
• Automatic generation of ready-made computable data
• Client-side application relieves server bottleneck and supports privacy
EARLY INTERVENTION: SUPPORTING STANDARDS
• Promote community-accepted identifiers, ontologies, & formats
• Embed community-accepted standards in the lab environment
• Stealth standards
• Re-purpose internal curation tools for external users
• Provide on-line documentation, hands-on training and rapid-response user help
• Work with educators to make these tools an integral part of the curriculum
• e.g. CACAO (Critical Assessment of Community Annotation using Ontologies), ecoliwiki.net/colipedia/index.php/CACAO_0.1
• DNA subway (Apollo)
SUBMISSION
• CANTO: curation.pombase.org
• Structured Digital Abstracts
• Identifiers for all named genes, proteins, metabolites or other objects in the article
• Main results described in simple ontology terms
• Experimental evidence types
• Not only a synopsis of the results but computer-readable
• Gerstein, M., et al. Structured digital abstract makes text mining easy. Nature 447, 142 (10 May 2007) | doi:10.1038/447142a.
• Minimal Information reporting guidelines
• http://mibbi.sourceforge.net/portal.shtml
SUBMITTING DATA IN A STRUCTURED WAY
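The ingredients listed for a structured digital abstract (identifiers, results in ontology terms, evidence types, machine readability) can be sketched as a small data structure. This is only an illustration of the idea: the field names and every identifier value below are assumptions, not the format proposed by Gerstein et al. or the one used by CANTO.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Assertion:
    subject_id: str     # identifier of the gene, protein, or metabolite
    relation: str       # the main result, stated in simple terms
    object_term: str    # ontology term identifier describing the result
    evidence_type: str  # identifier for the experimental evidence type


@dataclass
class StructuredAbstract:
    article_id: str
    assertions: List[Assertion] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize to JSON so that software can consume the abstract."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# All identifiers here are placeholders, not real database accessions.
abstract = StructuredAbstract(
    article_id="EXAMPLE:article-1",
    assertions=[
        Assertion("EXAMPLE:gene-1", "involved_in", "EXAMPLE:term-1", "EXAMPLE:evidence-1"),
    ],
)

record = json.loads(abstract.to_json())
print(abstract.to_json())
```

Capturing this at submission time means the authors, who know the work best, supply the identifiers once, rather than a curator reconstructing them later from a PDF.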
PUBLISHING
• First there were letters
• Then Henry Oldenburg created the first scientific journal in 1665
• Result: too much to absorb
Washed away on the sea of information
PEER AND EDITORIAL REVIEW BECAME A FILTER
CONSEQUENTLY…
• Figshare: figshare.org
• iDigBio: www.idigbio.org
• Dryad: datadryad.org
• eLife: www.elifesciences.org
• Unlike journal articles, the scale of web-native publishing may overwhelm attempts at manual curation (using current strategies)
THE MEDIUM OF PUBLICATION IS CHANGING
DO WE NEED TO CURATE?
SCHOLARSHIP: BEYOND THE PAPER. JASON PRIEM. NATURE 495, 437–440 (28 MARCH 2013)
“…powerful, online filters will distill communities’ impact judgements algorithmically”
SOME SAY NO…
DO WE NEED TO CURATE?
• Resolution of differences
• Clarity, eliminating noise
• Validation & design of automated methods
EVEN A PLACE LIKE GOOGLE USES CURATORS (*AND SOFTWARE)
• Hundreds of operators per country
• Multiple kinds of errors: overlapping jurisdictions, accidental merges, road maps to satellite images mismatch, etc.
• Every road that you see has been hand-massaged
http://www.theatlantic.com/technology/archive/2012/09/how-google-builds-its-maps-and-what-it-means-for-the-future-of-everything/261913/
DO WE NEED TO CURATE?
• Resolution of differences
• Clarity, eliminating noise
• Validation & design of automated methods
CLARITY
• Answer boxes: Quick answers to concrete questions
• Much of this information comes from Freebase, which is structured in terms of entities and properties
Robert West, et al. Knowledge Base Completion via Search-Based Question Answering. http://www.cs.ubc.ca/~murphyk/Papers/www14.pdf WWW’14 April 7–11, 2014, Seoul, Korea. ACM 978-1-4503-2744-2/14/04. DOI:2568032
DO WE NEED TO CURATE?
• Resolution of differences
• Clarity, eliminating noise
• Validation & design of automated methods
• PDF is still the dominant form of distribution
• PDF “Annotation”
• UTOPIA, www.utopiadocs.com
• DOMEO, swan.mindinformatics.org
• Textpresso, www.textpresso.org
• All of these are still lacking domain specifics (or need to be taught)
• FORCE11, www.force11.org
• Common goal is advancing scientific communications
• Beyond the PDF
LITERATURE IS INFORMATIVE BUT IS NOT INFORMATION
VALIDATION AND DESIGN OF AUTOMATED METHODS
• Requires trusted reference datasets!
• Biocurators are partners with developers!
Write/modify software
Run the algorithm
Evaluate results
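The "evaluate results" step of this cycle is where a trusted reference dataset earns its keep. A minimal sketch, assuming both the curated reference and the automated output are sets of (gene, term) annotations; the gene and term names are hypothetical.

```python
from typing import Dict, Set, Tuple

Annotation = Tuple[str, str]  # (gene identifier, ontology term)


def evaluate(predicted: Set[Annotation], reference: Set[Annotation]) -> Dict[str, float]:
    """Score automated annotations against a curated reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical curated reference and hypothetical algorithm output.
reference = {("geneA", "termX"), ("geneB", "termY"), ("geneC", "termZ"), ("geneD", "termX")}
predicted = {("geneA", "termX"), ("geneB", "termY"), ("geneE", "termZ")}

metrics = evaluate(predicted, reference)
print(metrics)
```

The numbers are only as meaningful as the reference set, which is why the developer needs the curator at each turn of the write, run, evaluate loop.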
SCHOLARSHIP: BEYOND THE PAPER. JASON PRIEM. NATURE 495, 437–440 (28 MARCH 2013)
“…powerful, online filters will distill communities’ impact judgements algorithmically”
DO WE NEED TO CURATE?
THE PARABLE OF GOOGLE FLU: TRAPS IN BIG DATA ANALYSIS. DAVID LAZER ET AL. SCIENCE 14 MARCH 2014: VOL. 343 NO. 6176 PP. 1203-1205
“‘Big data hubris’ is the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.”
DO WE NEED TO CURATE?
DO WE NEED TO CURATE?
• Yes
• But…
SYSTEMATIC REVIEW & CRITICISM IS REQUIRED
OUR STRENGTH IS IN QUALITY OF THE INFORMATION WE CAN PROVIDE
CUSICK, M., ET AL. LITERATURE-CURATED PROTEIN INTERACTION DATASETS. NAT METHODS. JAN 2009; 6(1): 39–46. PMCID: PMC2683745
“…literature curated datasets have inherent reliability difficulties…”
HOW CAN BIOCURATORS ADDRESS CRITICISMS?
GREENBERG, S., HOW CITATION DISTORTIONS CREATE UNFOUNDED AUTHORITY: ANALYSIS OF A CITATION NETWORK. BMJ JULY 2009; 339 DOI: HTTP://DX.DOI.ORG/10.1136/
THE RISK (BY ANALOGY)
WE'RE RESPONSIBLE FOR THE QUALITY
• “Reviewing the quality of the data is an obligation of any entity that assumes responsibility over the data.”
• Limor Peer et al., IDCC 2014
PAINT APOPTOSIS - SUMMARY
• 52 families annotated:
• 8 were participants in execution phase of apoptosis
• 44 others are either:
A. upstream of apoptosis
B. phenotypes
C. targets
Example 1: Protein (cytochrome c) upstream of apoptosis execution
Cytochrome c is directly involved in apoptotic DNA fragmentation
➢ [Cells] – [cytochrome c] = No apoptotic DNA fragmentation
➢ [Cells] – [cytochrome c] + [cytochrome c] = apoptotic DNA fragmentation
Example 2: Phenotype of reduced cell survival and increased DNA fragmentation
• E3 ubiquitin-protein ligase TRAF7 was annotated to execution phase of apoptosis
➢ Exogenous expression of TRAF7
➢ No other data in terms of where in apoptosis this may be.
➢ All we know is altering TRAF7 levels affects apoptosis.
Example 3: Target
DSG2 was annotated to execution phase of apoptosis
DSG2 is a *target* of a protease (caspase), and although its degradation indeed seems to be a part of apoptosis it does not *mediate* apoptosis.
PROVE THE NEED FOR BIOCURATION
• Publish: Quantitative improvements before/after
• Publish: Curator consistency studies
• Publish: Independent external reviews
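A curator consistency study needs an agreement measure that discounts agreement expected by chance. One common choice is Cohen's kappa, sketched here for two curators making the same yes/no decision on the same set of papers. The labels are invented, and the choice of kappa is an assumption for illustration rather than something the slides prescribe.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if each curator labelled independently at their own rates.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical decisions: does each of 8 papers support the annotation?
curator_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
curator_2 = ["yes", "yes", "no", "yes", "yes", "no", "no", "no"]

kappa = cohens_kappa(curator_1, curator_2)
print(f"kappa = {kappa:.2f}")
```

Here the curators agree on 6 of 8 papers, but half of that agreement would be expected by chance, so kappa is noticeably lower than the raw agreement rate.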
RECOGNITION & CREDIT
ORCID.ORG
ENABLING RESEARCH
WHAT IS A BIOCURATOR?
• A highly skilled and trained keeper of our biological heritage of knowledge.
• A content specialist who understands the research and can succinctly distill biological research results into computable data
• Considers the ease of finding this information, its relatedness to other information, and its research and educational usability
MODELS RECAPITULATE VARIOUS PHENOTYPIC ASPECTS OF DISEASE
GENOTYPE
• B6.Cg-Alms1foz/fox/J
• ALMS1(NM_015120.4) [c.10775delC] + [-]
• kcnj11c14/c14; insrt143/+(AB)
PHENOTYPE
• increased weight, adipose tissue volume, glucose homeostasis altered
• obesity, diabetes mellitus, insulin resistance
• increased food intake, hyperglycemia, insulin resistance
RESEARCH RESOURCES
Doelken SC et al. Dis. Model. Mech. 2013;6:358-372
Smedley D et al. Database. 2013; bat025
Mungall CJ et al. Genome Biol. 2010; 11(1):R2
Washington N et al. PLoS Biol 2009; e1000247
CROSS-SPECIES PHENOTYPE COMPARISONS BY SEMANTIC SIMILARITY
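Semantic similarity lets phenotype profiles match even when they share no identical terms. The sketch below uses a deliberately simple approach: expand each profile with its ancestor terms, then take the Jaccard overlap. The tiny hierarchy is invented for illustration using phrases from the slides; it is not a real ontology, and published methods typically weight terms by information content rather than counting them equally.

```python
from typing import Dict, List, Set

# Toy child -> parents hierarchy, made up for this example.
PARENTS: Dict[str, List[str]] = {
    "obesity": ["increased weight"],
    "increased weight": ["abnormal body mass"],
    "hyperglycemia": ["glucose homeostasis altered"],
    "insulin resistance": ["glucose homeostasis altered"],
    "glucose homeostasis altered": ["abnormal metabolism"],
}


def closure(term: str) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def expand(profile: Set[str]) -> Set[str]:
    """Union of the ancestor closures of every term in a profile."""
    return set().union(*(closure(t) for t in profile)) if profile else set()


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


mouse = {"increased weight", "glucose homeostasis altered"}
human = {"obesity", "insulin resistance"}

exact = jaccard(mouse, human)                     # literal term overlap only
semantic = jaccard(expand(mouse), expand(human))  # overlap after adding ancestors
print(f"exact = {exact:.2f}, semantic = {semantic:.2f}")
```

The literal overlap is zero, while the ancestor-aware score is well above zero, which is the effect that makes cross-species comparison possible once phenotypes are curated to ontology terms.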
CANDIDATE GENE PRIORITIZATION
PHENOTYPIC INTERPRETATION OF VARIANTS IN EXOMES (PHIVE)
• Whole exome
• Remove off-target and common variants
• Variant score from allele freq and pathogenicity
• Phenotype score from phenotypic similarity
• PhenIX/PhIVE score to give final candidates
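The pipeline above can be sketched as a filter followed by a ranking. The version below is a simplification under stated assumptions: the allele-frequency cutoff of 0.01, the use of a plain average to combine the two scores, and all gene names and values are hypothetical, and this is not the published PhIVE or PhenIX scoring formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    gene: str
    allele_freq: float      # population frequency of the variant
    variant_score: float    # 0..1, from rarity and predicted pathogenicity
    phenotype_score: float  # 0..1, from phenotypic similarity to model data


def combined_score(c: Candidate) -> float:
    """Assumed combination rule: simple mean of the two component scores."""
    return (c.variant_score + c.phenotype_score) / 2


def prioritize(candidates: List[Candidate], max_allele_freq: float = 0.01) -> List[Candidate]:
    """Drop common variants, then rank the rest by combined score."""
    rare = [c for c in candidates if c.allele_freq <= max_allele_freq]
    return sorted(rare, key=combined_score, reverse=True)


candidates = [
    Candidate("GENE_A", 0.0001, 0.90, 0.20),
    Candidate("GENE_B", 0.0005, 0.70, 0.95),
    Candidate("GENE_C", 0.2000, 0.99, 0.99),  # common variant, filtered out
]

ranked = [c.gene for c in prioritize(candidates)]
print(ranked)
```

In this toy case the phenotype score promotes GENE_B above GENE_A even though GENE_A has the more damaging-looking variant, which is the contribution curated phenotype data makes to diagnosis.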
http://monarchinitiative.org
CONFIRMED DIAGNOSES
• Infantile Parkinsonism-dystonia
• Wiedemann-Steiner syndrome
• de novo SYNGAP1 mutation leading to autosomal dominant mental retardation
• Frank-ter Haar syndrome
• Infantile hypophosphatasia
• … (~28%)
RELATEDNESS ACROSS BIOLOGY
• Bio-Curator, not bio-Archivist
• Actively trying to represent current best understanding
• Support interoperability
• Support research and educational usability
• Support inference
• Not just for supporting searches, not just for finding PDF/online papers!
WHAT CAN BE DONE?
BIODIVERSITY DATA JOURNAL
FROM WRITING, SUBMISSION, PEER-REVIEW, EDITING, PUBLICATION TO DISSEMINATION!
WHAT CAN ISB DO?
• Tangible support of standards efforts
• QfO, RII, MI, publish guidelines, validators …
• Create a curation mindset across the entire life cycle
• Support embedded/repurposed software, education, actively engage with text-miners, provide on-line support …
• Prove the necessity for curation
• Publish studies, greater emphasis on review and quality (assessment)
• Work with traditional publishers
• FORCE11, structured submissions
WHAT CAN YOU DO?
• Consider
• The ease of finding information
• Its relatedness to other information
• Its research and educational usability
RESEARCH??
YOU, THE BIOCURATOR
ISB
ACKNOWLEDGEMENTS AND THANKS
YOU ARE NOT ALONE

Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 

Lewis isb 7 april2014

  • 1. THE WORLD OF BIOCURATION: OPTIMIZING ITS IMPACT. April 7, 2014, Seventh International Biocuration Conference
  • 2. WHAT IS A BIOCURATOR? SOMEONE WHO IS RESPONSIBLE FOR THE CARE AND SUPERVISION OF BIOLOGICAL KNOWLEDGE RESOURCES AND THEIR USE
  • 3. WHAT DO BIOCURATORS DO TODAY? • Credits to Kaveh Bazargan ᔥ • @kaveh1000
  • 4.
  • 5. FRUIT IN FOOD PROCESSOR
  • 6. SMOOTHIE
  • 7. RESEARCH
  • 8. RESEARCH IN WORD PROCESSOR
  • 10. FRUIT??
  • 11. RESEARCH???
  • 12. RESEARCH?? YOU, THE BIOCURATOR
  • 13. BIOCURATORS OF THE WORLD UNITE! • You have nothing to lose but your PDF files
  • 14. OUR ROLE IN THE RESEARCH LIFE CYCLE. THE WORLD OF BIOCURATION
  • 18. WRITING UP RESULTS. Thomas Nast - http://www.victorianweb.org/art/illustration/nast/51.jpg
  • 20. CAPTURING KNOWLEDGE
  • 21. ISB. CAPTURING KNOWLEDGE; DESIGNING EXPERIMENTS; COLLECTING DATA; REVIEWING CONCLUSIONS; WRITING UP RESULTS
  • 22. ~300 BIOCURATORS. BIOCURATION INVERSION. DESIGNING EXPERIMENTS; COLLECTING DATA; WRITING UP RESULTS; REVIEWING CONCLUSIONS; CAPTURING KNOWLEDGE. http://www.nsf.gov/statistics/nsf13331/pdf/nsf13331.pdf HUNDREDS OF THOUSANDS OF GRAD STUDENTS; POST-DOCS; LABORATORIES; JOURNALS
  • 23. IN THE LAB
  • 24. EARLY INTERVENTION: SUPPORTING STANDARDS • Promote community-accepted identifiers, ontologies, & formats
  • 25. SUPPORT STANDARDS, THEY’RE OUR FRIEND • November, 1999 • 45 biologists • 14 days • 140 megabases of Drosophila genome • Published in March 2000. GENE ONTOLOGY, ET AL.
  • 26. QUEST FOR ORTHOLOGS. questfororthologs.org/; www.ebi.ac.uk/reference_proteomes
  • 27. QUEST FOR ORTHOLOGS • 30 phylogenomic databases. questfororthologs.org/; www.ebi.ac.uk/reference_proteomes
  • 28. QUEST FOR ORTHOLOGS • 30 phylogenomic databases • Vary in # of species, taxonomic range, sampling density, and methodology. questfororthologs.org/; www.ebi.ac.uk/reference_proteomes
  • 29. QUEST FOR ORTHOLOGS • 30 phylogenomic databases • Vary in # of species, taxonomic range, sampling density, and methodology • Joint benchmarking effort. questfororthologs.org/; www.ebi.ac.uk/reference_proteomes
  • 30. QUEST FOR ORTHOLOGS • 30 phylogenomic databases • Vary in # of species, taxonomic range, sampling density, and methodology • Joint benchmarking effort • Only possible through the use of shared reference proteomes and formats. questfororthologs.org/; www.ebi.ac.uk/reference_proteomes
  • 31. QUEST FOR ORTHOLOGS • 30 phylogenomic databases • Vary in # of species, taxonomic range, sampling density, and methodology • Joint benchmarking effort • Only possible through the use of shared reference proteomes and formats. questfororthologs.org/; www.ebi.ac.uk/reference_proteomes
  • 32. EARLY INTERVENTION: SUPPORTING STANDARDS • Promote community-accepted identifiers, ontologies, & formats • Develop and follow guidelines (paper and web-based) • e.g. Gaudet, P., et al. Towards BioDBcore: a community-defined information specification for biological databases. Database 2011. PMCID: PMC3017395 • Resource Identification Initiative • www.force11.org/Resource_identification_initiative • Vasilevsky NA, et al. On the reproducibility of science: unique identification of research resources in the biomedical literature. PeerJ. 2013 Sep 5;1:e148. doi: 10.7717/peerj.148. PubMed PMID: 24032093; PubMed Central PMCID: PMC3771067.
  • 33. EARLY INTERVENTION: SUPPORTING STANDARDS • Promote community-accepted identifiers, ontologies, & formats • Embed community-accepted standards in the lab environment
  • 34. KNOCKOUT MOUSE PROJECT 2 • Broad standardized phenotyping of knockout mice on a standard genetic background • Data collection from many centres • www.mousephenotype.org
  • 35. KNOCKOUT MOUSE PROJECT 2 • Broad standardized phenotyping of knockout mice on a standard genetic background • Data collection from many centres • www.mousephenotype.org Cindy Smith
  • 36. PROTOCOLS ARE STANDARDIZED. REQUIRE USE OF PARTICULAR ONTOLOGY TERMS TO DESCRIBE PHENOTYPE
  • 37. EARLY INTERVENTION: SUPPORTING STANDARDS • Promote community-accepted identifiers, ontologies, & formats • Embed community-accepted standards in the lab environment • Work with labs to embed standards into their data generation pipeline
  • 38. EARLY INTERVENTION: SUPPORTING STANDARDS • Promote community-accepted identifiers, ontologies, & formats • Embed community-accepted standards in the lab environment • Stealth standards
  • 39. STANDARDS THROUGH UTILITY: APOLLO. CSIRO VIDEO; DEMO AT genomearchitect.org
  • 40. STANDARDS THROUGH UTILITY: APOLLO. CSIRO VIDEO; DEMO AT genomearchitect.org
  • 41. TOOLS FOR THE COMMUNITY
  • 42. TOOLS FOR THE COMMUNITY • Web-based so researchers anywhere have access
  • 43. TOOLS FOR THE COMMUNITY • Web-based so researchers anywhere have access • Concurrent access supports real-time collaboration
  • 44. TOOLS FOR THE COMMUNITY • Web-based so researchers anywhere have access • Concurrent access supports real-time collaboration • Built-in support for standards (transparently compliant)
  • 45. TOOLS FOR THE COMMUNITY • Web-based so researchers anywhere have access • Concurrent access supports real-time collaboration • Built-in support for standards (transparently compliant) • Automatic generation of ready-made computable data
  • 46. TOOLS FOR THE COMMUNITY • Web-based so researchers anywhere have access • Concurrent access supports real-time collaboration • Built-in support for standards (transparently compliant) • Automatic generation of ready-made computable data • Client-side application relieves server bottleneck and supports privacy
  • 47. EARLY INTERVENTION: SUPPORTING STANDARDS • Promote community-accepted identifiers, ontologies, & formats • Embed community-accepted standards in the lab environment • Stealth standards • Re-purpose internal curation tools for external users • Provide on-line documentation, hands-on training and rapid-response user help • Work with educators to make these tools an integral part of the curriculum • e.g. CACAO (Critical Assessment of Community Annotation using Ontologies), ecoliwiki.net/colipedia/index.php/CACAO_0.1 • DNA subway (Apollo)
  • 48. SUBMISSION
  • 49. SUBMITTING DATA, IN A STRUCTURED WAY • CANTO: curation.pombase.org • Structured Digital Abstracts • Identifiers for all named genes, proteins, metabolites or other objects in the article • Main results described in simple ontology terms • Experimental evidence types • Not only a synopsis of the results but computer-readable • Gerstein, M., et al. Structured digital abstract makes text mining easy. Nature 447, 142 (10 May 2007) | doi:10.1038/447142a. • Minimal Information reporting guidelines • http://mibbi.sourceforge.net/portal.shtml
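A structured digital abstract, as outlined on slide 49, pairs identifiers for the named objects with ontology terms for the main results and an evidence type for each. The following is a minimal sketch of what such a record might look like. The class names, field names, DOI and gene identifier are hypothetical placeholders and are not taken from CANTO or from any published schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Finding:
    """One main result, expressed with an ontology term and an evidence type."""
    subject_id: str     # identifier of the gene, protein or metabolite
    ontology_term: str  # ontology term ID describing the result
    evidence_type: str  # evidence code for the supporting experiment


@dataclass
class StructuredAbstract:
    """Hypothetical computer-readable synopsis of one article."""
    article_doi: str
    entities: dict = field(default_factory=dict)  # name used in text -> identifier
    findings: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, so findings serialize too.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# All values below are illustrative placeholders, not real curated data.
sda = StructuredAbstract(
    article_doi="10.0000/placeholder",
    entities={"geneX": "EXAMPLE:0001"},
    findings=[Finding("EXAMPLE:0001", "GO:0004672", "IDA")],
)
print(sda.to_json())
```

Because every field is an identifier or a controlled term rather than free text, a record like this can be loaded directly by downstream tools without text mining.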
  • 50. PUBLISHING
  • 51. PUBLISHING
  • 52. PUBLISHING • First there were letters
  • 53. PUBLISHING • First there were letters • Then Henry Oldenburg created the first scientific journal in 1665
  • 54. PUBLISHING • First there were letters • Then Henry Oldenburg created the first scientific journal in 1665 • Result: too much to absorb
  • 55. PUBLISHING • First there were letters • Then Henry Oldenburg created the first scientific journal in 1665 • Result: too much to absorb. Washed away on the sea of information
  • 56. CONSEQUENTLY… PEER AND EDITORIAL REVIEW BECAME A FILTER
  • 57. THE MEDIUM OF PUBLICATION IS CHANGING • Figshare: figshare.org • iDigBio: www.idigbio.org • Dryad: datadryad.org • eLife: www.elifesciences.org • Unlike journal articles, the scale of web-native publishing may overwhelm attempts at manual curation (using current strategies)
  • 58. DO WE NEED TO CURATE?
  • 59. SOME SAY NO… “…powerful, online filters will distill communities impact judgements algorithmically” SCHOLARSHIP: BEYOND THE PAPER. JASON PRIEM. NATURE 495, 437–440 (28 MARCH 2013)
  • 60. DO WE NEED TO CURATE? • Resolution of differences • Clarity, eliminating noise • Validation & design of automated methods
  • 61. EVEN A PLACE LIKE GOOGLE USES CURATORS (*AND SOFTWARE) • Hundreds of operators per country • Multiple kinds of errors: overlapping jurisdictions, accidental merges, road maps to satellite images mismatch, etc. • Every road that you see has been hand-massaged. http://www.theatlantic.com/technology/archive/2012/09/how-google-builds-its-maps-and-what-it-means-for-the-future-of-everything/261913/
  • 62. DO WE NEED TO CURATE? • Resolution of differences • Clarity, eliminating noise • Validation & design of automated methods
  • 63. CLARITY • Answer boxes: Quick answers to concrete questions
  • 64. CLARITY • Answer boxes: Quick answers to concrete questions
  • 65. CLARITY • Answer boxes: Quick answers to concrete questions
  • 66. CLARITY • Answer boxes: Quick answers to concrete questions • Much of this information comes from Freebase which is structured in terms of entities and properties
  • 67. CLARITY • Answer boxes: Quick answers to concrete questions • Much of this information comes from Freebase which is structured in terms of entities and properties. Robert West, et al. Knowledge Base Completion via Search-Based Question Answering. http://www.cs.ubc.ca/~murphyk/Papers/www14.pdf WWW’14 April 7–11, 2014, Seoul, Korea. ACM 978-1-4503-2744-2/14/04. DOI:2568032
  • 68. DO WE NEED TO CURATE? • Resolution of differences • Clarity, eliminating noise • Validation & design of automated methods
  • 69. LITERATURE IS INFORMATIVE BUT IS NOT INFORMATION • PDF is still the dominant form of distribution • PDF “Annotation” • UTOPIA, www.utopiadocs.com • DOMEO, swan.mindinformatics.org • Textpresso, www.textpresso.org • All of these are still lacking domain specifics (or need to be taught) • FORCE11, www.force11.org • Common goal is advancing scientific communications • Beyond the PDF
  • 70. VALIDATION AND DESIGN OF AUTOMATED METHODS
  • 71. VALIDATION AND DESIGN OF AUTOMATED METHODS
  • 72. VALIDATION AND DESIGN OF AUTOMATED METHODS. Write/modify software
  • 73. VALIDATION AND DESIGN OF AUTOMATED METHODS. Run the algorithm; Write/modify software
  • 74. VALIDATION AND DESIGN OF AUTOMATED METHODS. Run the algorithm; Write/modify software; Evaluate results
  • 75. VALIDATION AND DESIGN OF AUTOMATED METHODS • Requires trusted reference datasets! Run the algorithm; Write/modify software; Evaluate results
  • 76. VALIDATION AND DESIGN OF AUTOMATED METHODS • Requires trusted reference datasets! • Biocurators are partners with developers! Run the algorithm; Write/modify software; Evaluate results
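The "evaluate results" step of the cycle on slides 72 to 76 depends on a trusted reference dataset. One common way to carry it out, sketched here under the assumption that both the curated reference and the algorithm's output can be expressed as sets of (gene, term) pairs, is to compute precision, recall and F1. The gene names and annotations are toy values invented for illustration.

```python
def evaluate(predicted: set, reference: set) -> dict:
    """Compare automated predictions against a curated reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data: curated (gene, ontology term) pairs versus algorithm output.
reference = {
    ("geneA", "GO:0006915"),
    ("geneB", "GO:0006915"),
    ("geneC", "GO:0008219"),
}
predicted = {
    ("geneA", "GO:0006915"),
    ("geneC", "GO:0008219"),
    ("geneD", "GO:0006915"),  # not in the reference: counts against precision
}

scores = evaluate(predicted, reference)
print(scores)
```

The numbers are only as meaningful as the reference set, which is the point the slide makes about biocurators and developers working as partners.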
  • 77. DO WE NEED TO CURATE? “…powerful, online filters will distill communities impact judgements algorithmically” SCHOLARSHIP: BEYOND THE PAPER. JASON PRIEM. NATURE 495, 437–440 (28 MARCH 2013)
  • 78. DO WE NEED TO CURATE? “‘Big data hubris’ is the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.” THE PARABLE OF GOOGLE FLU: TRAPS IN BIG DATA ANALYSIS. DAVID LAZER ET AL. SCIENCE 14 MARCH 2014: VOL. 343 NO. 6176 PP. 1203-1205
  • 79. DO WE NEED TO CURATE? • Yes
  • 80. DO WE NEED TO CURATE? • Yes • But…
  • 81. SYSTEMATIC REVIEW & CRITICISM IS REQUIRED. OUR STRENGTH IS IN QUALITY OF THE INFORMATION WE CAN PROVIDE
  • 82. HOW CAN BIOCURATORS ADDRESS CRITICISMS? “…literature curated datasets have inherent reliability difficulties…” CUSICK, M., ET AL. LITERATURE-CURATED PROTEIN INTERACTION DATASETS. NAT METHODS. JAN 2009; 6(1): 39–46. PMCID: PMC2683745
  • 83. THE RISK (BY ANALOGY). GREENBERG, S., HOW CITATION DISTORTIONS CREATE UNFOUNDED AUTHORITY: ANALYSIS OF A CITATION NETWORK. BMJ JULY 2009; 339. DOI: http://dx.doi.org/10.1136/
  • 84. WE'RE RESPONSIBLE FOR THE QUALITY • “Reviewing the quality of the data is an obligation of any entity that assumes responsibility over the data.” • Limor Peer et al., IDCC 2014
  • 85. PAINT APOPTOSIS - SUMMARY • 52 families annotated: • 8 were participants in execution phase of apoptosis; • 44 others are either: A. upstream of apoptosis B. phenotypes C. targets
  • 86. Example 1: Protein (cytochrome c) upstream of apoptosis execution. Cytochrome c is directly involved in apoptotic DNA fragmentation
  • 87. Example 1: Protein (cytochrome c) upstream of apoptosis execution. Cytochrome c is directly involved in apoptotic DNA fragmentation ➢ [Cells] – [cytochrome c] = No apoptotic DNA fragmentation
  • 88. Example 1: Protein (cytochrome c) upstream of apoptosis execution. Cytochrome c is directly involved in apoptotic DNA fragmentation ➢ [Cells] – [cytochrome c] = No apoptotic DNA fragmentation ➢ [Cells] – [cytochrome c] + [cytochrome c] = apoptotic DNA fragmentation
  • 89. Example 2: Phenotype of reduced cell survival and increased DNA fragmentation • E3 ubiquitin-protein ligase TRAF7 was annotated to execution phase of apoptosis ➢ Exogenous expression of TRAF7 ➢ No other data in terms of where in apoptosis this may be. ➢ All we know is altering TRAF7 levels affects apoptosis.
  • 90. Example 3: Target. DSG2 was annotated to execution phase of apoptosis
  • 91. Example 3: Target. DSG2 was annotated to execution phase of apoptosis
  • 92. Example 3: Target. DSG2 was annotated to execution phase of apoptosis. DSG2 is a *target* of a protease (caspase), and although its degradation indeed seems to be a part of apoptosis it does not *mediate* apoptosis.
  • 93. PROVE THE NEED FOR BIOCURATION • Publish: Quantitative improvements before/after • Publish: Curator consistency studies • Publish: Independent external reviews
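A curator consistency study of the kind slide 93 calls for needs an agreement statistic. The sketch below uses Cohen's kappa, one widely used chance-corrected measure for two annotators; the slides do not say which statistic to use, so this choice is an assumption. The category labels echo the PAINT apoptosis summary (execution, upstream, phenotype, target), and the two label lists are invented toy data.

```python
from collections import Counter


def cohens_kappa(labels_a: list, labels_b: list) -> float:
    """Chance-corrected agreement between two curators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both curators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each curator labelled independently at their own rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both curators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy data: how two curators classified the same six protein families.
curator_1 = ["execution", "upstream", "phenotype", "target", "upstream", "execution"]
curator_2 = ["execution", "upstream", "upstream", "target", "upstream", "phenotype"]

kappa = cohens_kappa(curator_1, curator_2)
print(round(kappa, 3))
```

A value near 1 indicates agreement well beyond chance, while a value near 0 indicates agreement no better than chance; reporting it before and after a guideline change is one way to publish a quantitative improvement.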
  • 94. RECOGNITION & CREDIT. orcid.org
  • 95. ENABLING RESEARCH
  • 96. WHAT IS A BIOCURATOR?
  • 97. WHAT IS A BIOCURATOR?
  • 98. WHAT IS A BIOCURATOR?
  • 99. WHAT IS A BIOCURATOR? • A highly skilled and trained keeper of our biological heritage of knowledge.
  • 100. WHAT IS A BIOCURATOR? • A highly skilled and trained keeper of our biological heritage of knowledge. • A content specialist who understands the research and can succinctly distill biological research results into computable data
  • 101. WHAT IS A BIOCURATOR? • A highly skilled and trained keeper of our biological heritage of knowledge. • A content specialist who understands the research and can succinctly distill biological research results into computable data • Considers the ease of finding this information, its relatedness to other information, and its research and educational usability
  • 102. MODELS RECAPITULATE VARIOUS PHENOTYPIC ASPECTS OF DISEASE • GENOTYPE: B6.Cg-Alms1foz/fox/J. PHENOTYPE: increased weight, adipose tissue volume, glucose homeostasis altered • GENOTYPE: ALMS1 (NM_015120.4) [c.10775delC] + [-]. PHENOTYPE: obesity, diabetes mellitus, insulin resistance • GENOTYPE: kcnj11c14/c14; insrt143/+(AB). PHENOTYPE: increased food intake, hyperglycemia, insulin resistance
  • 103. MODELS RECAPITULATE VARIOUS PHENOTYPIC ASPECTS OF DISEASE • GENOTYPE: B6.Cg-Alms1foz/fox/J. PHENOTYPE: increased weight, adipose tissue volume, glucose homeostasis altered • GENOTYPE: ? PHENOTYPE: obesity, diabetes mellitus, insulin resistance • GENOTYPE: kcnj11c14/c14; insrt143/+(AB). PHENOTYPE: increased food intake, hyperglycemia, insulin resistance
  • 104. RESEARCH RESOURCES. Doelken S C et al. Dis. Model. Mech. 2013;6:358-372
  • 105. CROSS-SPECIES PHENOTYPE COMPARISONS BY SEMANTIC SIMILARITY • Smedley D et al. Database. 2013; bat025 • Mungall CJ et al. Genome Biol. 2010; 11(1):R2 • Washington N et al. PLoS Biol 2009; e1000247
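To give a feel for how phenotype profiles from different species can be compared by semantic similarity, here is a deliberately simplified sketch. It scores two profiles by the Jaccard overlap of their terms plus all ancestor terms in a shared hierarchy. This is a stand-in chosen for brevity and is not the method of the papers cited on slide 105. The tiny hierarchy and its term names are invented for illustration and are not drawn from any real ontology.

```python
def ancestors(term: str, parents: dict) -> set:
    """Return the term together with all of its ancestors in the hierarchy."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def profile_closure(terms: list, parents: dict) -> set:
    """Union of ancestor sets for every term in a phenotype profile."""
    closure = set()
    for term in terms:
        closure |= ancestors(term, parents)
    return closure


def jaccard_similarity(profile_a: list, profile_b: list, parents: dict) -> float:
    """Jaccard overlap of the two ancestor closures, in the range 0..1."""
    closure_a = profile_closure(profile_a, parents)
    closure_b = profile_closure(profile_b, parents)
    union = closure_a | closure_b
    return len(closure_a & closure_b) / len(union) if union else 0.0


# Hypothetical mini-hierarchy: child term -> list of parent terms.
TOY_PARENTS = {
    "increased body weight": ["abnormal body weight"],
    "obesity": ["abnormal body weight"],
    "abnormal body weight": ["phenotype"],
    "hyperglycemia": ["abnormal glucose homeostasis"],
    "insulin resistance": ["abnormal glucose homeostasis"],
    "abnormal glucose homeostasis": ["phenotype"],
}

model_profile = ["increased body weight", "insulin resistance"]
disease_profile = ["obesity", "insulin resistance"]

similarity = jaccard_similarity(model_profile, disease_profile, TOY_PARENTS)
print(round(similarity, 3))
```

Even though "increased body weight" and "obesity" are different terms, they share a parent, so the two profiles still score as similar. That is the property that makes ontology-based comparison useful across species.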
  • 107. PHENOTYPIC INTERPRETATION OF VARIANTS IN EXOMES (PHIVE) • Whole exome • Remove off-target and common variants • Variant score from allele freq and pathogenicity • Phenotype score from phenotypic similarity • PhenIX/PhIVE score to give final candidates • http://monarchinitiative.org
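The steps on slide 107 can be sketched as a small filter-and-rank pipeline. Everything specific below is an assumption made for illustration: the frequency cutoff, the variant score formula, the simple averaging of the two scores, and the gene names and values. The actual PhenIX/PhIVE tools may define and combine these scores differently.

```python
from dataclasses import dataclass

MAX_ALLELE_FREQ = 0.01  # hypothetical cutoff for "common" variants


@dataclass
class Variant:
    gene: str
    on_target: bool       # falls inside the captured exome regions
    allele_freq: float    # population allele frequency, 0..1
    pathogenicity: float  # predicted pathogenicity, 0..1


def variant_score(v: Variant) -> float:
    """Illustrative score: rarer and more pathogenic variants score higher."""
    rarity = 1.0 - min(v.allele_freq / MAX_ALLELE_FREQ, 1.0)
    return v.pathogenicity * rarity


def rank_candidates(variants: list, phenotype_scores: dict) -> list:
    """Filter variants, then rank genes by an averaged combined score."""
    kept = [v for v in variants if v.on_target and v.allele_freq <= MAX_ALLELE_FREQ]
    scored = []
    for v in kept:
        pheno = phenotype_scores.get(v.gene, 0.0)  # phenotypic similarity, 0..1
        combined = (variant_score(v) + pheno) / 2.0  # assumed combination rule
        scored.append((v.gene, combined))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy inputs; gene names and numbers are placeholders.
variants = [
    Variant("GENE1", True, 0.0, 0.9),
    Variant("GENE2", True, 0.2, 0.95),   # removed: too common
    Variant("GENE3", False, 0.0, 0.99),  # removed: off-target
    Variant("GENE4", True, 0.001, 0.6),
]
phenotype_scores = {"GENE1": 0.4, "GENE4": 0.9}

ranked = rank_candidates(variants, phenotype_scores)
print(ranked)
```

In this toy run the gene with the weaker variant evidence ranks first because its phenotype match is much stronger, which illustrates why curated phenotype data can change the candidate list.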
  • 108. CONFIRMED DIAGNOSES • Infantile Parkinsonism-dystonia • Wiedemann-Steiner syndrome • de novo SYNGAP1 mutation leading to autosomal dominant mental retardation • Frank-ter Haar syndrome • Infantile hypophosphatasia • … (~28%)
  • 109. RELATEDNESS ACROSS BIOLOGY
  • 110. RELATEDNESS ACROSS BIOLOGY • Bio-Curator, not bio-Archivist • Actively trying to represent current best understanding
  • 111. RELATEDNESS ACROSS BIOLOGY • Bio-Curator, not bio-Archivist • Actively trying to represent current best understanding • Support interoperability
  • 112. RELATEDNESS ACROSS BIOLOGY • Bio-Curator, not bio-Archivist • Actively trying to represent current best understanding • Support interoperability • Support research and educational usability
  • 113. RELATEDNESS ACROSS BIOLOGY • Bio-Curator, not bio-Archivist • Actively trying to represent current best understanding • Support interoperability • Support research and educational usability • Support inference
  • 114. RELATEDNESS ACROSS BIOLOGY • Bio-Curator, not bio-Archivist • Actively trying to represent current best understanding • Support interoperability • Support research and educational usability • Support inference • Not just for supporting searches, not just for finding PDF/online papers!
  • 115. WHAT CAN BE DONE?
  • 116. WHAT CAN BE DONE?
  • 117. WHAT CAN BE DONE?
  • 118. WHAT CAN BE DONE?
  • 119. WHAT CAN BE DONE?
  • 120. BIODIVERSITY DATA JOURNAL
  • 121. BIODIVERSITY DATA JOURNAL
  • 122. BIODIVERSITY DATA JOURNAL. FROM WRITING, SUBMISSION, PEER-REVIEW, EDITING, PUBLICATION TO DISSEMINATION!
  • 123. WHAT CAN ISB DO?
  • 124. WHAT CAN ISB DO? • Tangible support of standards efforts • QfO, RII, MI, publish guidelines, validators …
  • 125. WHAT CAN ISB DO? • Tangible support of standards efforts • QfO, RII, MI, publish guidelines, validators … • Create a curation mindset across the entire life cycle • Support embedded/repurposed software, education, actively engage with text-miners, provide on-line support …
  • 126. WHAT CAN ISB DO? • Tangible support of standards efforts • QfO, RII, MI, publish guidelines, validators … • Create a curation mindset across the entire life cycle • Support embedded/repurposed software, education, actively engage with text-miners, provide on-line support … • Prove the necessity for curation • Publish studies, greater emphasis on review and quality (assessment)
  • 127. WHAT CAN ISB DO? • Tangible support of standards efforts • QfO, RII, MI, publish guidelines, validators … • Create a curation mindset across the entire life cycle • Support embedded/repurposed software, education, actively engage with text-miners, provide on-line support … • Prove the necessity for curation • Publish studies, greater emphasis on review and quality (assessment) • Work with traditional publishers • FORCE11, structured submissions
  • 128. WHAT CAN YOU DO? • Consider • The ease of finding information • Its relatedness to other information • Its research and educational usability
  • 129. RESEARCH?? YOU, THE BIOCURATOR. ISB
  • 130. ACKNOWLEDGEMENTS AND THANKS. YOU ARE NOT ALONE