5. • Luscombe et al., 2001:
*especially, but definitely not limited to, gene & protein sequence data
**often impressively large datasets. Please do not call it “big data”
13. Dr. Dayhoff established an on-line computer
database and a sophisticated retrieval system, accessible by
phone to outside users, in September 1980
http://www.dayhoff.cc/MODBiography.html
17. Kaye et al.
“Data sharing in genomics — re-shaping
scientific practice” Nat Rev Genet 2008
18. Langille et al. (2018) Microbiome
Data-release policies are only as good as researchers’ willingness to abide by them, and the will on the part of journals and funding bodies to enforce them!
20. • Standard formats
• Efficient representations
quantitative data
• Gene expression
• Metabolite concentrations
23. “(i) the recorded information about each experiment should be sufficient to
interpret the experiment and should be detailed enough to enable
comparisons to similar experiments and permit replication of experiments
and (ii) the information should be structured in a way that enables useful
querying as well as automated data analysis and mining.”
Brazma et al. (2001) Nat Genet
[Figure: heatmap of expression levels (low to high), genes (g) by samples (s). Samples are diffuse large B cell lymphoma patients of two types, with different prognoses. Li et al. (2001) Bioinformatics]
32. All articles published in Science in 2009 that mention H1N1:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=science[journal]+AND+h1n1+AND+2009[pdat]
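The same query can be assembled programmatically rather than typed by hand. The sketch below is one possible way to do it with the Python standard library; the function name `build_esearch_url` and its parameters are assumptions made for illustration, not part of E-utilities. It only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the ESearch endpoint, taken from the URL on the slide.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, terms):
    """Join search terms with AND and return an ESearch URL (hypothetical helper)."""
    query = " AND ".join(terms)
    # safe="[]" leaves the field tags such as [journal] readable;
    # spaces are encoded as '+', matching the URL on the slide.
    return EUTILS_ESEARCH + "?" + urlencode({"db": db, "term": query}, safe="[]")


url = build_esearch_url("pubmed", ["science[journal]", "h1n1", "2009[pdat]"])
print(url)
```

Fetching the result would be a separate step (for example with `urllib.request.urlopen`), which is left out here so the example runs offline.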
45. 3. Archive the Exact Versions of All External Programs Used
5. Record All Intermediate Results, When Possible in
Standardized Formats
Sandve et al., PLoS Comput Biol 2013
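As a rough illustration of rule 3, an analysis script could write the versions of the programs it calls to a JSON file stored next to its results. This is a minimal sketch, not a method from Sandve et al.; the helper names `tool_version` and `record_versions`, the output file name, and the use of a `--version` flag are all assumptions, and real tools may report their version differently.

```python
import json
import os
import platform
import subprocess
import sys
import tempfile


def tool_version(command):
    """Run a version command and return its first output line, or None on failure."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # The program is not installed, not executable, or did not finish in time.
        return None
    # Some tools print their version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


def record_versions(commands, path):
    """Write interpreter, platform, and tool versions to a JSON file and return them."""
    record = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "tools": {name: tool_version(cmd) for name, cmd in commands.items()},
    }
    with open(path, "w") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
    return record


if __name__ == "__main__":
    # Demo: record the version of the running Python interpreter only.
    demo_path = os.path.join(tempfile.mkdtemp(), "versions.json")
    print(record_versions({"python": [sys.executable, "--version"]}, demo_path))
```

In a real pipeline the `commands` dictionary would list each external program the analysis depends on, and the JSON file would be archived together with the intermediate results mentioned in rule 5.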
48. G Dudas et al. Nature 1–7 (2017) doi:10.1038/nature22040
Recent paper about Ebola transmission
• 1610 publicly available genomes, 2014-2015
• Data cleaning
• Relaxed molecular clock to infer root
• Markov chains to infer transmission rates
(not location specific, but based on
attributes)
Key Conclusions:
• Median transmission distance = 72 km
• Important factors:
• National vs int’l dispersal
• Distances between regions
• Population at source and destination
• Shared int’l border