Open Data is Essential for
Personalized Medicine
BF Francis Ouellette
https://goo.gl/8U1QJa
Open Data is Essential for Genomics
This presentation is on:
https://www.slideshare.net/
@bffo
E-mail: francis@genomequebec.com
Times I’ve been in Italy
• Trieste 1996: Last Yeast Genome Meeting
• Naples 2005: NETTAB “Workflows management:
new abilities for the biological information overflow”
• Rome 2017: Elixir
• Palermo 2017: NETTAB
Outline
• What I do
• Open Data in genomics
• Final thoughts
But first, a little about me …
… an unfinished story!
https://goo.gl/anu933
http://goo.gl/dJIur
http://goo.gl/LwVOZ
http://goo.gl/QI6aL
http://goo.gl/mYHFO
http://goo.gl/Jc5TK
https://goo.gl/3PFr7L
1993-1997
from the National Center for Biotechnology Information
PANIC
https://www.ubc.ca/
1999
2001: Human Genome Project
2003-2007
Toronto
2007-2017
International Cancer Genome Consortium
http://goo.gl/dJIur
2017- …
SABs, EBs & projects I’m on:
So what unifies all
of what I’ve done?
Helping scientists do science.
Open Data
https://goo.gl/Z63Wxp
Genomics
https://goo.gl/MX84KA
What am I calling “Genomics”?
All “omics”
– DNA and RNA, +Epigenomics
– Proteomics, +Protein Interactions, +Pathways
– Metabolomics
– Bioinformatics/Computational Biology
– All of the related data and metadata
• Phenotype
• Clinical
• Images
– New technologies …
Biological scope?
• Anything with DNA or RNA or protein
An example of a challenge
for all of us:
The integration of genomic data
with deep learning and artificial
intelligence
AI, Big Data, Deep Computing
• Artificial Intelligence / Deep Learning and
the Big Data Hype?
https://goo.gl/WHg36Q
What do we need for that?
https://goo.gl/JWpXj2
What else?
• Data has to be FAIR
– TO BE FINDABLE
– TO BE ACCESSIBLE
– TO BE INTEROPERABLE
– TO BE RE-USABLE
• https://www.force11.org/group/fairgroup/fairprinciples
Big data examples
• Genomic sequences
• Imaging
• Population-scale data collected from wearables
Data Center for all in Québec?
• Health Care in Canada is governed
province by province.
• Génome Québec is working with various
ministries to set up something that could be
useful/centralized and make genomic data
usable for research (controlled access).
• Needs to include clinical data
“Building a data centre is
like making pancakes, you
always need to throw
away the 1st one”
Robert Grossman
Frederick H. Rawson Professor and
the Director of the Center for Data
Intensive Science (CDIS) at the
University of Chicago
http://rgrossman.com/
Sharing all data types,
including clinical data?
https://goo.gl/ofEPeX
Authors present at the
“Toronto meeting”
https://goo.gl/ofEPeX
Open data critical to
progress in Science
One example: GenBank
The GenBank sequence database is an open-access, annotated
collection of all publicly available nucleotide sequences
and their protein translations.
Open data critical to progress in Science
• Without GenBank and other public
sequence databases
– There would be no BLAST
– There would be no diagnostic DNA testing
– There would be no understanding of the
human genome (there probably would not
have been a human genome to work on in the
first place).
ICGC PCAWG
Docker Testing
Adapted from Niko Beerenwinkel, Chris D. Greenman, Jens Lagergren:
Computational Cancer Biology: An Evolutionary Perspective
Published: February 4, 2016. https://doi.org/10.1371/journal.pcbi.1004717
Cancer is a Disease
of the Genome
Challenge in Treating Cancer:
• Every tumour is different
• Every cancer patient is different
Adapted from Tom Hudson
https://www.cancer.gov/research/areas/genomics
Analysis Data Types
• Simple Somatic Mutations (SSM or SNV)
• Copy Number Alterations (CNA or CNV)
• Structural Variants (SV)
• Germline variants (SNPs)
• Gene Expression (micro-arrays and RNASeq)
• miRNA Expression (RNASeq)
• Epigenomics (Arrays and Methylation)
• Splicing Variation (RNASeq)
• Protein Expression (Arrays)
International Cancer Genome Consortium
• Collect ~500 tumour/normal pairs from each of 50 different major
cancer types; 25,000 T/N pairs!
• Comprehensive genome analysis of each T/N pair:
– Genome
– Transcriptome
– Methylome
– Clinical data
• Make the data available to the research community & public.
Identify
genome
changes
…GATTATTCCAGGTAT… …GATTATTGCAGGTAT… …GATTATTGCAGGTAT…
Adapted from Tom Hudson
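The "identify genome changes" step on this slide can be illustrated with a minimal Python sketch. It compares two of the example fragments shown above position by position and reports where they differ. The slide does not say which fragment is tumour and which is normal, so the variable names below are assumptions, and real somatic mutation calling works on aligned sequencing reads with statistical models rather than plain string comparison.

```python
def find_differences(seq_a, seq_b):
    """Return (position, base_a, base_b) for each mismatch between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Example fragments copied from the slide; the names are assumptions,
# since the slide does not label which sample each fragment comes from.
first_fragment = "GATTATTCCAGGTAT"
second_fragment = "GATTATTGCAGGTAT"

differences = find_differences(first_fragment, second_fragment)
for pos, a, b in differences:
    print(f"position {pos} (0-based): {a} -> {b}")
```

The two fragments differ at a single base, which is the kind of change a tumour/normal comparison is meant to surface.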
International Cancer Genome Consortium: http://icgc.org
ICGC needs to deal with different
kinds of users!
• Biologists/Clinicians:
– Web interface to processed data, providing:
• Affected gene lists with consequences
• Impact on pathways
• Power users:
– Application Programming Interface (API) to get
to data
– Availability and integration with cloud
resources
ICGC Data Coordinating Centre:
dcc.icgc.org
https://dcc.icgc.org/
https://dcc.icgc.org/icgc-in-the-cloud
http://www.cancercollaboratory.org/
Some challenges:
• So, we have lots of data. Was it
all generated the same way?
Every country/group has basically
been submitting:
– Simple Somatic Mutations (SSM or SNV)
– Copy Number Alterations (CNA or CNV)
– Structural Variants (SV)
– Germline variants (SNPs)
– Gene Expression (micro-arrays and RNASeq)
– miRNA Expression (RNASeq)
– Epigenomics (Arrays and Methylation)
– Splicing Variation (RNASeq)
– Protein Expression (Arrays)
Are they all using the same
pipelines?
• No
Steering Committee of PCAWG
• Peter Campbell, Sanger Inst.
• Gady Getz, Broad
• Jan Korbel, EMBL
• Lincoln Stein, OICR
• Josh Stuart, UCSC
PanCancer Analysis of Whole
Genomes (PCAWG)
• > 2,800 T/N pairs with clinical data from 20
tumour types, with whole-genome analysis.
• Aligned with one standard pipeline.
• Genomic Variants determined with 3 pipelines
• 17 working groups
• > 50 Papers are being
written now.
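Running three variant-calling pipelines raises the question of how their results are combined. The slides do not say how PCAWG did this. One simple approach, shown here only as an illustration, is to keep a variant when at least two of the three callers report it. The caller names and variant tuples below are made up for the example, and this sketch should not be read as the PCAWG method.

```python
from collections import Counter


def majority_vote(call_sets, min_callers=2):
    """Keep variants reported by at least `min_callers` of the given call sets."""
    counts = Counter()
    for calls in call_sets:
        # Use set() so a caller that lists a variant twice is counted once.
        counts.update(set(calls))
    return sorted(variant for variant, n in counts.items() if n >= min_callers)


# Hypothetical calls as (chromosome, position, reference base, alternate base).
caller_a = [("chr1", 1000, "C", "G"), ("chr2", 500, "A", "T")]
caller_b = [("chr1", 1000, "C", "G"), ("chr3", 42, "G", "A")]
caller_c = [("chr1", 1000, "C", "G"), ("chr2", 500, "A", "T")]

consensus = majority_vote([caller_a, caller_b, caller_c])
for variant in consensus:
    print(variant)
```

Raising `min_callers` to 3 keeps only variants on which every caller agrees, trading sensitivity for fewer false positives.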
https://www.biorxiv.org/search/pcawg
Deliverables for PCAWG include:
• 1st PANCANCER analysis on > 2,800
cancer tumours from a WGS perspective
• RNA, SSM, CNV, Methylation analysis &
germline
• Published (executable) pipelines
– Docker / Dockstore
– Multiple cloud access to data
– Multiple portal access to data
https://dcc.icgc.org/pcawg
Working Groups (1/2)
1. Novel somatic mutation calling methods
2. Analysis of mutations in regulatory regions
3. Integration of transcriptome and genome
4. Integration of epigenome and genome
5. Consequences of somatic mutations on pathway
and network activity
6. Patterns of structural variations, signatures,
genomic correlations, retrotransposons, mobile
elements
7. Mutation signatures and processes
8. Germline cancer genome
Working Groups (2/2)
9. Inferring driver mutations and identifying cancer
genes and pathways
10. Translating cancer genomes to the clinic
11. Evolution and heterogeneity
12. Exploratory: portals, visualization and software
infrastructure
13. Molecular subtypes and classification
14. Analysis of mutations in non-coding RNA
15. Exploratory: mitochondrial
16. Exploratory: pathogens
17. Technical working group
https://goo.gl/AMxwSU
http://dockstore.org
Docker Testing Group
• Group that ensures all containerized
workflows work as expected.
https://goo.gl/AMxwSU
Access to Data?
• Human Data
• Patients consented to have their DNA
looked at so people could understand
cancer
• Need to have a system to maximize
people’s gift to science.
Identify yourself
Fill out a detailed form, which includes:
• Contact and Project Information
• Information Technology details and procedures for keeping data secure
• Data Access Agreement
All of these documents are put into a PDF file that you print and get
your institution to sign off on your behalf.
https://icgc.org/daco/approved-projects
314 groups
Data access diagram (labels): DACO, ICGC, dbGaP, GDC, EGA, TCGA, ERA; BAM; Open; EGA id & password; WGS; Germline
Challenge:
• Open Data and controlled access data
• Not enough eyeballs on the data
• Eyeballs on the data needed to make
discoveries.
https://goo.gl/ogbWXG
Culture of Sharing Openly
• Public Funding agencies
• Consortiums
• Mentors
• Peers
• New generation (vs my old generation)
• Has to become the norm
Final thoughts …
• Access to data is essential for science
• Getting data that is FAIR is hard work
• It is essential to share the work you do if
you want to be recognized, get tenure, get
a job or a promotion.
• Human data is more complicated, but
don’t let that get in the way!
• There is a lot of material out there, learn
from it (& cite your sources)!
Last message to students and
young postdoctoral fellows (PDFs) and investigators:
Be open so people
can see how great
you are!
Acknowledgments
DCC Software Developers
Vincent Ferretti
Dusan Andric
Phuong-My Do
Francois Gerthoffert
Terry Lin
Michael Moncada
Vitalii Slobodianyk
Bob Tiernay
Douglas Wong
Linda Xiang
Junjun Zhang
ICGC/OICR
Project leaders:
Tom Hudson
John McPherson
Lincoln Stein
Jared Simpson
Paul Boutros
Vincent Ferretti
Francis Ouellette
Jennifer Jennings
Ouellette Lab
Alysha Moncrieffe
Ann Meyer
Zhibin Lu
Web Dev
Joseph Yamada
Kaman Wu
Kim Cullion
Koji Miyauchi
Miyuki Fukuma
ICGC DCC Biocuration
Hardeep Nahal
Marc Perry
Research IT/Systems
David Sutton
Bob Gibson
David Magda
Rob Naccarato
Brian Ott
Gino Yearwood
EGA
Jordi Rambla De Argila
Arcadi Navarro
Audald Iloret
Mauricio Moldes
http://oicr.on.ca http://icgc.org
… and all the patients and their
families that are putting
their hopes into our work!
Scientific Affairs Team
27 March 2017
B.F. Francis Ouellette
Annina Spilker
Joël Savard
Diana Iglesias
Diane Bouchard
Cristina Ciurli
Micheline Ayoub
Hélène Fournier
Grazie
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomy
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
trihybrid cross , test cross chi squares
trihybrid cross , test cross chi squarestrihybrid cross , test cross chi squares
trihybrid cross , test cross chi squares
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 

Open data genomics_palermo_2017_ver03

  • 1. Open Data is Essential for Personalized Medicine BF Francis Ouellette https://goo.gl/8U1QJa
  • 2. Open Data is Essential for Genomics This presentation is on: https://www.slideshare.net/
  • 3. Module #: Title of Module
  • 4. Open Data is Essential for Genomics
  • 5. Open Data is Essential for Genomics @bffo E-mail: francis@genomequebec.com
  • 6. Open Data is Essential for Genomics Times I’ve been in Italy • Trieste 1996: Last Yeast Genome Meeting • Naples 2005: NETTAB “Workflows management: new abilities for the biological information overflow” • Rome 2017: Elixir • Palermo 2017: NETTAB
  • 7. Open Data is Essential for Genomics Outline • What I do • Open Data in genomics • Final thoughts
  • 8. Open Data is Essential for Genomics But first, a little about me … … an unfinished story!
  • 9. Open Data is Essential for Genomics https://goo.gl/anu933
  • 10. Open Data is Essential for Genomics http://goo.gl/dJIur
  • 11. Open Data is Essential for Genomics http://goo.gl/LwVOZ
  • 12. Open Data is Essential for Genomics http://goo.gl/QI6aL
  • 13. Open Data is Essential for Genomics http://goo.gl/mYHFO
  • 14. Open Data is Essential for Genomics http://goo.gl/Jc5TK
  • 15. Open Data is Essential for Genomics https://goo.gl/3PFr7L 1993-1997
  • 16. Open Data is Essential for Genomics from the National Center for Biotechnology Information
  • 17. Open Data is Essential for Genomics from the National Center for Biotechnology Information
  • 18. Open Data is Essential for Genomics from the National Center for Biotechnology Information PANIC
  • 19. Open Data is Essential for Genomics
  • 20. Open Data is Essential for Genomics PANIC
  • 21. Open Data is Essential for Genomics PANIC
  • 22. Open Data is Essential for Genomics
  • 23. Open Data is Essential for Genomics https://www.ubc.ca/
  • 24. Open Data is Essential for Genomics 1999
  • 25. Open Data is Essential for Genomics 2001: Human Genome Project
  • 26. Open Data is Essential for Genomics 2003-2007
  • 27. Open Data is Essential for Genomics
  • 28. Open Data is Essential for Genomics Toronto
  • 29. Open Data is Essential for Genomics 2007-2017
  • 30. Open Data is Essential for Genomics International Cancer Genome Consortium
  • 31. Open Data is Essential for Genomics http://goo.gl/dJIur
  • 32. Open Data is Essential for Genomics 2017- …
  • 33. Open Data is Essential for Genomics
  • 34. Open Data is Essential for Genomics SABs, EBs & projects I’m on:
  • 35. Open Data is Essential for Genomics
  • 36. Open Data is Essential for Genomics So what unifies all of what I’ve done?
  • 37. Open Data is Essential for Genomics So what unifies all of what I’ve done? Helping scientists do science.
  • 38. Open Data is Essential for Genomics Open Data https://goo.gl/Z63Wxp
  • 39. Open Data is Essential for Genomics Genomics https://goo.gl/MX84KA
  • 40. Open Data is Essential for Genomics What am I calling “Genomics”? All “omics” – DNA and RNA, +Epigenomics – Proteomics, +Protein Interactions, +Pathways – Metabolomics – Bioinformatics/Computational Biology – All of the related data and metadata • Phenotype • Clinical • Images – New technologies …
  • 41. Open Data is Essential for Genomics Biological scope? • Anything with DNA or RNA or protein
  • 42. Open Data is Essential for Genomics
  • 43. Open Data is Essential for Genomics An example of a challenge for all of us: the integration of genomic data with deep learning and artificial intelligence
  • 44. Open Data is Essential for Genomics AI, Big Data, Deep Computing • Artificial Intelligence / Deep Learning and the Big Data Hype? https://goo.gl/WHg36Q
  • 45. Open Data is Essential for Genomics What do we need for that? https://goo.gl/JWpXj2
  • 46. Open Data is Essential for Genomics What do we need for that? https://goo.gl/JWpXj2
  • 47. Open Data is Essential for Genomics What else? • Data has to be FAIR – TO BE FINDABLE – TO BE ACCESSIBLE – TO BE INTEROPERABLE – TO BE RE-USABLE • https://www.force11.org/group/fairgroup/fairprinciples
  • 48. Open Data is Essential for Genomics Big data examples • Genomic sequences • Imaging • Population scale collected wearable data
  • 49. Open Data is Essential for Genomics Data Center for all in Québec? • Health Care in Canada is governed province by province. • Génome Québec is working with various ministries to set up something that could be useful/centralized and make genomic data usable for research (controlled access). • Needs to include clinical data
  • 50. Open Data is Essential for Genomics “Building a data centre is like making pancakes, you always need to throw away the 1st one” Robert Grossman Frederick H. Rawson Professor and the Director of the Center for Data Intensive Science (CDIS) at the University of Chicago http://rgrossman.com/
  • 51. Open Data is Essential for Genomics Sharing all data types, including clinical data? https://goo.gl/ofEPeX
  • 52. Open Data is Essential for Genomics Authors present at the “Toronto meeting” https://goo.gl/ofEPeX
  • 53. Open Data is Essential for Genomics Open data critical to progress in Science
  • 54. Open Data is Essential for Genomics One example: GenBank. The GenBank sequence database is an open access, annotated collection of all publicly available nucleotide sequences and their protein translations.
  • 55. Open Data is Essential for Genomics Open data critical to progress in Science • Without GenBank and other public sequence databases – There would be no BLAST – There would be no diagnostic DNA testing – There would be no understanding of the human genome (there probably would not have been a human genome to work on in the first place).
  • 56. Open Data is Essential for Genomics ICGC PCAWG Docker Testing. Adapted from Niko Beerenwinkel, Chris D. Greenman, Jens Lagergren: Computational Cancer Biology: An Evolutionary Perspective. Published: February 4, 2016. https://doi.org/10.1371/journal.pcbi.1004717
  • 57. Open Data is Essential for Genomics Cancer is a Disease of the Genome Challenge in Treating Cancer: • Every tumour is different • Every cancer patient is different Adapted from Tom Hudson https://www.cancer.gov/research/areas/genomics
  • 58. Open Data is Essential for Genomics Analysis Data Types • Simple Somatic Mutations (SSM or SNV) • Copy Number Alterations (CNA or CNV) • Structural Variants (SV) • Germline variants (SNPs) • Gene Expression (micro-arrays and RNASeq) • miRNA Expression (RNASeq) • Epigenomics (Arrays and Methylation) • Splicing Variation (RNASeq) • Protein Expression (Arrays)
  • 59. Open Data is Essential for Genomics International Cancer Genome Consortium • Collect ~500 tumour/normal pairs from each of 50 different major cancer types; 25,000 T/N pairs! • Comprehensive genome analysis of each T/N pair: – Genome – Transcriptome – Methylome – Clinical data • Make the data available to the research community & public. Identify genome changes …GATTATTCCAGGTAT… …GATTATTGCAGGTAT… …GATTATTGCAGGTAT… Adapted from Tom Hudson
  • 60. ONTARIO INSTITUTE FOR CANCER RESEARCH
  • 61. Open Data is Essential for Genomics International Cancer Genome Consortium: http://icgc.org
  • 62. Open Data is Essential for Genomics ICGC needs to deal with different kinds of users! • Biologists/Clinicians: – Web interface to processed data, providing: • Affected gene lists with consequences • Impact on pathways • Power users: – Application Programming Interface (API) to get to data – Availability and Integration with cloud resources
  • 63. Open Data is Essential for Genomics ICGC Data Coordinating Centre: dcc.icgc.org
  • 64. Open Data is Essential for Genomics https://dcc.icgc.org/
  • 65. Open Data is Essential for Genomics https://dcc.icgc.org/icgc-in-the-cloud
  • 66. Open Data is Essential for Genomics http://www.cancercollaboratory.org/
  • 67. Open Data is Essential for Genomics Some challenges: • So, we have lots of data, is it generated the same way?
  • 68. Open Data is Essential for Genomics Every country/group has basically been submitting: – Simple Somatic Mutations (SSM or SNV) – Copy Number Alterations (CNA or CNV) – Structural Variants (SV) – Germline variants (SNPs) – Gene Expression (micro-arrays and RNASeq) – miRNA Expression (RNASeq) – Epigenomics (Arrays and Methylation) – Splicing Variation (RNASeq) – Protein Expression (Arrays)
  • 69. Open Data is Essential for Genomics Are they all using the same pipelines? • No
  • 70. Open Data is Essential for Genomics
  • 71. Open Data is Essential for Genomics Steering Committee of PCAWG • Peter Campbell, Sanger Inst. • Gady Getz, Broad • Jan Korbel, EMBL • Lincoln Stein, OICR • Josh Stuart, UCSC
  • 72. Open Data is Essential for Genomics PanCancer Analysis of Whole Genomes (PCAWG) • Whole-genome analysis of > 2,800 T/N pairs with clinical data, from 20 tumour types. • Aligned with one standard pipeline. • Genomic variants determined with 3 pipelines • 17 working groups • > 50 papers are being written now.
  • 73. Open Data is Essential for Genomics https://www.biorxiv.org/search/pcawg
  • 74. Open Data is Essential for Genomics Deliverables for PCAWG include: • 1st PANCANCER analysis on > 2,800 cancer tumours from a WGS perspective • RNA, SSM, CNV, Methylation analysis & germline • Published (executable) pipelines – Docker / Dockstore – Multiple cloud access to data – Multiple portal access to data
  • 75. Open Data is Essential for Genomics https://dcc.icgc.org/pcawg
  • 76. Open Data is Essential for Genomics Working Groups (1/2) 1. Novel somatic mutation calling methods 2. Analysis of mutations in regulatory regions 3. Integration of transcriptome and genome 4. Integration of epigenome and genome 5. Consequences of somatic mutations on pathway and network activity 6. Patterns of structural variations, signatures, genomic correlations, retrotransposons, mobile elements 7. Mutation signatures and processes 8. Germline cancer genome
  • 77. Open Data is Essential for Genomics Working Groups (2/2) 9. Inferring driver mutations and identifying cancer genes and pathways 10. Translating cancer genomes to the clinic 11. Evolution and heterogeneity 12. Exploratory: portals, visualization and software infrastructure 13. Molecular subtypes and classification 14. Analysis of mutations in non-coding RNA 15. Exploratory: mitochondrial 16. Exploratory: pathogens 17. Technical working group
  • 78. Open Data is Essential for Genomics https://goo.gl/AMxwSU
  • 79. Open Data is Essential for Genomics https://goo.gl/AMxwSU
  • 80. Open Data is Essential for Genomics https://goo.gl/AMxwSU
  • 81. Open Data is Essential for Genomics https://goo.gl/AMxwSU
  • 82. Open Data is Essential for Genomics http://dockstore.org
  • 83. Open Data is Essential for Genomics Docker Testing Group • Group that ensures all containerized workflows work as expected. https://goo.gl/AMxwSU
  • 84. Open Data is Essential for Genomics Access to Data? • Human Data • Patients consented to have their DNA looked at so people could understand cancer • Need to have a system to maximize people’s gift to science.
  • 85. Open Data is Essential for Genomics
  • 86. Open Data is Essential for Genomics Identify yourself Fill out a detailed form, which includes: • Contact and Project Information • Information Technology details and procedures for keeping data secure • Data Access Agreement All of these documents are put into a PDF file that you print and have your institution sign on your behalf
  • 87. Open Data is Essential for Genomics
  • 88. Open Data is Essential for Genomics
  • 89. Open Data is Essential for Genomics https://icgc.org/daco/approved-projects 314 groups
  • 90. Open Data is Essential for Genomics [Diagram of data access routes. Labels: DACO, ICGC, dbGaP, GDC, EGA, TCGA, BAM, Open, ERA, EGA id & password, WGS, Germline]
  • 91. Open Data is Essential for Genomics Challenge: • Open Data and controlled access data • Not enough eyeballs on the data • Eyeballs on the data needed to make discoveries. https://goo.gl/ogbWXG
  • 92. Open Data is Essential for Genomics Culture of Sharing Openly • Public Funding agencies • Consortiums • Mentors • Peers • New generation (vs my old generation) • Has to become the norm
  • 93. Open Data is Essential for Genomics Final thoughts … • Access to data is essential for science • Getting data that is FAIR is hard work • It is essential to share the work you do if you want to be recognized, get tenure, get a job or a promotion. • Human data is more complicated, but don’t let that get in the way! • There is a lot of material out there, learn from it (& cite your sources)!
  • 94. Open Data is Essential for Genomics Last message to students and young PDFs and investigators:
  • 95. Open Data is Essential for Genomics Last message to students and young PDFs and investigators: Be open so people can see how great you are!
  • 96. ONTARIO INSTITUTE FOR CANCER RESEARCH
  • 97. Open Data is Essential for Genomics Acknowledgments • ICGC/OICR Project leaders: Tom Hudson, John McPherson, Lincoln Stein, Jared Simpson, Paul Boutros, Vincent Ferretti, Francis Ouellette, Jennifer Jennings • DCC Software Developer: Vincent Ferretti, Dusan Andric, Phuong-My Do, Francois Gerthoffert, Terry Lin, Michael Moncada, Vitalii Slobodianyk, Bob Tiernay, Douglas Wong, Linda Xiang, Junjun Zhang • Ouellette Lab: Alysha Moncrieffe, Ann Meyer, Zhibin Lu • Web Dev: Joseph Yamada, Kaman Wu, Kim Cullion, Koji Miyauchi, Miyuki Fukuma • ICGC DCC Biocuration: Hardeep Nahal, Marc Perry • Research IT/Systems: David Sutton, Bob Gibson, David Magda, Rob Naccarato, Brian Ott, Gino Yearwood • EGA: Jordi Rambla De Argila, Arcadi Navarro, Audald Iloret, Mauricio Moldes • … and all the patients and their families that are putting their hopes into our work! http://oicr.on.ca http://icgc.org
  • 98. Scientific Affairs Team, 27 March 2017: B.F. Francis Ouellette, Annina Spilker, Joël Savard, Diana Iglesias, Diane Bouchard, Cristina Ciurli, Micheline Ayoub, Hélène Fournier
  • 99. Open Data is Essential for Genomics Grazie
  • 100. Open Data is Essential for Genomics
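
Slide 59 illustrates "Identify genome changes" with short sequence fragments that differ at a single base. As a rough illustration only, the comparison of two already-aligned fragments could be sketched as below. The function name `find_substitutions` is hypothetical, and treating the first fragment as the normal sample and the second as the tumour sample is an assumption; the slide does not label them. Real somatic variant calling works on aligned reads with dedicated pipelines, not on a direct string comparison like this.

```python
def find_substitutions(normal, tumour):
    """Return (0-based position, normal base, tumour base) for each mismatch.

    Both inputs are assumed to be pre-aligned fragments of equal length.
    """
    if len(normal) != len(tumour):
        raise ValueError("fragments must have the same length (already aligned)")
    return [
        (position, n_base, t_base)
        for position, (n_base, t_base) in enumerate(zip(normal, tumour))
        if n_base != t_base
    ]


# Fragments as shown on slide 59; which one is "normal" is an assumption.
normal_fragment = "GATTATTCCAGGTAT"
tumour_fragment = "GATTATTGCAGGTAT"

substitutions = find_substitutions(normal_fragment, tumour_fragment)
for position, n_base, t_base in substitutions:
    print(f"position {position}: {n_base} > {t_base}")
```

For the two fragments above, this reports a single C > G difference at 0-based position 7.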