Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Pride and ProteomeXchange
1. PRIDE resources and ProteomeXchange
Dr. Juan Antonio Vizcaíno
Proteomics Team Leader
EMBL-EBI
Hinxton, Cambridge, UK
2. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Data resources at EMBL-EBI
Genes, genomes & variation
ArrayExpress
Expression Atlas PRIDE
InterPro Pfam UniProt
ChEMBL ChEBI
Molecular structures
Protein Data Bank in Europe
Electron Microscopy Data Bank
European Nucleotide Archive
European Variation Archive
European Genome-phenome Archive
Gene & protein expression
Protein sequences, families & motifs
Chemical biology
Reactions, interactions &
pathways
IntAct Reactome MetaboLights
Systems
BioModels Enzyme Portal BioSamples
Ensembl
Ensembl Genomes
GWAS Catalog
Metagenomics portal
Europe PubMed Central
Gene Ontology
Experimental Factor
Ontology
Literature & ontologies
3. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
• PRIDE Archive (in the context of ProteomeXchange
and the PSI standards)
• How to submit data to PRIDE: PRIDE tools
• How to access data in PRIDE Archive
• PRIDE Cluster and PRIDE Proteomes
Overview
4. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
• PRIDE Archive (in the context of
ProteomeXchange and the PSI standards)
• How to submit data to PRIDE: PRIDE tools
• How to access data in PRIDE Archive
• PRIDE Cluster and PRIDE Proteomes
Overview
5. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
• PRIDE stores mass spectrometry (MS)-based
proteomics data:
• Peptide and protein expression data
(identification and quantification)
• Post-translational modifications
• Mass spectra (raw data and peak lists)
• Technical and biological metadata
• Any other related information
• Full support for tandem MS approaches
• Any type of data can be stored.
PRIDE (PRoteomics IDEntifications) Archive
http://www.ebi.ac.uk/pride/archive
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2016
6. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Data content in PRIDE Archive
• Submission driven resource
• PRIDE is split in datasets (group of assays)
• An assay represents one MS run (in most cases).
• No data reprocessing at present. PRIDE aims to represent
the author’s view on the data
• Supported formats: PRIDE XML and mzIdentML.
• Raw data is also now stored
7. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
ProteomeXchange: A Global, distributed proteomics
database
PASSEL
(SRM data)
PRIDE
(MS/MS data)
MassIVE
(MS/MS data)
Raw
ID/Q
Meta
jPOST
(MS/MS data)
Mandatory raw data deposition
since July 2015
• Goal: Development of a framework to allow standard data submission and
dissemination pipelines between the main existing proteomics repositories.
http://www.proteomexchange.org
New in 2016
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017, in press
8. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
ProteomeCentral
Metadata /
Manuscript
Raw Data
Results
Journals
Peptide Atlas
Receiving repositories
PRIDE
Researcher’s results
Raw data
Metadata
PASSEL
Research
groups
Reanalysis of datasets
MassIVE
jPOST
MS/MS
data
(as complete
submissions)
Any other
workflow
(mainly partial
submissions)
DATASETS
SRM
data
Reprocessed results
MassIVE
ProteomeXchange data workflow
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017, in press
9. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
ProteomeCentral
Metadata /
Manuscript
Raw Data
Results
Journals
UniProt/
neXtProtPeptide Atlas
Other DBs
Receiving repositories
PRIDE
GPMDBResearcher’s results
Raw data
Metadata
PASSEL
proteomicsDB
Research
groups
Reanalysis of datasets
MassIVE
jPOST
MS/MS
data
(as complete
submissions)
Any other
workflow
(mainly partial
submissions)
DATASETS
OmicsDI
Integration with other
omics datasets
SRM
data
Reprocessed results
MassIVE
ProteomeXchange data workflow
10. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
PRIDE: Source of MS proteomics data
• PRIDE Archive already provides or
will soon provide MS proteomics
data to other EMBL-EBI resources
such as UniProt, Ensembl and the
Expression Atlas.
http://www.ebi.ac.uk/pride
11. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
• PRIDE Archive (in the context of ProteomeXchange
and the PSI standards)
• How to submit data to PRIDE: PRIDE tools
• How to access data in PRIDE Archive
• A sneak peak to other PRIDE resources
Overview
12. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
ProteomeCentral
Metadata /
Manuscript
Raw Data
Results
Journals
Peptide Atlas
Receiving repositories
PRIDE
Researcher’s results
Raw data
Metadata
PASSEL
Research
groups
Reanalysis of datasets
MassIVE
jPOST
MS/MS
data
(as complete
submissions)
Any other
workflow
(mainly partial
submissions)
DATASETS
SRM
data
Reprocessed results
MassIVE
ProteomeXchange data workflow
13. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Complete
Partial
Complete vs Partial submissions: processed results
For complete submissions, it is possible to connect the spectra with the identification
processed results and they can be visualized.
14. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Complete vs Partial submissions: experimental metadata
Complete Partial
General experimental metadata about the projects is similar.
However, at the assay level information in partial submissions is not so detailed
15. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
How to perform a complete PX submission to PRIDE
• Decide between a complete/partial submission
• File conversion/export to mzIdentML (or PRIDE XML)
• File check before submission (PRIDE Inspector)
• Experimental annotation and actual file submission (PX
submission tool)
• Post-submission steps
16. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
PX Data workflow for MS/MS data
1. Mass spectrometer output files: raw data (binary files) or
peak list spectra in a standardized format (mzML, mzXML).
2. Result files:
a. Complete submissions: Result files can be converted to
the mzIdentML data standard (or PRIDE XML).
b. Partial submissions: For workflows not yet supported by
PRIDE, search engine output files will be stored and
provided in their original form.
3. Metadata: Sufficiently detailed description of sample origin,
workflow, instrumentation, submitter.
4. Other files: Optional files:
a. QUANT: Quantification related results e. FASTA
b. PEAK: Peak list files f. SP_LIBRARY
c. GEL: Gel images
d. OTHER: Any other file type
Published
Raw
Files
Other
files
17. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
PX Data workflow for MS/MS data
1. Mass spectrometer output files: raw data (binary files) or
peak list spectra in a standardized format (mzML, mzXML).
2. Result files:
a. Complete submissions: Result files can be converted to
the mzIdentML data standard (or PRIDE XML).
b. Partial submissions: For workflows not yet supported by
PRIDE, search engine output files will be stored and
provided in their original form.
3. Metadata: Sufficiently detailed description of sample origin,
workflow, instrumentation, submitter.
4. Other files: Optional files (the list can be extended):
a. QUANT: Quantification related results e. FASTA
b. PEAK: Peak list files f. SP_LIBRARY
c. GEL: Gel images
d. OTHER: Any other file type
Published
Raw
Files
Other
files
18. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
PRIDE Components: Data Submission Process
PRIDE Converter 2
PRIDE Inspector PX Submission Tool
mzIdentML
PRIDE XML
In addition to PRIDE Archive, the PRIDE team develops
and maintains different tools and software libraries to
facilitate the handling and visualisation of MS proteomics
data and the submission process
19. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Tools ‘RESULT’ file generation Final ‘RESULT’ file
mzIdentML
‘RESULT’
Native file export to mzIdentML
Spectra
files
(mzML,
mzXML,
mzData,
mgf,
pkl,
ms2,
dta, apl)
Mascot
ProteinPilot
Scaffold
PEAKS
MSGF+
Others
Native File export
20. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Complete submissions
Search
Engine
Results +
MS files
Search
engines
mzIdentML
- Mascot
- MSGF+
- MyriMatch and related tools from D. Tabb’s lab
- OpenMS
- PEAKS
- PeptideShaker
- ProCon (ProteomeDiscoverer, Sequest)
- Scaffold
- TPP via the idConvert tool (ProteoWizard)
- ProteinPilot (from version 5.0)
- X!Tandem native conversion (Beta,
PILEDRIVER)
- Others: library for X!Tandem conversion, lab
internal pipelines, …
- Crux
- Soon: ProteomeDiscoverer (Thermo)
An increasing number of tools support export to mzIdentML 1.1
- Referenced spectral files need to be submitted as well
(all open formats are supported).
Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.
21. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Tools ‘RESULT’ file generation Final ‘RESULT’ file
mzTab
‘RESULT’
Coming soon: Support for mzTab
Spectra
files
(mzML,
mzXML,
mzData,
mgf,
pkl,
ms2,
dta, apl)
Mascot
MaxQuant
Others
Native File export
22. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
PRIDE Components: Submission Process
PRIDE Converter 2
PRIDE Inspector PX Submission Tool
mzIdentML
PRIDE XML
2
23. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
PRIDE Inspector Toolsuite
Wang et al., Nat. Biotechnology, 2012
Perez-Riverol et al., MCP, 2016
PRIDE Inspector
PRIDE Inspector 2 supports:
- PRIDE XML
- mzIdentML + all types of spectra files
- mzML
- mzTab identification and Quantification (+
all types of spectra files)
https://github.com/PRIDE-Toolsuite/
24. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
PRIDE Inspector Toolsuite
PRIDE Inspector 2
https://github.com/PRIDE-Toolsuite/
New visualisation
functionality for Protein
Groups
25. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
PRIDE Components: Submission Process
PRIDE Converter 2
PRIDE Inspector PX Submission Tool
mzIdentML
PRIDE XML
3
26. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
PX Submission Tool
Desktop application for data
submissions to ProteomeXchange via
PRIDE
• Implemented in Java 7
• Streamlines the submission process
• Capture mappings between files
• Retain metadata
• Fast file transfer with Aspera (FASP®
transfer technology) – FTP also
available
• Command line option
Submission tool screenshot
28. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
PRIDE Archive – over 5,000 datasets from
over 51 countries and 2,000 groups
• USA – 814 datasets
• Germany – 528
• UK – 338
• China – 328
• France – 222
• Netherlands – 175
• Canada - 137
Data volume:
• Total: ~275 TB
• Number of all files: ~560,000
• PXD000320-324: ~ 4 TB
• PXD002319-26 ~2.4 TB
• PXD001471 ~1.6 TB
• 1,973 datasets i.e. 52% of
all are publicly accessible
• ~90% of all
ProteomeXchange datasets
YearSubmissions
All submissions
Complete
PRIDE Archive growth
In the last 12 months: ~165 submitted datasets per month
Top Species studied by at least 100
datasets:
2,010 Homo sapiens
604 Mus musculus
191 Saccharomyces cerevisiae
140 Arabidopsis thaliana
127 Rattus norvegicus
>900 reported taxa in total
29. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Public data release: when does it happen?
• When the author tells us to do it (the authors can do it by
themselves)
• When we find out that a dataset has been published
• We look for PXD identifiers in PubMed abstracts.
• If your PXD identifier is not in the abstract, a paper may have
been published and the data is still private. Let us know!
• New web form in the PRIDE web to facilitate the process
30. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Partial submissions can be used to store
other data types
• Everything can be stored, not only MS/MS data: very flexible
mechanism to be able to capture all types of datasets.
• PRIDE does not actively store SRM data (PASSEL).
• Top down proteomics datasets.
• Mass Spectrometry Imaging datasets.
• Data independent acquisition techniques: e.g. SWATH-MS, MSE,
HD-MSE, etc.
31. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
C
D
From original publication [13] Reconstructed ProteomeXchange data
1. Thermo RAW data / UDP
2. Mirion Software (JLU)
1. Thermo RAW data / UDP
2. Convert to imzML
3. Upload to PRIDE
(EBI, Cambridge, UK)
4. Download from PRIDE
5. Display in MSiReader
- Vendor-independent data format
- Freely available software (open source)
- ‘open data‘ – free to reuse
- Anybody can do this!
A public repository for mass spectrometry imaging data
Römpp et al., 2015
PRIDE
database
European
Bioinformatics
Institute,
Cambridge, UK
3. Upload
4. Download
32. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
• PRIDE Archive (in the context of ProteomeXchange
and the PSI standards)
• How to submit data to PRIDE: PRIDE tools
• How to access data in PRIDE Archive
• PRIDE Cluster and PRIDE Proteomes
Overview
33. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Ways to access data in PRIDE Archive
• PRIDE web interface
• File repository
• REST web service
• PRIDE Inspector tool
36. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
ProteomeCentral
Metadata /
Manuscript
Raw Data
Results
Journals
Peptide Atlas
Receiving repositories
PRIDE
Researcher’s results
Raw data
Metadata
PASSEL
Research
groups
Reanalysis of datasets
MassIVE
jPOST
MS/MS
data
(as complete
submissions)
Any other
workflow
(mainly partial
submissions)
DATASETS
SRM
data
Reprocessed results
MassIVE
ProteomeXchange data workflow
37. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
ProteomeCentral: Centralised portal for all PX
datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
38. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
RSS feed and Twitter for following
announcements of public datasets
http://groups.google.com/group/proteomexchange/feed/rss_v2_0_msgs.xml
@proteomexchange
39. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
• PRIDE Archive (in the context of ProteomeXchange
and the PSI standards)
• How to submit data to PRIDE: PRIDE tools
• How to access data in PRIDE Archive
• PRIDE Cluster and PRIDE Proteomes
Overview
40. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Added value resources: PRIDE Cluster and
PRIDE Proteomes
• Condensed and across-data set, QC-filtered view on
PRIDE data.
• PRIDE Cluster: Peptide centric.
• PRIDE Proteomes: Protein centric (identification data)
41. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
PRIDE Cluster
• Provide an aggregated peptide centric view of PRIDE Archive.
• Hypothesis: same peptide will generate similar MS/MS spectra across
experiments.
• New version of spectral clustering algorithm to reliably group spectra
coming from the same peptide.
• Enables QC of peptide-spectrum matches (PSMs). Infer reliable
identifications by comparing submitted identifications of spectra within a
cluster.
After clustering, a representative spectrum is built for all peptides
consistently identified across different datasets.
Used to build spectral libraries (for 16 species).
Griss et al., Nat. Methods, 2013
Griss et al., Nat. Methods,
2016
42. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Example: one perfect cluster
- 880 PSMs give the same peptide ID
- 4 species
- 28 datasets
- Same instruments
http://www.ebi.ac.uk/pride/cluster/
43. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
PRIDE Cluster as a Public Data Mining Resource
43
• http://www.ebi.ac.uk/pride/cluster
• Spectral libraries for 16 species.
• All clustering results, as well as specific subsets of interest available.
• Source code (open source) and Java API
44. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
PRIDE Proteomes web interface:
identification info Unique/Shared Peptides
Mass spec-based
sequence coverage
PTM detected ( )
Observed
tissues
Biological vs
Sample Prep
PTMs
http://wwwdev.ebi.ac.uk/pride/proteomes/
45. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
• Main characteristics of PRIDE Archive and
ProteomeXchange
• PX/PRIDE submission workflow for MS/MS data
• PRIDE Inspector
• PX submission tool
• PRIDE/ProteomeXchange has become the de facto
standard for data submission and data availability in
proteomics
• PRIDE Proteomes and PRIDE Cluster: new resources
Conclusions
47. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Do you want to know a bit more…?
http://www.slideshare.net/JuanAntonioVizcaino
48. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Aknowledgements: People
Attila Csordas
Tobias Ternent
Gerhard Mayer (de.NBI)
Johannes Griss
Yasset Perez-Riverol
Manuel Bernal-Llinares
Andrew Jarnuczak
Enrique Perez
Former team members, especially
Rui Wang, Florian Reisinger, Noemi
del Toro, Jose A. Dianes & Henning
Hermjakob
Acknowledgements: The PRIDE Team
All data submitters !!!
@pride_ebi