ProteomeXchange: data deposition and data retrieval made easy
Juan Antonio VIZCAINO, Ph.D. 
PRIDE Group coordinator 
Proteomics Services Group 
European Bioinformatics Institute 
Hinxton, Cambridge 
United Kingdom 
juan@ebi.ac.uk
Overview 
• The ProteomeXchange (PX) consortium 
• Highlights in the last year 
• PRIME-XS datasets
ProteomeXchange Consortium 
• Goal: develop a framework that allows standard data submission and dissemination pipelines between the main existing proteomics repositories.
• Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and (very recently) MassIVE (UCSD, San Diego).
• Common identifier space (PXD identifiers) 
• Two supported data workflows: MS/MS and SRM. 
• Main objective: Make life easier for researchers 
http://www.proteomexchange.org
ProteomeXchange data workflow
[Figure: ProteomeXchange data workflow. Labels: Researcher’s results (Results, Raw Data*, Metadata / Manuscript); Receiving repositories: PRIDE (MS/MS data), MassIVE (MS/MS data), PASSEL (SRM data); ProteomeCentral; Metadata; Raw data*; Reprocessed results; Journals; UniProt/neXtProt; PeptideAtlas; GPMDB; Other DBs.]
Vizcaíno et al., Nat Biotechnol, 2014
MassIVE (UCSD) 
http://proteomics.ucsd.edu/service/massive/ 
• Joined ProteomeXchange in June 2014
• Only partial submissions. A few datasets so far.
Overview 
• The ProteomeXchange (PX) consortium 
• Highlights in the last year 
• PRIME-XS datasets
PX Data workflow for MS/MS data 
1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).
2. Result files:
a. Complete submissions: Result files can be converted to PRIDE XML or the mzIdentML data standard.
b. Partial submissions: For workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form.
3. Metadata: Sufficiently detailed description of sample origin, workflow, instrumentation, submitter.
4. Other files (optional):
a. QUANT: Quantification related results
b. PEAK: Peak list files
c. GEL: Gel images
d. OTHER: Any other file type
e. FASTA
f. SP_LIBRARY
[Figure labels: Published, Raw Files, Other files]
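The complete/partial distinction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `FileType` names `RAW` and `SEARCH` and the function `submission_type` are hypothetical and are not taken from the PX submission tool; the other type names follow the slide.

```python
from enum import Enum


class FileType(Enum):
    """File categories named on the slide; RAW and SEARCH are assumed labels."""
    RAW = "RAW"                # mass spectrometer output (binary files)
    PEAK = "PEAK"              # peak list files (e.g. mzML, mzXML)
    RESULT = "RESULT"          # PRIDE XML or mzIdentML
    SEARCH = "SEARCH"          # search engine output in its original form
    QUANT = "QUANT"
    GEL = "GEL"
    FASTA = "FASTA"
    SP_LIBRARY = "SP_LIBRARY"
    OTHER = "OTHER"


def submission_type(file_types):
    """Classify a set of files as a 'complete' or 'partial' submission.

    Assumed rule, paraphrasing the slide: spectra (RAW or PEAK) are always
    needed; RESULT files in a supported standard make the submission
    complete, otherwise search engine output makes it partial.
    """
    types = set(file_types)
    if not types & {FileType.RAW, FileType.PEAK}:
        raise ValueError("no mass spectrometer output files")
    if FileType.RESULT in types:
        return "complete"
    if FileType.SEARCH in types:
        return "partial"
    raise ValueError("no result or search engine output files")


print(submission_type([FileType.RAW, FileType.RESULT, FileType.QUANT]))
print(submission_type([FileType.RAW, FileType.SEARCH]))
```

The first call is classified as complete and the second as partial; optional files such as QUANT do not change the outcome.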
Complete vs Partial submissions: processed results
For complete submissions, the spectra can be connected to the processed identification results, and both can be visualized.
PRIDE XML, mzIdentML supported
mzTab to come
[Figure: Complete vs Partial]
Complete vs Partial submissions: experimental metadata
[Figure: Complete vs Partial]
General experimental metadata about the projects is similar.
However, assay-level information is less annotated in partial submissions.
Complete submissions using mzIdentML
[Figure labels: Search engines; Search Engine Results + MS files; mzIdentML]
An increasing number of tools support export to mzIdentML 1.1 
- Mascot 
- MSGF+ 
- Myrimatch and related tools from D. Tabb’s lab 
- OpenMS 
- PEAKS 
- ProCon (ProteomeDiscoverer, Sequest) 
- Scaffold 
- TPP via the idConvert tool (ProteoWizard) 
- ProteinPilot (planned by the end of 2014) 
- Others: library for X!Tandem conversion, lab internal pipelines, …
Note: referenced spectral files need to be submitted as well (all open formats are supported).
Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.
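Since the referenced spectral files must accompany an mzIdentML file, it can help to list them before submitting. A minimal sketch, assuming the mzIdentML 1.1 convention that each spectra file is declared in a `SpectraData` element with a `location` attribute; the function name and the stripped-down sample document are illustrative only, and the sample is not a schema-valid file.

```python
import io
import xml.etree.ElementTree as ET

# Stripped-down sample for illustration; a real mzIdentML file has many more elements.
EXAMPLE_MZID = b"""<?xml version="1.0" encoding="UTF-8"?>
<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1" version="1.1.0">
  <DataCollection>
    <Inputs>
      <SpectraData id="SD_1" location="file:///data/run01.mgf"/>
      <SpectraData id="SD_2" location="file:///data/run02.mzML"/>
    </Inputs>
  </DataCollection>
</MzIdentML>
"""


def referenced_spectra_files(source):
    """Return the 'location' of every SpectraData element, in document order.

    `source` is a file name or a binary file object. Tags are compared by
    local name, so the XML namespace does not need to be known in advance.
    """
    locations = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "SpectraData":
            location = elem.get("location")
            if location:
                locations.append(location)
    return locations


print(referenced_spectra_files(io.BytesIO(EXAMPLE_MZID)))
```

Each returned location is a spectra file that would need to be included in the submission alongside the mzIdentML file.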
Now: native file export
[Figure: three stages labelled Tools, ‘RESULT’ file generation and Final ‘RESULT’ file. Tools: Mascot, ProteinPilot, Scaffold, PEAKS, MSGF+, Others. Generation step: native file export. Final ‘RESULT’ file: mzIdentML, with spectra files.]
Before: file conversion using PRIDE Converter
[Figure: three stages labelled Original data files, ‘RESULT’ file generation and Final ‘RESULT’ file. Original data files: search output files, spectra files. Generation step: file conversion with PRIDE Converter. Final ‘RESULT’ file: PRIDE XML.]
PRIDE Inspector 2
Wang et al., Nat. Biotechnology, 2012
PRIDE Inspector 2.0 supports:
- PRIDE XML
- mzIdentML + all types of spectra files
- mzML
- mzTab (work in progress)
http://code.google.com/p/pride-toolsuite/wiki/PRIDEInspector
PX submission tool: data submission
http://www.proteomexchange.org/submission
[Figure labels: Published, Raw, Other files, PX submission tool]
The PX submission tool:
• Captures the mappings between the different types of files.
• Adds the mandatory metadata annotation.
• Makes the file upload straightforward for the submitter (it transfers all the files using Aspera or FTP).
• Command line alternative: some scripting is needed.
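As an illustration of what "capturing the mappings" between files could involve when scripting a submission, the sketch below checks that every result file points at one or more raw files. The `SubmissionFile` record and the rule itself are assumptions made for this example; they do not reproduce the PX submission tool's actual data model or file format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubmissionFile:
    """Hypothetical record for one file in a submission."""
    file_id: int
    file_type: str                      # e.g. "RESULT" or "RAW"
    path: str
    mapped_ids: List[int] = field(default_factory=list)


def unmapped_result_files(files):
    """Return paths of RESULT files that are not mapped to any RAW file."""
    by_id = {f.file_id: f for f in files}
    problems = []
    for f in files:
        if f.file_type != "RESULT":
            continue
        raw_targets = [
            i for i in f.mapped_ids
            if i in by_id and by_id[i].file_type == "RAW"
        ]
        if not raw_targets:
            problems.append(f.path)
    return problems


files = [
    SubmissionFile(1, "RESULT", "sample1.mzid", mapped_ids=[2]),
    SubmissionFile(2, "RAW", "sample1.raw"),
    SubmissionFile(3, "RESULT", "sample2.mzid"),   # mapping forgotten
]
print(unmapped_result_files(files))
```

Here only `sample2.mzid` is reported, because it has no mapping to a raw file.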
Uploading large datasets: Aspera
- Aspera is the default file transfer protocol to PRIDE, from both:
  - PX Submission tool
  - Command line
- Up to 50X faster than FTP
File transfer speed should not be a problem!
Tutorial manuscript detailing the process
Ternent et al., Proteomics, 2014
http://www.proteomexchange.org/submission
Example dataset: PXD000764
- Title: “Discovery of new CSF biomarkers for meningitis in children”
- 12 runs: 4 controls and 8 infected samples
- Identification and quantification data
ProteomeXchange: 1329 datasets up until October 2014 
Origin: 
271 USA 
166 Germany 
115 United Kingdom 
73 Switzerland 
70 China 
68 Netherlands 
67 France 
55 Canada 
44 Spain 
42 Belgium 
33 Sweden 
31 Australia 
31 Denmark 
31 Japan 
20 India 
20 Norway 
19 Taiwan 
17 Ireland 
16 Austria 
14 Finland 
14 Italy 
12 Republic of Korea 
11 Brazil 
9 Russia 
8 Israel 
7 Singapore … 
Type: 
437 PRIDE complete 
792 PRIDE partial 
63 PeptideAtlas/PASSEL complete 
14 MassIVE 
23 reprocessed 
Publicly accessible:
691 datasets, 52% of all datasets
86% PRIDE
12% PASSEL
2% MassIVE
Top species (studied in at least 10 datasets):
577 Homo sapiens 
165 Mus musculus 
56 Saccharomyces cerevisiae 
53 Arabidopsis thaliana 
29 Rattus norvegicus 
22 Escherichia coli 
17 Bos taurus 
16 Mycobacterium tuberculosis 
13 Oryza sativa 
13 Drosophila melanogaster 
13 Glycine max 
~ 290 species in total 
Data volume: 
Total: ~55 TB 
Number of all files: ~131,000 
PXD000320-324: ~ 5 TB 
PXD000065: ~ 1.4TB 
Datasets/year: 
2012: 102 
2013: 527 
2014: 700
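The figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only numbers quoted above; the variable names are arbitrary.

```python
# Numbers copied from the slide (datasets up until October 2014).
total_datasets = 1329
public_datasets = 691

by_type = {
    "PRIDE complete": 437,
    "PRIDE partial": 792,
    "PeptideAtlas/PASSEL complete": 63,
    "MassIVE": 14,
    "reprocessed": 23,
}
by_year = {2012: 102, 2013: 527, 2014: 700}

# Both breakdowns should add up to the headline total.
type_sum = sum(by_type.values())
year_sum = sum(by_year.values())

# Share of datasets that are publicly accessible, as a rounded percentage.
public_share = round(100 * public_datasets / total_datasets)

print(type_sum, year_sum, public_share)  # prints: 1329 1329 52
```

Both breakdowns sum to the stated 1329 datasets, and 691 of 1329 rounds to the quoted 52%.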
Overview 
• The ProteomeXchange (PX) consortium 
• Highlights in the last year 
• PRIME-XS datasets
PX submission tool: PRIME-XS tags
37 datasets in total (both public and private at present):
- 20 from the Netherlands
- 4 from the UK
- 2 from Austria, Belgium, Denmark, Spain and Switzerland
- 1 from France and USA
PRIME-XS datasets are now tagged in PRIDE
PRIME-XS datasets are now tagged and can be browsed as a group 
http://www.ebi.ac.uk/pride/archive/simpleSearch?q=prime-xs
ProteomeCentral: Portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
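Because all PX datasets share the PXD identifier space, a dataset page can be addressed directly from its accession. A minimal sketch, with two assumptions that should be checked against ProteomeCentral itself: that the `GetDataset` page takes the accession in an `ID` query parameter, and that accessions are `PXD` followed by six digits (as in the examples in this deck). The function name is hypothetical, and no network request is made.

```python
import re
from urllib.parse import urlencode

# Assumed accession format: "PXD" followed by six digits, e.g. PXD000764.
PXD_PATTERN = re.compile(r"PXD\d{6}")

# Base URL taken from the slide; the "ID" query parameter is an assumption.
PROTEOMECENTRAL_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def dataset_url(accession):
    """Build the ProteomeCentral URL for one PXD accession."""
    if not PXD_PATTERN.fullmatch(accession):
        raise ValueError(f"not a PXD accession: {accession!r}")
    return f"{PROTEOMECENTRAL_URL}?{urlencode({'ID': accession})}"


print(dataset_url("PXD000764"))
```

Validating the accession first keeps malformed identifiers from being turned into URLs.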
Which are the most accessed datasets?
PXD Identifier | Total Hits | Dataset title | Publication
PXD000561 | 153512 | A draft map of the human proteome | Kim et al., Nature, 2014. PMID: 24870542
PXD000851 | 111587 | Membrane proteomic analysis of colorectal cancer tissue | Kume et al., MCP, 2014. PMID: 24687888
PXD000865 | 51639 | Mass spectrometry based draft of the human proteome | Wilhelm et al., Nature, 2014. PMID: 24870543
[Figure: Total Numbers]
Reshake PRIDE data in PeptideShaker
Find the desired PRIDE project …
… inspect the project details …
… and start re-analyzing the data!
http://peptide-shaker.googlecode.com
Vaudel M, Burkhart J, Zahedi RP, Berven FS, Sickmann A, Martens L, Barsnes H. Nature Biotechnology (in press)
A little bit of perspective
[Figure labels: Berlin 2011, Mallorca 2012, Annecy 2013, Split 2013]
[Timeline figure, 2011 to 2014. Labels: PRIDE web (2011), PRIDE Inspector, PRIDE Converter, PX Submission Tool, PRIDE Converter 2, mzIdentML, mzQuantML, qcML, mzTab, PRIDE/PX datasets, PRIDE Inspector 2, PRIDE web (2014)]
Conclusions
• ProteomeXchange is widely used.
– PRIDE contains most of the MS/MS datasets.
– It now has a new consortium member: MassIVE (UCSD).
– Around half of the datasets are already public.
• Different open source tools are available to facilitate the process:
– File transfer speed should not be a problem (Aspera support)
Acknowledgements: People
Attila Csordas 
Tobias Ternent 
Noemi del Toro 
Rui Wang 
Florian Reisinger 
Jose A. Dianes 
Johannes Griss 
Steven Lewis 
Yasset Perez-Riverol 
Henning Hermjakob 
All previous team members 
ProteomeXchange partners
Acknowledgements: Funding 
@pride_ebi 
pride-ebi@ebi.ac.uk 
pride-support@ebi.ac.uk 
http://www.proteomexchange.org 
http://code.google.com/p/pride-converter-2/

ProteomeXchange: data deposition and data retrieval made easy

  • 1. ProteomeXchange: data deposition and data retrieval made easy Juan Antonio VIZCAINO, Ph.D. PRIDE Group coordinator Proteomics Services Group European Bioinformatics Institute Hinxton, Cambridge United Kingdom juan@ebi.ac.uk
  • 2. Overview • The ProteomeXchange (PX) consortium • Highlights in the last year • PRIME-XS datasets
  • 3. ProteomeXchange Consortium • Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories. • Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and (very recently) MassIVE (UCSD, San Diego). • Common identifier space (PXD identifiers) • Two supported data workflows: MS/MS and SRM. • Main objective: Make life easier for researchers http://www.proteomexchange.org
  • 4. ProteomeXchange data workflow ProteomeCentral Results Raw Data* Metadata / Manuscript PRIDE (MS/MS data) Journals UniProt/ neXtProt Peptide Atlas Other DBs Receiving repositories PASSEL (SRM data) Other DBs GPMDB Researcher’s results Reprocessed results Raw data* Metadata MassIVE (MS/MS data) Vizcaíno et al., Nat Biotechnol, 2014
  • 5. MassIVE (UCSD) http://proteomics.ucsd.edu/service/massive/ • Just joined ProteomeXchange on June 2014 • Only partial submissions. A few datasets so far.
  • 6. Overview • The ProteomeXchange (PX) consortium • Highlights in the last year • PRIME-XS datasets
  • 7. PX Data workflow for MS/MS data 1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML). 2. Result files: a. Complete submissions: Result files can be converted to PRIDE XML or the mzIdentML data standard. b. Partial submissions: For workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form. 3. Metadata: Sufficiently detailed description of sample origin, workflow, instrumentation, submitter. 4. Other files: Optional files: a. QUANT: Quantification related results e. FASTA b. PEAK: Peak list files f. SP_LIBRARY c. GEL: Gel images d. OTHER: Any other file type Published Raw Files Other files
  • 8. Complete vs Partial submissions: processed results For complete submissions, it is possible to connect the spectra with the identification processed results and they can be visualized. PRIDE XML, mzIdentML supported mzTab to come Complete Partial
  • 9. Complete vs Partial submissions: experimental metadata Complete Partial General experimental metadata about the projects is similar. However, at the assay level information, in partial submissions is less annotated
  • 10. Complete submissions using mzIdentML Search Engine Results + MS files Search engines mzIdentML An increasing number of tools support export to mzIdentML 1.1 - Mascot - MSGF+ - Myrimatch and related tools from D. Tabb’s lab - OpenMS - PEAKS - ProCon (ProteomeDiscoverer, Sequest) - Scaffold - TPP via the idConvert tool (ProteoWizard) - ProteinPilot (planned by the end of 2014) - Others: library for X!Tandem conversion, lab internal pipelines, … - Referenced spectral files need to be submitted as well (all open formats are supported). Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.
  • 11. Tools ‘RESULT’ file generation Final ‘RESULT’ file mzIdentML ‘RESULT’ Now: native file export Spectra files Mascot ProteinPilo t Scaffold PEAKS MSGF+ Others Native File export
  • 12. Original data files ‘RESULT’ file generation Final ‘RESULT’ file Search output files Spectra files PRIDE XML ‘RESULT’ Before: file conversion using PRIDE Converter File conversion PRIDE Converter
  • 13. PRIDE Inspector 2 (Wang et al., Nat. Biotechnology, 2012)
      PRIDE Inspector 2.0 supports:
      - PRIDE XML
      - mzIdentML + all types of spectra files
      - mzML
      - mzTab (work in progress)
      http://code.google.com/p/pride-toolsuite/wiki/PRIDEInspector
  • 14. PX submission tool: data submission
      http://www.proteomexchange.org/submission
      The PX submission tool is used to:
      • Capture the mappings between the different types of files.
      • Add the mandatory metadata annotation.
      • Make the file upload process straightforward for the submitter (it transfers all the files using Aspera or FTP).
      • Command line alternative: some scripting is needed.
      (Diagram labels: Published, Raw, Other files)
  • 15. Uploading large datasets: Aspera
      - Aspera is the default file transfer protocol to PRIDE:
        - PX Submission tool
        - Command line
      - Up to 50X faster than FTP
      File transfer speed should not be a problem!
  • 16. Tutorial manuscript detailing the process (Ternent et al., Proteomics, 2014)
      Example dataset: PXD000764
      - Title: “Discovery of new CSF biomarkers for meningitis in children”
      - 12 runs: 4 controls and 8 infected samples
      - Identification and quantification data
      http://www.proteomexchange.org/submission
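To put the "up to 50X" figure in perspective, a back-of-the-envelope estimate helps. The sketch below assumes a sustained FTP rate of 2 MB/s, a made-up number chosen only for illustration, and treats 50X as the best case quoted on the slide. The 1.4 TB size is the one the deck reports for PXD000065.

```python
def transfer_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_gb gigabytes at rate_mb_per_s (1 GB = 1000 MB)."""
    return size_gb * 1000 / rate_mb_per_s / 3600


SIZE_GB = 1400          # ~1.4 TB, the size quoted for PXD000065
FTP_RATE = 2.0          # MB/s, assumed for illustration only
ASPERA_RATE = FTP_RATE * 50  # best case under the "up to 50X" claim

print(f"FTP:    {transfer_hours(SIZE_GB, FTP_RATE):.1f} h")
print(f"Aspera: {transfer_hours(SIZE_GB, ASPERA_RATE):.1f} h")
```

Under these assumptions the upload drops from roughly 194 hours to about 4 hours. Real throughput depends on the network path, so both numbers are upper-level estimates rather than measurements.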
  • 17. ProteomeXchange: 1329 datasets up until October 2014
      Origin: 271 USA, 166 Germany, 115 United Kingdom, 73 Switzerland, 70 China, 68 Netherlands, 67 France, 55 Canada, 44 Spain, 42 Belgium, 33 Sweden, 31 Australia, 31 Denmark, 31 Japan, 20 India, 20 Norway, 19 Taiwan, 17 Ireland, 16 Austria, 14 Finland, 14 Italy, 12 Republic of Korea, 11 Brazil, 9 Russia, 8 Israel, 7 Singapore, …
      Type: 437 PRIDE complete, 792 PRIDE partial, 63 PeptideAtlas/PASSEL complete, 14 MassIVE, 23 reprocessed
      Publicly accessible: 691 datasets (52% of all): 86% PRIDE, 12% PASSEL, 2% MassIVE
      Top species (studied by at least 10 datasets): 577 Homo sapiens, 165 Mus musculus, 56 Saccharomyces cerevisiae, 53 Arabidopsis thaliana, 29 Rattus norvegicus, 22 Escherichia coli, 17 Bos taurus, 16 Mycobacterium tuberculosis, 13 Oryza sativa, 13 Drosophila melanogaster, 13 Glycine max; ~290 species in total
      Data volume: total ~55 TB; number of all files ~131,000; PXD000320-324: ~5 TB; PXD000065: ~1.4 TB
      Datasets/year: 2012: 102; 2013: 527; 2014: 700
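The figures on slide 17 are internally consistent, which a few lines of arithmetic confirm. All numbers below are copied from the slide; only the variable names are introduced here.

```python
# Counts copied from slide 17.
total = 1329
public = 691
by_type = {
    "PRIDE complete": 437,
    "PRIDE partial": 792,
    "PeptideAtlas/PASSEL complete": 63,
    "MassIVE": 14,
    "reprocessed": 23,
}
by_year = {2012: 102, 2013: 527, 2014: 700}

# Both breakdowns add up to the stated total.
assert sum(by_type.values()) == total
assert sum(by_year.values()) == total

# Share of datasets that are public, rounded to a whole percent.
public_share = round(100 * public / total)
print(f"Public: {public_share}% of all datasets")

# Share of datasets submitted to PRIDE as partial submissions.
partial_share = 100 * by_type["PRIDE partial"] / total
print(f"PRIDE partial: {partial_share:.1f}% of all datasets")
```

The public share rounds to the 52% quoted on the slide, and partial PRIDE submissions account for roughly 60% of all datasets.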
  • 18. Overview • The ProteomeXchange (PX) consortium • Highlights in the last year • PRIME-XS datasets
  • 19. PX submission tool: PRIME-XS tags
      37 datasets in total (both public and private at present):
      - 20 from the Netherlands
      - 4 from the UK
      - 2 each from Austria, Belgium, Denmark, Spain and Switzerland
      - 1 each from France and the USA
  • 20. PRIME-XS datasets are now tagged in PRIDE and can be browsed as a group: http://www.ebi.ac.uk/pride/archive/simpleSearch?q=prime-xs
  • 21. ProteomeCentral: Portal for all PX datasets http://proteomecentral.proteomexchange.org/cgi/GetDataset
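A dataset page on ProteomeCentral can be addressed directly from its PXD identifier. The sketch below makes two assumptions: that identifiers consist of `PXD` followed by six digits, as in the examples in this deck, and that the `GetDataset` endpoint accepts the accession in an `ID` query parameter. The function name `dataset_url` is hypothetical.

```python
import re
from urllib.parse import urlencode

BASE_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"

# Assumed identifier shape, inferred from examples such as PXD000764.
PXD_PATTERN = re.compile(r"PXD\d{6}")


def dataset_url(accession: str) -> str:
    """Build a ProteomeCentral URL for a PXD accession (ID parameter assumed)."""
    if not PXD_PATTERN.fullmatch(accession):
        raise ValueError(f"not a PXD identifier: {accession!r}")
    return f"{BASE_URL}?{urlencode({'ID': accession})}"


print(dataset_url("PXD000764"))
```

Validating the accession first keeps malformed identifiers from producing a URL that silently returns no dataset.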
  • 22. Which are the most accessed datasets?
      PXD identifier | Total hits | Dataset title | Publication
      PXD000561 | 153512 | A draft map of the human proteome | Kim et al., Nature, 2014. PMID: 24870542
      PXD000851 | 111587 | Membrane proteomic analysis of colorectal cancer tissue | Kume et al., MCP, 2014. PMID: 24687888
      PXD000865 | 51639 | Mass spectrometry based draft of the human proteome | Wilhelm et al., Nature, 2014. PMID: 24870543
  • 23. Total Numbers Which are the most accessed datasets?
  • 24. Reshake PRIDE data in PeptideShaker: find the desired PRIDE project … inspect the project details … and start re-analyzing the data! http://peptide-shaker.googlecode.com Vaudel M, Burkhart J, Zahedi RP, Berven FS, Sickmann A, Martens L, Barsnes H. Nature Biotechnology (in press)
  • 25. A little bit of perspective: Berlin 2011, Mallorca 2012, Annecy 2013, Split 2013
  • 26. A little bit of perspective (timeline, 2011 to 2014): PRIDE web (2011), PRIDE Inspector, PRIDE Converter, PX Submission Tool, mzIdentML, mzQuantML, PRIDE/PX datasets, qcML, mzTab, PRIDE Converter 2, PRIDE Inspector 2, PRIDE web (2014)
  • 27. Conclusions
      • ProteomeXchange is widely used.
        – PRIDE contains most of the MS/MS datasets.
        – It now has a new consortium member: MassIVE (UCSD).
        – Around half of the datasets are already public.
      • Different open source tools are available to facilitate the process:
        – File transfer speed should not be a problem (Aspera support)
  • 28. Acknowledgements: People. Attila Csordas, Tobias Ternent, Noemi del Toro, Rui Wang, Florian Reisinger, Jose A. Dianes, Johannes Griss, Steven Lewis, Yasset Perez-Riverol, Henning Hermjakob. All previous team members. ProteomeXchange partners.
  • 29. Acknowledgements: Funding @pride_ebi pride-ebi@ebi.ac.uk pride-support@ebi.ac.uk http://www.proteomexchange.org http://code.google.com/p/pride-converter-2/