PSI meeting 2016
Ghent, 18 April 2016
Overview
• Introduction and status
• Submission and citation statistics
• New prospective members: jPOST and iProX
• OmicsDI interface
ProteomeXchange Consortium
• Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
• Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and (very recently) MassIVE (UCSD, San Diego).
• Common identifier space (PXD identifiers)
• Two supported data workflows: MS/MS and SRM.
• Main objective: Make life easier for researchers
http://www.proteomexchange.org
ProteomeXchange data workflow (Vizcaíno et al., Nat Biotechnol, 2014)
[Diagram; labels as shown on the slide]
• Submitted data: Metadata / Manuscript, Raw Data*, Results
• Receiving repositories: PASSEL (SRM data), PRIDE (MS/MS data), MassIVE (MS/MS data)
• Central portal: ProteomeCentral
• Other resources: Journals, UniProt/neXtProt, PeptideAtlas, GPMDB, Other DBs
• Legend: Researcher’s results, Reprocessed results, Raw data*, Metadata
Complete vs Partial submissions: processed results
[Screenshots: Complete, Partial]
For complete submissions, the spectra can be connected to the processed identification results and visualized.
Complete vs Partial submissions: experimental metadata
[Screenshots: Complete, Partial]
General experimental metadata about the projects is similar. However, assay-level information in partial submissions is less detailed.
Complete submissions
[Diagram labels: Search engine results + MS files; Search engines; mzIdentML]
An increasing number of tools support export to mzIdentML 1.1:
- Mascot
- MSGF+
- MyriMatch and related tools from D. Tabb’s lab
- OpenMS
- PEAKS
- PeptideShaker
- ProCon (ProteomeDiscoverer, Sequest)
- Scaffold
- TPP via the idConvert tool (ProteoWizard)
- ProteinPilot (from version 5.0)
- X!Tandem native conversion (Beta, PILEDRIVER)
- Crux
- Others: library for X!Tandem conversion, lab internal pipelines, …
Referenced spectral files need to be submitted as well (all open formats are supported).
Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.
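The requirement that referenced spectral files accompany the mzIdentML file can be checked before submission. The following is a minimal sketch, not part of any PX submission tool: it assumes the mzIdentML 1.1 namespace and the `location` attribute of `SpectraData` elements, and the function names and the sample fragment (which is not a schema-valid file) are purely illustrative.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Namespace used by mzIdentML 1.1 documents.
MZID_NS = "{http://psidev.info/psi/pi/mzIdentML/1.1}"


def referenced_spectra_files(mzid_xml: str) -> list[str]:
    """Return the base file names referenced by SpectraData elements."""
    root = ET.fromstring(mzid_xml)
    names = []
    for spectra in root.iter(f"{MZID_NS}SpectraData"):
        location = unquote(spectra.get("location", ""))
        # location may be a URI or a local path; keep only the file name.
        names.append(location.replace("\\", "/").rsplit("/", 1)[-1])
    return names


def missing_spectra_files(mzid_xml: str, submitted: list[str]) -> list[str]:
    """List referenced spectral files that are absent from the submission."""
    submitted_set = set(submitted)
    return [n for n in referenced_spectra_files(mzid_xml) if n not in submitted_set]


# Illustrative fragment only (hypothetical file names, not a complete document).
SAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <DataCollection><Inputs>
    <SpectraData id="SD_1" location="file:///data/run1.mgf"/>
    <SpectraData id="SD_2" location="run2.mzML"/>
  </Inputs></DataCollection>
</MzIdentML>"""

if __name__ == "__main__":
    print(missing_spectra_files(SAMPLE, ["run1.mgf"]))
```

Matching on base file names is a simplification; a real check would also need to handle renamed or compressed files.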
Status of ProteomeXchange
• No changes in the Consortium during 2015.
• Grant ‘ProteomeXchange 2’ refined and submitted again to
the joint NSF/BBSRC call but it was not successful.
• Prospective members:
• jPOST (Japan). Dedicated funding for 3 years.
• iProX (China).
• BBSRC Partnering grants with China and Japan obtained to help with
the process.
• No further contacts with other proteomics resources.
ProteomeXchange: 3,802 datasets up until 1st April, 2016
Origin:
885 USA
465 Germany
342 United Kingdom
264 China
194 France
158 Netherlands
136 Canada
126 Switzerland
107 Denmark
104 Spain
99 Australia
95 Japan
72 Belgium
68 Austria
63 Sweden
61 India
51 Norway
43 Taiwan
30 Italy
29 Brazil
28 Singapore
28 Finland
27 Ireland
27 Russia
26 Israel …
Type:
2429 PRIDE partial
1016 PRIDE complete
250 MassIVE
84 PeptideAtlas/PASSEL complete
23 Reprocessed
Publicly Accessible:
1973 datasets, 52% of all
91% PRIDE
5% MassIVE
4% PASSEL
Data volume:
Total: ~220 TB
Number of all files: ~560,000
Datasets/year:
2012: 102
2013: 527
2014: 963
2015: 1758
2016: 452
Top Species studied by at least 20 datasets:
1526 Homo sapiens
485 Mus musculus
150 Saccharomyces cerevisiae
121 Arabidopsis thaliana
102 Rattus norvegicus
86 Escherichia coli
44 Bos taurus
35 Drosophila melanogaster
32 Glycine max
~ 700 species in total
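The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below only uses numbers quoted above; the variable and function names are illustrative, and 2016 is left out of the growth ratios because it covers only the period up to 1st April.

```python
# Counts copied from the slide.
TOTAL_DATASETS = 3802
PUBLIC_DATASETS = 1973

by_type = {
    "PRIDE partial": 2429,
    "PRIDE complete": 1016,
    "MassIVE": 250,
    "PeptideAtlas/PASSEL complete": 84,
    "Reprocessed": 23,
}
by_year = {2012: 102, 2013: 527, 2014: 963, 2015: 1758, 2016: 452}


def percent(part: int, whole: int) -> int:
    """Share of `part` in `whole`, rounded to a whole percent."""
    return round(100 * part / whole)


def yearly_growth(counts: dict[int, int], last_full_year: int) -> dict[int, float]:
    """Ratio of each full year's count to the previous year's count."""
    years = sorted(y for y in counts if y <= last_full_year)
    return {y: counts[y] / counts[prev] for prev, y in zip(years, years[1:])}


if __name__ == "__main__":
    # Both breakdowns should add up to the headline total.
    print(sum(by_type.values()) == TOTAL_DATASETS)
    print(sum(by_year.values()) == TOTAL_DATASETS)
    print(f"public share: {percent(PUBLIC_DATASETS, TOTAL_DATASETS)}%")
    for year, ratio in yearly_growth(by_year, last_full_year=2015).items():
        print(f"{year}: x{ratio:.2f} vs previous year")
```

Both breakdowns sum to the 3,802 total, and the public share rounds to the 52% stated on the slide.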
PRIDE Archive submitted datasets up until 1st April, 2016
• In the last year: ~150 submitted datasets per month
• Size: ~ 210TB
PRIDE Archive: Size comparison with other EBI resources (May 2015)
[Figure: Data accumulation by resource. Y axis: bytes (log scale, 1.E+07 to 1.E+17); X axis: date (2004 to 2016). Series: Metabolites, PRIDE, EGA, ENA (less AE), AE]
Chart generated by Guy Cochrane
Data reuse is increasing
Data download volume in 2015: ~ 200 TB
Which are the most accessed datasets? (total number of hits)
Citation statistics
Top cited paper (citations/year) in proteomics in NBT
jPOST Project (April 2015 – March 2018)
The jPOST project is supported by the National Bioscience Database Center, Japan Science and Technology Agency (NBDC-JST).
• Main servers set up (Dec 2015)
• Use of the PSI terminology for data registration
• Preparation of a demo site for the jPOST repository (by this meeting)
• jPOST repository ready for PX partnership
• jPOST repository start (May 2, 2016)
• jPOST database start (2017)
(www.jpost.org)
iProX: integrated proteome resources in China
At present, iProX contains:
• 225 projects
• 834 subprojects
• 15398 data files
• Most of the data come from the CNHPP
http://www.iprox.org
Slides from Y. Zhu
iProX diagram
[Diagram labels: Providing stable service to users; iProX submission system; iProX proteome database; Dataset import and management; User information; MS/MS data processing pipeline; Experiment raw files and metadata; Information on datasets and identifications]
Updates
• Two full-time curators
• Chunyuan Yang, Ph.D. in medical genetics
• Xue Wang, M.Sc. in bioinformatics
• Aspera license upgraded from 100 Mbps to 500 Mbps
• High availability: hot standby
• Will be deployed on a cloud platform in May 2016
• Move to the Network Information Center, Chinese Academy of Sciences
• Internet connection for the service will exceed 1 Gbps
• Remote backup in Shanghai, China
ProteomeCentral: Portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
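A dataset page on ProteomeCentral can be addressed directly from its PXD identifier. The sketch below builds such a link; the `ID` query parameter and the "PXD plus six digits" accession pattern are assumptions based on commonly seen ProteomeCentral links, not something stated on the slide, and the function name is illustrative.

```python
import re
from urllib.parse import urlencode

# Base URL taken from the slide.
GET_DATASET_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"

# Assumed accession format: "PXD" followed by six digits (e.g. PXD000001).
PXD_PATTERN = re.compile(r"PXD\d{6}")


def dataset_url(accession: str) -> str:
    """Build a ProteomeCentral link for one PXD accession."""
    if not PXD_PATTERN.fullmatch(accession):
        raise ValueError(f"not a PXD accession: {accession!r}")
    # 'ID' as the query parameter name is an assumption (see the note above).
    return f"{GET_DATASET_URL}?{urlencode({'ID': accession})}"


if __name__ == "__main__":
    print(dataset_url("PXD000001"))
```

Validating the accession before building the URL keeps malformed identifiers from producing links that silently resolve to nothing.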
OmicsDI: Portal for omics datasets
http://www.ebi.ac.uk/Tools/omicsdi/
• Aims to integrate ‘omics’ datasets (genomics, proteomics and metabolomics at present). Not only EBI resources are included:
PRIDE Archive
MassIVE
PASSEL
GPMDB
MetaboLights
Metabolomics Workbench
GNPS
EGA
Acknowledgements: People
PRIDE team
Attila Csordas
Tobias Ternent
Noemi del Toro
Gerhard Mayer (Bochum, de.NBI)
Johannes Griss
Yasset Perez-Riverol
Henning Hermjakob
Former team members: Rui Wang,
Florian Reisinger and Jose A. Dianes
Acknowledgements: PX partners
Eric Deutsch
Nuno Bandeira
Yasushi Ishihama (jPOST team)
Yunping Zhu (iProX team)