Enabling automated processing and analysis of large-scale proteomics data
1. www.elixir-europe.org
ELIXIR All Hands 2017, 21-23 March, Rome, Italy
Juan Antonio Vizcaíno, EMBL-EBI
Hinxton, Cambridge
2. One-slide intro to MS-based proteomics
Hein et al., Handbook of Systems Biology, 2012
3. Kickoff meeting for proteomics activities in ELIXIR
• It took place on 1-2 March 2017 in Tuebingen, organised by ELIXIR-Germany.
• ~25 people attended, representing 11 ELIXIR Nodes.
• Outcome: White paper outlining the possible future activities related to
proteomics in the context of ELIXIR. This paper will be submitted to F1000
Research.
• Expected to be available by the end of May or the beginning of June.
4. ELIXIR Implementation Project
• 1-year project just started. Led by EMBL-EBI (Vizcaíno) and ELIXIR-Germany
(Kohlbacher, Eisenacher).
• Aim: Development of reproducible data analysis pipelines for shotgun
proteomics approaches using the OpenMS framework.
• Deployment of the pipelines in the EMBL-EBI “Embassy cloud” as proof of
concept:
• Facilitate future deployment in other cloud environments.
• Direct connection with public datasets in the PRIDE database.
5. PRIDE (PRoteomics IDEntifications) Archive
• PRIDE is the world-leading mass spectrometry (MS)-based proteomics data
repository.
• It stores:
• Peptide and protein expression data (identification and quantification)
• Post-translational modifications
• Mass spectra (raw data and peak lists)
• Technical and biological metadata
• Any other related information
• Any data workflow is now supported.
• Leading the global ProteomeXchange Consortium.
http://www.ebi.ac.uk/pride/archive
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2016
6. Why is this project timely?
Martens & Vizcaíno, Trends Biochem Sci, 2017
[Figure: Data downloads from PRIDE per year, 2013-2016, in TB. Downloads in 2016: 243 TB.]
• Open, reproducible, traceable and scalable analysis pipelines are
needed, as the size of proteomics datasets keeps growing.
• Reuse of public proteomics data is flourishing.
7. Acknowledgements: People
Mathias Walzer
Yasset Perez-Riverol
EMBL-EBI cloud team (led by Steven Newhouse)
Oliver Kohlbacher (Tuebingen University)
Martin Eisenacher (Bochum University)
Everyone who attended the workshop in Tuebingen (1-2 March)
Do you want to get involved?
8. www.hupo2017.ie
Abstract Deadline: 5th April
Early Registration Deadline: 14th June
Dublin, 17-21 September