Event: Plant and Animal Genomes conference 2012
Speaker: Sandra Orchard
InterPro is an open-source protein resource used for the automatic annotation of proteins. It scales to the analysis of entire new genomes through a downloadable version of InterProScan, which can be incorporated into an existing local pipeline. InterPro integrates protein signatures from 11 major signature databases (CATH-Gene3D, HAMAP, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY, and TIGRFAMs) into a single resource, taking advantage of each database's area of specialization to provide protein classification on multiple levels: protein families, structural superfamilies and functionally close subfamilies, as well as functional domains, repeats and important sites. The InterPro website has been improved following extensive community consultation, and a new version of InterProScan promises improved speed and ease of implementation, as well as additional functionality.
InterPro and InterProScan 5.0
1. EBI is an Outstation of the European Molecular Biology Laboratory.
2. http://www.ebi.ac.uk/interpro
InterPro
• is a database that groups predictive protein signatures together
• 11 member databases
• single searchable resource
• provides functional analysis of proteins by classifying them into families and predicting domains and important sites
• Enables whole genome analysis
4. http://www.ebi.ac.uk/interpro
Protein signatures
• More sensitive homology searches
• Each member database creates signatures using different methods and methodologies:
    • manually-created sequence alignments
    • automatic processes with some human input and correction
    • entirely automatic processes
5. http://www.ebi.ac.uk/interpro
Why do we need predictive annotation tools?
[Figure: number of sequences (axis 0 to 14,000,000) against date (5-Jan-04 to 5-Jan-10) for UniProtKB and UniProtKB/Swiss-Prot]
6. http://www.ebi.ac.uk/interpro
What are protein signatures?
Protein family/domain → Multiple sequence alignment → Build model → Search UniProt → Significant match → Mature model → Protein analysis
Example aligned sequences:
ITWKGPVCGLDGKTYRNECALL
AVPRSPVCGSDDVTYANECELK
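The "build model, then search" idea on this slide can be illustrated with a minimal sketch. It builds a simple position-specific log-odds profile from the two aligned sequences shown on the slide and scores candidate sequences against it. This is only an illustration under stated assumptions: the uniform background frequencies, the pseudocount and the ungapped sliding-window scoring are choices made here, not the method of any InterPro member database (which use HMMs, fingerprints, profiles, patterns and so on).

```python
import math
from collections import Counter

# The two aligned sequences shown on the slide.
ALIGNMENT = [
    "ITWKGPVCGLDGKTYRNECALL",
    "AVPRSPVCGSDDVTYANECELK",
]
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores per alignment column.

    Assumption: uniform background frequency for all 20 amino acids.
    """
    background = 1.0 / len(ALPHABET)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def score(profile, sequence):
    """Best ungapped window score of the profile against a sequence."""
    width = len(profile)
    best = float("-inf")
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        best = max(best, total)
    return best


profile = build_profile(ALIGNMENT)
member_score = score(profile, ALIGNMENT[0])
unrelated_score = score(profile, "M" * 22)
```

In a real signature database the model would be searched against UniProt, and matches above a curated threshold would be fed back into the alignment until the model is mature; the threshold itself is not modelled here.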
7. http://www.ebi.ac.uk/interpro
Member databases
Methods: hidden Markov models, fingerprints, profiles, patterns, sequence clusters
Focus:
• Structural domains
• Functional annotation of families/domains
• Prediction of conserved domains
• Protein features (active sites…)
10. http://www.ebi.ac.uk/interpro
The InterPro entry: types
• Family: proteins share a common evolutionary origin, as reflected in their related functions, sequences or structure
• Domain: distinct functional, structural or sequence units that may exist in a variety of biological contexts
• Repeats: short sequences typically repeated within a protein
• Sites: PTM, active site, binding site, conserved site
11. http://www.ebi.ac.uk/interpro
InterPro Entry
Adds extensive annotation
Links to other databases
Structural information and viewers
Groups similar signatures together
Quality control
Removes redundancy
12. http://www.ebi.ac.uk/interpro
InterPro Entry
Hierarchical classification
13. http://www.ebi.ac.uk/interpro
InterPro hierarchies: Families
Families can have parent/child relationships with other Families.
Parent/child relationships are based on:
• Comparison of protein hits
    • a child should be a subset of its parent
    • siblings should not have matches in common
• Existing hierarchies in member databases
• Biological knowledge of curators
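The protein-hit comparison above can be sketched with plain set operations. This is a minimal illustration, not InterPro's curation tooling: the entry names and protein accessions below are made up, and the checks are applied strictly, whereas curators also weigh member-database hierarchies and biological knowledge.

```python
def is_valid_child(child_hits, parent_hits):
    """A child entry's protein hits should be a subset of its parent's."""
    return child_hits <= parent_hits


def overlapping_siblings(sibling_hits):
    """Return pairs of sibling entries that share at least one protein hit."""
    names = sorted(sibling_hits)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if sibling_hits[a] & sibling_hits[b]
    ]


# Hypothetical entries and protein accessions, for illustration only.
parent = {"P1", "P2", "P3", "P4"}
siblings = {
    "child_A": {"P1", "P2"},
    "child_B": {"P3"},
    "child_C": {"P2", "P5"},
}

valid = {name: is_valid_child(hits, parent) for name, hits in siblings.items()}
conflicts = overlapping_siblings(siblings)
```

Here `child_C` fails the subset check because it matches a protein the parent does not, and it also conflicts with `child_A` because the two share a hit.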
17. http://www.ebi.ac.uk/interpro
InterPro Entry
The Gene Ontology project provides a
controlled vocabulary of terms for
describing gene product characteristics
18. http://www.ebi.ac.uk/interpro
InterPro Entry
UniProt
KEGG ... Reactome ... IntAct ...
UniProt taxonomy
PANDIT ... MEROPS ... Pfam clans ...
PubMed
19. http://www.ebi.ac.uk/interpro
InterPro Entry
PDB: 3-D structures
SCOP: structural domains
CATH: structural domain classification
23. http://www.ebi.ac.uk/interpro
InterProScan 5.0 aims
• Easy install and configuration
• Modular
• Expandable
• Easily integrated into existing pipelines
• Incorporate new data model / XML exchange format
• Easy to port on to different architectures:
    • Desktop machine
    • Simple LAN
    • LSF
    • PBS
    • Sun Grid Engine ...cloud? GRID?
• Reliability
26. http://www.ebi.ac.uk/interpro
Java Messaging Service
• Simple and robust programming model
• Mature and stable standard – current JMS version released in 2002
• Guaranteed message delivery to a single worker
• Easy to monitor
• Flexible – easy to implement on multiple platforms
“Master”: schedules tasks & sub-tasks, and places them on a queue
Broker: manages queues & topics; starts workers on demand
“Worker” (several): takes tasks off queues, performs a task / sub-task and reports back to the Broker
Monitoring & Management Application: web or stand-alone app to monitor & manage InterProScan
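The master/broker/worker pattern on this slide can be sketched in a few lines. The example below is an in-process stand-in, not JMS and not InterProScan's code: Python's standard `queue.Queue` plays the role of the broker's queue, threads play the workers, and the "analysis" is a placeholder that just returns the sequence length. The function name `run_jobs` and the task format are assumptions made for illustration.

```python
import queue
import threading


def run_jobs(tasks, n_workers=3):
    """Distribute (name, sequence) tasks to workers via a shared queue."""
    task_queue = queue.Queue()    # stands in for the broker's task queue
    result_queue = queue.Queue()  # workers report results back here

    def worker():
        while True:
            task = task_queue.get()
            if task is None:          # sentinel: no more work for this worker
                task_queue.task_done()
                break
            name, sequence = task
            # Placeholder for a real analysis step (assumption).
            result_queue.put((name, len(sequence)))
            task_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()

    # The "master" places tasks on the queue, then one sentinel per worker.
    for task in tasks:
        task_queue.put(task)
    for _ in threads:
        task_queue.put(None)

    task_queue.join()
    for thread in threads:
        thread.join()

    results = {}
    while not result_queue.empty():
        name, value = result_queue.get()
        results[name] = value
    return results


results = run_jobs([("seq1", "MKV"), ("seq2", "MKVLA"), ("seq3", "M")])
```

Each task is taken off the queue by exactly one worker, which mirrors the "guaranteed message delivery to a single worker" point; a real broker additionally persists messages and can run workers on separate machines.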
31. http://www.ebi.ac.uk/interpro
Pre-calculated match lookup
• BerkeleyDB-backed REST web service
• Includes matches for all of UniParc (27 million sequences)
• 250 million matches
• Fast response
• Integrated into i5.
[Figure: response time (ms) per sequence; axis ticks 0 to 400 and 0 to 80]
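The idea of a pre-calculated match lookup can be sketched as a key-value store keyed on the sequence itself, so that a sequence already analysed never needs to be re-run. The sketch below is a stand-in under stated assumptions: an in-memory dict replaces BerkeleyDB and the REST layer, an MD5 checksum of the normalised sequence is assumed as the key, and the class name and signature identifier are hypothetical.

```python
import hashlib


def sequence_key(sequence):
    """Checksum of the sequence with whitespace removed and case normalised.

    Assumption: an MD5 digest is used as the lookup key.
    """
    normalised = "".join(sequence.split()).upper()
    return hashlib.md5(normalised.encode("ascii")).hexdigest()


class MatchLookup:
    """In-memory stand-in for a pre-calculated match store."""

    def __init__(self):
        self._store = {}

    def add(self, sequence, matches):
        self._store[sequence_key(sequence)] = list(matches)

    def get(self, sequence):
        """Return stored matches, or None if the sequence is unknown."""
        return self._store.get(sequence_key(sequence))


lookup = MatchLookup()
lookup.add("ITWKGPVCGLDGKTYRNECALL", ["SIG0001"])  # hypothetical signature ID

hit = lookup.get("itwkgpvcgl\ndgktyrnecall")  # same sequence, different layout
miss = lookup.get("AVPRSPVCGSDDVTYANECELK")    # not stored, would be analysed
```

A pipeline using such a lookup would run the full analysis only for sequences where `get` returns `None`.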
36. EBI is an Outstation of the European Molecular Biology Laboratory.
Come and see us at
booths 9 and 10!
• Job opportunities
• PhD and postdoc positions
• Training in person and online
• Services
• Industry programme
Editor's notes
Mention why this needs to be InterPro-specific: we have to cover a lot of different member database definitions.