1. PSLID, the Protein Subcellular Location
Image Database:
Subcellular location assignments,
annotated image collections, image
analysis tools, and generative models of
protein distributions
Estelle Glory, Justin Newberg, Tao Peng, Ivan Cao‐Berg, and
Robert F. Murphy
Departments of Biological Sciences, Biomedical Engineering and
Machine Learning and
2. Contributors
• Michael Boland
• Mia Markey • David Casasent
• Gregory Porreca • Simon Watkins
• Meel Velliste • Jon Jarvik, Peter Berget
• Kai Huang • Jack Rohrer
• Xiang Chen • Tom Mitchell
• Yanhua Hu • Christos Faloutsos
• Juchang Hua • Jelena Kovacevic
• Ting Zhao • Geoff Gordon
• Shann‐Ching Chen • B. S. Manjunath, Ambuj Singh
• Elvira Osuna Highley • Les Loew, Ion Moraru, Jim Schaff
• Justin Newberg • Gustavo Rohde
• Estelle Glory • Ghislain Bonamy, Sumit Chanda, Dan Rines
• Tao Peng
• Luis Coelho
• Ivan Cao‐Berg
6. Feature levels and granularity
[Diagram: three levels (single object with object features, single cell with cell features, single field with field features) linked by an aggregate/average operator]
Granularity: 2D, 3D, 2Dt, 3Dt
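The aggregate/average operator can be illustrated with a minimal sketch that averages object-level feature vectors into one cell-level vector. The function name and the two example features are hypothetical and are not taken from the PSLID or SLIC tools.

```python
from statistics import mean


def aggregate_features(feature_rows):
    """Average equal-length feature vectors column by column.

    Each row is one lower-level unit (for example, one object);
    the result is a single vector for the next level up (one cell).
    """
    if not feature_rows:
        raise ValueError("need at least one feature vector")
    return [mean(column) for column in zip(*feature_rows)]


# Hypothetical object features: (size, distance to nucleus)
objects_in_cell = [[4.0, 1.0], [6.0, 3.0]]
cell_features = aggregate_features(objects_in_cell)
print(cell_features)
```

The same function could be applied again to cell-level vectors to obtain field-level features.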
7. 2D Images of HeLa cells
[Image panels: ER, giantin, gpp130, LAMP, Mito, Nucleolin, Actin, TfR, Tubulin, DNA]
Subcellular Pattern Classification: Computer vs. Human
[Plot: Human Accuracy vs. Computer Accuracy, both axes from 40 to 100]
Even better results using multiresolution methods
Even better results for 3D images
9. Decomposing
mixture patterns
• Proteins can be in more than one structure
• Clustering or classifying whole cell patterns will
consider each combination of two or more
“basic” patterns as a unique new pattern
• Desirable to have a way to decompose mixtures
instead
• Our approach: assume that each basic pattern
has a recognizable combination of different
types of objects
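As an illustration of the idea, the sketch below assumes each basic pattern is summarized by the fraction of fluorescence in each object type, and that a mixed pattern is a linear blend of two basic patterns. The mixing fraction is then estimated by least squares. This is a simplified stand-in, not the unmixing method used by the authors, and all names and numbers are hypothetical.

```python
def unmix_two_patterns(mixed, pattern_a, pattern_b):
    """Estimate the fraction of pattern_a in a two-pattern mixture.

    Solves min over f of ||mixed - (f * a + (1 - f) * b)||^2 in closed
    form, then clips the result to the valid range [0, 1].
    """
    diff = [a - b for a, b in zip(pattern_a, pattern_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("basic patterns must differ")
    num = sum((m - b) * d for m, b, d in zip(mixed, pattern_b, diff))
    return min(1.0, max(0.0, num / denom))


# Hypothetical fluorescence fractions over three object types
pure_a = [0.5, 0.5, 0.0]
pure_b = [0.0, 0.5, 0.5]
mixed = [0.125, 0.5, 0.375]  # built as 0.25 * pure_a + 0.75 * pure_b
print(unmix_two_patterns(mixed, pure_a, pure_b))
```

With more than two basic patterns the same idea becomes a constrained least-squares problem over a vector of fractions.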
12. [Figure: three bar-chart panels (Pure Lysosomal Pattern, Pure Golgi Pattern, All) showing amount of fluorescence (Amt fluor.) per object type (1 to 8) for the Golgi, Lysosomal, and Nuclear classes]
13. Test samples
• How do we test a subcellular pattern unmixing
algorithm?
• Need images of known mixtures of pure
patterns – difficult to obtain “naturally”
• Created test set by mixing different
proportions of two probes that localize to
different cell parts (lysosomes and
mitochondria)
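With known mixing proportions available, one simple way to score an unmixing algorithm is the mean absolute error between known and estimated fractions. The sketch below assumes that metric; the slides do not state which score was used, and the numbers are made up for illustration.

```python
def mean_absolute_error(known, estimated):
    """Average absolute difference between known and estimated fractions."""
    if not known or len(known) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(k - e) for k, e in zip(known, estimated)) / len(known)


# Hypothetical lysosomal fractions for three test mixtures
known_fractions = [0.0, 0.5, 1.0]
estimated_fractions = [0.1, 0.5, 0.8]
print(mean_absolute_error(known_fractions, estimated_fractions))
```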
14. Tao Peng, Ghislain Bonamy, Estelle
Glory, Sumit Chanda, Dan Rines
(Genome Research Institute of
Novartis Foundation)
• Lysotracker
26. Models for protein‐containing objects
• Mixture of Gaussian objects
• Learn distributions for number of objects and object size
• Learn probability density function for objects relative to nucleus and cell
r: normalized distance, a: angle to major axis
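A minimal sketch of sampling from such an object model follows. It assumes, purely for illustration, a Gaussian for the number of objects and for object size, a Beta distribution for the normalized distance r in [0, 1], and a uniform angle a. The distribution choices and parameter names are hypothetical and do not reflect the learned models described in the slides.

```python
import math
import random


def sample_objects(n_mean, n_sd, size_mean, size_sd, r_alpha, r_beta, seed=0):
    """Draw a list of objects, each with position (r, a) and a size."""
    rng = random.Random(seed)  # seeded for reproducible samples
    n_objects = max(0, round(rng.gauss(n_mean, n_sd)))
    objects = []
    for _ in range(n_objects):
        objects.append({
            "r": rng.betavariate(r_alpha, r_beta),   # normalized distance
            "a": rng.uniform(0.0, 2.0 * math.pi),    # angle to major axis
            "size": max(0.0, rng.gauss(size_mean, size_sd)),
        })
    return objects


for obj in sample_objects(5, 1, 2.0, 0.5, 2.0, 2.0, seed=1):
    print(obj)
```

Each sampled object could then be rendered as a Gaussian blob at the corresponding position between the nuclear and cell boundaries.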
27. Synthesized Images
[Image panels: Lysosomes, Endosomes]
Have XML design for capturing model parameters
SLML toolbox ‐ Ivan Cao‐Berg, Tao Peng, Ting Zhao
Have portable tool for generating images from model
28. Model Distribution
• Generative models provide better way of
distributing what is known about
“subcellular location families” (or other
imaging results, such as illustrating change
due to drug addition)
• Have XML design for capturing the models
for distribution
• Have portable tool for generating
images from the model
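To show what capturing model parameters in XML might look like, here is a small round-trip sketch using Python's standard library. The element and attribute names (`model`, `parameter`, `name`, `value`) are invented for this example and are not the actual SLML schema.

```python
import xml.etree.ElementTree as ET


def model_to_xml(name, params):
    """Serialize a dict of float parameters to an XML string."""
    root = ET.Element("model", {"name": name})
    for key, value in sorted(params.items()):
        ET.SubElement(root, "parameter", {"name": key, "value": repr(value)})
    return ET.tostring(root, encoding="unicode")


def xml_to_model(text):
    """Parse the XML produced by model_to_xml back into (name, params)."""
    root = ET.fromstring(text)
    params = {p.get("name"): float(p.get("value"))
              for p in root.findall("parameter")}
    return root.get("name"), params


# Hypothetical parameters for a lysosomal object model
xml_text = model_to_xml("lysosome", {"count_mean": 12.0, "size_sd": 0.5})
print(xml_text)
print(xml_to_model(xml_text))
```

Keeping the parameters in a plain text format like this is what lets a separate, portable tool regenerate images from a distributed model.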
29. Combining Models for Cell Simulations
[Diagram: models for Protein 1, Protein 2, and Protein 3, each with its own Cell Shape and Nuclear Model, are combined through XML with a Shared Nuclear and Cell Shape model to produce a Simulation]
32. PSLID
• Loading pipeline driven by script
– Calculates thumbnail images, features, segmentation
– Creates database records and links
– Creates predefined sets
• Web application
– Create sets by searching on context or content
– Analyze sets with any SLIC tool
– Full display or summary
– SOAP/XML interface
34. Annotated Datasets
• 2D and 3D images of 9 major subcellular
patterns in HeLa cells
• 3D images of ~300 proteins in 3T3 cells
• 2D images of ~3000 proteins in 3T3 cells
• 2D and 3D images for pattern unmixing
• Datasets from other investigators