2. When WS-MR is suitable
• You’ve got good data (<4 Å)
• You’ve tried MR with lots of good candidates
  • a priori knowledge
  • sequence similarity (PSI-BLAST search)
• Or
  • protein not sequenced
  • no a priori knowledge of expected fold
• You haven’t found any good models to use for phasing
• Time to try a brute-force search: WS-MR
3. When MR is not suitable
• Complexes containing significant DNA or RNA
  • at least right now, these will probably not work
• You haven’t tried MR and just want a “quick fix”
• Very large or very small structures
  • both are computationally difficult
• Low resolution (> 4.5 Å)
  • experience so far suggests these aren’t going to be helped much
4. Requirements
• Reflection data in MTZ file format
  • Must have amplitude columns (e.g. FP, SIGFP)
  • Doesn’t work with intensities (I, SIGI)
• Time
  • To analyze results
  • To take next steps
• Managed expectations
  • Identify good MR candidates about 1 in 4 cases
  • We don’t produce a fully phased structure, only a list of good MR candidates and their best placements as returned by Phaser
• Experience with Phaser to interpret results and re-run candidate models
5. Background
• Utilizes Phaser for MR
• Utilizes Open Science Grid for computing
• References
  • Stokes-Rees, Sliz, Protein structure determination by exhaustive search of Protein Data Bank derived databases, Proc. Nat'l Academy of Sciences, doi:10.1073/pnas.1012095107
  • Stokes-Rees, Sliz, Compute and data management strategies for grid deployment of high throughput protein structure studies, IEEE Workshop on Many Task Computing on Grids and Supercomputers 2010 (MTAGS10), Seattle, November 2010
  • Phaser: McCoy, Grosse-Kunstleve, Adams, Winn, Storoni, Read; J. Appl. Cryst. (2007). 40, 658-674
  • Murzin A. G., Brenner S. E., Hubbard T., Chothia C. (1995). SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536-540.
• Requires 20-50,000 hours of computing
• Produces 300,000 files
• Attempts 100,000 single-domain MR trials using all SCOP domains
6. Step 1: Register to use Portal
https://portal.nebiogrid.org/d/accounts/create
8. Side Note: MTZ columns
• Use CCP4 tool “mtzdmp” to check column names and resolution if you’re not sure
$ mtzdmp GAS.mtz | less
...
* Column Labels :
H K L FP SIGFP FreeRflag
...
* Resolution Range :
0.00050 0.25197 ( 44.699 - 1.992 A )
...
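The two checks above (amplitude columns present, resolution better than about 4 Å) can be automated. The following is a minimal sketch, assuming the mtzdmp text looks like the excerpt above; the function names are hypothetical, and FP/SIGFP are only the example column names from the requirements slide, so your file may use different labels.

```python
import re


def parse_mtzdmp(text):
    """Extract column labels and the resolution range from mtzdmp text output."""
    lines = [ln.strip() for ln in text.splitlines()]
    labels, resolution = [], None
    for i, line in enumerate(lines):
        # The value sits on the first non-blank line after the "* heading :" line.
        following = next((ln for ln in lines[i + 1:] if ln), "")
        if line.startswith("* Column Labels"):
            labels = following.split()
        elif line.startswith("* Resolution Range"):
            match = re.search(r"\(\s*([\d.]+)\s*-\s*([\d.]+)\s*A\s*\)", following)
            if match:
                # (low-resolution limit, high-resolution limit) in Angstrom
                resolution = (float(match.group(1)), float(match.group(2)))
    return labels, resolution


def check_ws_mr_input(labels, resolution, amplitude=("FP", "SIGFP"), limit=4.0):
    """Return a list of problems; an empty list means the basic checks pass.

    The amplitude column names and the 4.0 A limit are assumptions taken
    from the slides, not values enforced by any tool.
    """
    problems = []
    missing = [name for name in amplitude if name not in labels]
    if missing:
        problems.append("missing amplitude columns: " + ", ".join(missing))
    if resolution is None:
        problems.append("resolution range not found")
    elif resolution[1] >= limit:
        problems.append(f"high-resolution limit {resolution[1]} A is not better than {limit} A")
    return problems
```

One way to use it is to save the mtzdmp output to a text file, read it into a string, and pass that string to `parse_mtzdmp`.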
9. Step 3a: Review active task list on portal
Click here to access task
10. Step 3b: Check email for task details and link
Click here to access task
13. Step 5b: Check status
Click here
R = Running
I = Idle
H = Held
Remember: Someone from SBGrid will manually review your job and release it. Until that happens your job won’t even be in the queue. Even after that, it could be in the queue for several days before it starts running. Do email us if you have questions or if it seems stuck or not running.
14. Step 5c: Check status
Summary of active jobs
Outcomes to date
15. Step 6a: Review scatter graphs
Look for a cluster of high TFZ and high LLG results distinct from the rest
NOTE: This graph is a static image
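Spotting "a cluster distinct from the rest" can also be approximated numerically. The sketch below is one possible heuristic, not the portal's method: it flags results whose TFZ and LLG are both far above the bulk, using robust z-scores (median and median absolute deviation). The record keys `model`, `tfz`, `llg` and the threshold of 5 are assumptions made for illustration.

```python
from statistics import median


def robust_z(values):
    """Robust z-scores based on the median and median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    # 1.4826 rescales the MAD to match a standard deviation for normal data.
    scale = 1.4826 * mad if mad > 0 else 1.0
    return [(v - med) / scale for v in values]


def standout_candidates(results, threshold=5.0):
    """Return model names whose TFZ and LLG both stand out from the rest.

    `results` is a list of dicts with hypothetical keys "model", "tfz", "llg".
    """
    tfz_scores = robust_z([r["tfz"] for r in results])
    llg_scores = robust_z([r["llg"] for r in results])
    return [
        r["model"]
        for r, z_tfz, z_llg in zip(results, tfz_scores, llg_scores)
        if z_tfz > threshold and z_llg > threshold
    ]
```

An empty list is the expected outcome for most jobs. Anything this flags still needs to be checked by eye on the scatter graph and by re-running Phaser.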
16. Step 6b: Cases with no strong MR candidates*
* Remember this is usually the case, unfortunately
17. Step 6c: Review scatter graphs
Click this button to load data and enable clickable image
NOTE: This graph is a dynamic clickable image. Only the first 5000 results by LLG are currently available because of memory constraints
18. Step 6d: Review scatter graphs
Click data point to view details
Click large cartoon image to add to PDB image basket
19. Step 7: Review tabular data
Live results (space delimited)
Sorted results (tab delimited), generated by “check status”
20. Step 8: Wait for job to finish
No running jobs (all done)
Results approx. 100,000
Errors < 5,000
NOTE: This job is not yet finished!
21. Step 9: Download finalized augmented results
augmented contains static SCOP domain class and name (25 MB)
final contains a sorted, cleaned set of results (5 MB)
22. Step 10: Review and download specific SCOP PDB
• Use the tabular results to identify specific SCOP codes that look promising
• PDBs can be fetched using one of these resources:
http://portal.nebiogrid.org/biodb/scop/v1.75/clean/code2/
http://abitibi.sbgrid.org/cgi/pdbview.py
http://abitibi.sbgrid.org/cgi/tmalign.py
25. Step 11: Recreate Phaser output
This is the command input to Phaser
ROOT 2vlj-test
MODE MR_AUTO
HKLIn ../2vlj.mtz
LABIn F=FP SIGF=SIGFP
ENSEmble 200la_ PDB 00/200la_.pdb IDENtity 0.3
COMPosition SOLVENT 50.0
RESOlution 2.4
SEARch ENSEmble 200la_ NUM 1
Click on “test” directory (bottom of job page)
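To re-run several candidate models, it can help to generate this keyword input from a template. The sketch below reproduces the keyword lines shown above; the function name and its parameters are hypothetical, and the default values (identity 0.3, solvent 50.0, FP/SIGFP labels) are simply those from this example rather than recommended settings.

```python
def phaser_mr_auto_script(root, mtz_path, ensemble, pdb_path, resolution,
                          identity=0.3, solvent=50.0,
                          f_label="FP", sigf_label="SIGFP"):
    """Build Phaser MR_AUTO keyword input for a single search ensemble.

    The keyword lines mirror the example on this slide; all values are
    caller-supplied and should be adapted to the data set in question.
    """
    lines = [
        f"ROOT {root}",
        "MODE MR_AUTO",
        f"HKLIn {mtz_path}",
        f"LABIn F={f_label} SIGF={sigf_label}",
        f"ENSEmble {ensemble} PDB {pdb_path} IDENtity {identity}",
        f"COMPosition SOLVENT {solvent}",
        f"RESOlution {resolution}",
        f"SEARch ENSEmble {ensemble} NUM 1",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces the input shown on the slide for SCOP domain 200la_.
    print(phaser_mr_auto_script("2vlj-test", "../2vlj.mtz", "200la_",
                                "00/200la_.pdb", 2.4), end="")
```

The generated text would typically be written to a file and passed to Phaser on standard input, assuming a local CCP4 installation that provides the `phaser` executable.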
26. Step 12: Over to you
• You now need to refine your structure
• WS-MR only gets you as far as identifying promising MR candidates when you haven’t had success with conventional model identification methods
• Some further MR options that exist:
  • Second domain search with first domain fixed
  • homo-dimer/homo-trimer searches
  • Custom PDB search library - you give us the PDBs, we can run WS-MR over the set
27. Conclusion and Thanks
• We welcome ideas for improvements
• Special processing requirements?
  • We may be able to do this from the command line interface
• Please contact us if you have any questions
  • hpc@sbgrid.org
• Open Science Grid is a big enabler here!
  • http://opensciencegrid.org
• Thanks to SBGrid team:
  • http://www.sbgrid.org
• Thanks to the Sliz Lab at Harvard Medical School:
  • http://hkl.hms.harvard.edu