Wide-Search Molecular Replacement
Ian Stokes-Rees
http://portal.nebiogrid.org/

When WS-MR is suitable
• You’ve got good data (< 4 Å)
• You’ve tried MR with lots of good candidates
   • a priori knowledge
   • sequence similarity (PSI-BLAST search)
• Or
   • protein not sequenced
   • no a priori knowledge of expected fold
• You haven’t found any good models to use for phasing
• Time to try a brute-force search: WS-MR

When WS-MR is not suitable
• Complexes containing significant DNA or RNA
   • at least right now, these will probably not work
• You haven’t tried MR and just want a “quick fix”
• Very large or very small structures
   • both are computationally difficult
• Low resolution (> 4.5 Å)
   • experience so far suggests these aren’t going to be helped much

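The two checklists above can be condensed into a rough pre-flight check. This is only a sketch and is not part of the portal: the function name, its arguments, and the "borderline" verdict for data between 4 Å and 4.5 Å are assumptions made for illustration. Only the 4 Å and 4.5 Å thresholds come from the slides.

```python
def wsmr_suitability(resolution_angstrom, tried_conventional_mr, significant_nucleic_acid):
    """Hypothetical helper: return (verdict, reasons) based on the slide checklists."""
    reasons = []
    if significant_nucleic_acid:
        reasons.append("complex contains significant DNA or RNA")
    if not tried_conventional_mr:
        reasons.append("conventional MR has not been tried yet")
    if resolution_angstrom > 4.5:
        reasons.append("low resolution (> 4.5 A)")
    if reasons:
        return "not suitable", reasons
    if resolution_angstrom < 4.0:
        return "suitable", ["good data (< 4 A) and conventional MR already tried"]
    # The slides say nothing about the 4.0 to 4.5 A range; flag it rather than guess.
    return "borderline", ["resolution between 4 and 4.5 A is not covered by the slides"]


verdict, reasons = wsmr_suitability(2.0, tried_conventional_mr=True,
                                    significant_nucleic_acid=False)
print(verdict, reasons)
```

Structure size (very large or very small) is left out because the slides give no numeric limit for it.
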
Requirements
• Reflection data in MTZ file format
   • Must have amplitude columns (e.g. FP, SIGFP)
   • Doesn’t work with intensities (I, SIGI)
• Time
   • To analyze results
   • To take next steps
• Managed expectations
   • Good MR candidates are identified in about 1 in 4 cases
   • We don’t produce a fully phased structure, only a list of good MR candidates and their best placements as returned by Phaser
• Experience with Phaser to interpret results and re-run candidate models

Background
• Utilizes Phaser for MR
• Utilizes Open Science Grid for computing
• References
   • Stokes-Rees, Sliz, Protein structure determination by exhaustive search of Protein Data Bank derived databases, Proc. Nat'l Academy of Sciences, doi:10.1073/pnas.1012095107
   • Stokes-Rees, Sliz, Compute and data management strategies for grid deployment of high throughput protein structure studies, IEEE Workshop on Many Task Computing on Grids and Supercomputers 2010 (MTAGS10), Seattle, November 2010
   • Phaser: McCoy, Grosse-Kunstleve, Adams, Winn, Storoni, Read; J. Appl. Cryst. (2007). 40, 658-674
   • Murzin A. G., Brenner S. E., Hubbard T., Chothia C. (1995). SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536-540.
• Requires 20-50,000 hours of computing
• Produces 300,000 files
• Attempts 100,000 single-domain MR trials using all SCOP domains

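A back-of-envelope reading of these figures may help set expectations for run time. The sketch below assumes that "20-50,000 hours" means 20,000 to 50,000 hours in total; the slide does not spell this out.

```python
# Figures quoted on the slide. The 20,000 lower bound is an ASSUMED reading
# of "20-50,000 hours"; the slide does not state it explicitly.
TRIALS = 100_000
FILES = 300_000
HOURS_LOW, HOURS_HIGH = 20_000, 50_000

minutes_per_trial_low = HOURS_LOW * 60 / TRIALS
minutes_per_trial_high = HOURS_HIGH * 60 / TRIALS
files_per_trial = FILES / TRIALS

print(f"{minutes_per_trial_low:.0f}-{minutes_per_trial_high:.0f} minutes of computing per trial")
print(f"{files_per_trial:.0f} files per trial")
```

Under that assumption each single-domain trial averages roughly 12 to 30 minutes of computing and leaves about 3 files behind.
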
Step 1: Register to use Portal
https://portal.nebiogrid.org/d/accounts/create

Step 2: Submit Computational Task
https://portal.nebiogrid.org/d/apps/wsmr/create

Side Note: MTZ columns
• Use the CCP4 tool “mtzdmp” to check column names and resolution if you’re not sure
• The “Column Labels” section lists the column names; the “Resolution Range” section gives the resolution

 $ mtzdmp GAS.mtz | less
 ...
  * Column Labels :
  H K L FP SIGFP FreeRflag
 ...
  * Resolution Range :
     0.00050    0.25197          (      44.699 -         1.992 A )
 ...
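
The same check can be scripted. The sketch below parses text laid out like the mtzdmp excerpt above and tests for the amplitude columns named under Requirements. The helper names are hypothetical, and the parsing assumes the layout shown on the slide; output from other CCP4 versions may differ.

```python
import re

# Excerpt copied from the slide; real mtzdmp output contains much more.
SAMPLE = """\
 * Column Labels :
 H K L FP SIGFP FreeRflag
 ...
 * Resolution Range :
    0.00050    0.25197          (      44.699 -         1.992 A )
"""


def parse_mtzdmp(text):
    """Return (column_labels, (low_res, high_res)) found in mtzdmp-style text."""
    lines = text.splitlines()
    labels, resolution = [], None
    for i, line in enumerate(lines):
        if "* Column Labels" in line:
            # The labels sit on the next non-blank line.
            for following in lines[i + 1:]:
                if following.strip():
                    labels = following.split()
                    break
        elif "* Resolution Range" in line:
            # The range in Angstrom is the parenthesised pair "( low - high A )".
            for following in lines[i + 1:]:
                match = re.search(r"\(\s*([\d.]+)\s*-\s*([\d.]+)\s*A\s*\)", following)
                if match:
                    resolution = (float(match.group(1)), float(match.group(2)))
                    break
    return labels, resolution


def has_amplitudes(labels, f="FP", sigf="SIGFP"):
    """True when both the amplitude column and its sigma column are present."""
    return f in labels and sigf in labels


labels, resolution = parse_mtzdmp(SAMPLE)
print(labels, resolution, has_amplitudes(labels))
```

In practice the text would come from running mtzdmp on your own file; a file carrying only I and SIGI would fail the `has_amplitudes` check.
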
Step 3a: Review active task list on portal
(Screenshot callout: click here to access task)

Step 3b: Check email for task details and link
(Screenshot callout: click here to access task)

Step 4: Log into job page

Step 5a: Review web page

Step 5b: Check status
(Screenshot callout: Click here)
Status codes: R = Running, I = Idle, H = Held
Remember: Someone from SBGrid will manually review your job and release it. Until that happens your job won’t even be in the queue. Even after that, it could be in the queue for several days before it starts running. Do email us if you have questions or if it seems stuck or not running.

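If you copy the one-letter codes off the status page, a short tally makes the state of the job easier to read. This is a sketch only: the slides do not show the page's exact layout, so the input here is simply assumed to be a list of codes, and the function name is hypothetical.

```python
from collections import Counter

# Legend from the slide.
STATUS_NAMES = {"R": "Running", "I": "Idle", "H": "Held"}


def summarize_status(codes):
    """Count jobs per status, spelling out the one-letter codes."""
    counts = Counter(codes)
    return {STATUS_NAMES.get(code, f"Unknown ({code})"): n for code, n in counts.items()}


print(summarize_status(["R", "R", "I", "H"]))
```
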
Step 5c: Check status
(Screenshot callouts: summary of active jobs; outcomes to date)

Step 6a: Review scatter graphs
Look for a cluster of high TFZ and high LLG results distinct from the rest
NOTE: This graph is a static image

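The visual check can be roughly approximated in code. The sketch below flags results whose TFZ (translation function Z-score) and LLG (log-likelihood gain) both sit far above the bulk, using median-based robust z-scores. This is a stand-in for eyeballing the plot, not the portal's method; the cutoff of 5 and the example numbers are arbitrary assumptions.

```python
from statistics import median


def robust_z(values):
    """Z-scores based on the median and the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]
    # 1.4826 scales the MAD to match a standard deviation for normal data.
    return [(v - med) / (1.4826 * mad) for v in values]


def standout_hits(results, cutoff=5.0):
    """results: list of (name, tfz, llg). Return names that stand out on BOTH scores."""
    tfz_z = robust_z([r[1] for r in results])
    llg_z = robust_z([r[2] for r in results])
    return [r[0] for r, tz, lz in zip(results, tfz_z, llg_z) if tz > cutoff and lz > cutoff]


# Invented data: 20 unremarkable results plus two clear outliers.
background = [(f"bg{i}", 3.0 + 0.1 * (i % 5), 10.0 + (i % 7)) for i in range(20)]
hits = [("hitA", 12.0, 200.0), ("hitB", 12.0, 210.0)]
print(standout_hits(background + hits))
```

As a later slide notes, most searches have no strong candidates, so an empty list is the usual outcome.
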
Step 6b: Cases with no strong MR candidates*
* Remember this is usually the case, unfortunately

Step 6c: Review scatter graphs
Click this button to load data and enable clickable image
NOTE: This graph is a dynamic clickable image. Only the first 5000 results by LLG are currently available because of memory constraints

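A similar cut can be made on your own copy of the results. The sketch below keeps the N highest-LLG rows without sorting the whole list; it assumes "first 5000 results by LLG" means the 5000 highest, and the row layout (dicts with an "llg" key) is hypothetical.

```python
import heapq


def top_by_llg(results, n=5000):
    """Return the n rows with the highest LLG, highest first."""
    return heapq.nlargest(n, results, key=lambda row: row["llg"])


# Invented rows for illustration.
rows = [
    {"scop": "a", "llg": 12.0},
    {"scop": "b", "llg": 310.5},
    {"scop": "c", "llg": 48.2},
]
print([row["scop"] for row in top_by_llg(rows, n=2)])
```
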
Step 6d: Review scatter graphs
Click data point to view details
Click large cartoon image to add to image basket
(Screenshot callout: PDB details)

Step 7: Review tabular data
live results (space delimited)
sorted results (tab delimited), generated by “check status”

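The two files differ only in their delimiter, so they can be read with slightly different parsers. This is a sketch: the slides do not show the column layout, so the three-column rows below (identifier, TFZ, LLG) and the identifier "xxxxa_" are invented for illustration.

```python
import csv
import io


def read_live(text):
    """Space-delimited live results: split each line on runs of whitespace."""
    return [line.split() for line in text.splitlines() if line.strip()]


def read_sorted(text):
    """Tab-delimited sorted results: let the csv module handle the tabs."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]


# Hypothetical content; real files have their own columns.
live_text = "200la_ 4.1 25.0\nxxxxa_ 9.8 310.5\n"
sorted_text = "xxxxa_\t9.8\t310.5\n200la_\t4.1\t25.0\n"

print(read_live(live_text))
print(read_sorted(sorted_text))
```

With real downloads, replace the sample strings with the contents of the files read from disk.
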
Step 8: Wait for job to finish
No running jobs (all done)
results approx. 100,000
errors < 5,000
NOTE: This job is not yet finished!

Step 9: Download finalized augmented results
augmented contains static SCOP domain class and name (25 MB)
final contains a sorted, cleaned set of results (5 MB)

Step 10: Review and download specific SCOP PDB
• Use the tabular results to identify specific SCOP codes that look promising
• PDBs can be fetched using one of these resources:
    http://portal.nebiogrid.org/biodb/scop/v1.75/clean/code2/
    http://abitibi.sbgrid.org/cgi/pdbview.py
    http://abitibi.sbgrid.org/cgi/tmalign.py

Step 11: Recreate Phaser output
Click on “test” directory (bottom of job page)
This is the command input to Phaser:

   ROOT          2vlj-test
   MODE          MR_AUTO
   HKLIn         ../2vlj.mtz
   LABIn         F=FP SIGF=SIGFP
   ENSEmble      200la_ PDB 00/200la_.pdb IDENtity 0.3
   COMPosition   SOLVENT 50.0
   RESOlution    2.4
   SEARch        ENSEmble 200la_ NUM 1

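To re-run other candidates, the same keyword input can be generated from a template. The sketch below reproduces the lines shown on the slide verbatim and only swaps in the values; the function name and its defaults are assumptions, and nothing here validates the keywords against Phaser itself.

```python
def phaser_input(root, mtz, ensemble, pdb, identity=0.3, solvent=50.0,
                 resolution=2.4, f="FP", sigf="SIGFP"):
    """Hypothetical helper: build keyword input in the layout shown on the slide."""
    lines = [
        f"ROOT          {root}",
        "MODE          MR_AUTO",
        f"HKLIn         {mtz}",
        f"LABIn         F={f} SIGF={sigf}",
        f"ENSEmble      {ensemble} PDB {pdb} IDENtity {identity}",
        f"COMPosition   SOLVENT {solvent}",
        f"RESOlution    {resolution}",
        f"SEARch        ENSEmble {ensemble} NUM 1",
    ]
    return "\n".join(lines) + "\n"


# Values taken from the slide.
text = phaser_input("2vlj-test", "../2vlj.mtz", "200la_", "00/200la_.pdb")
print(text)
```

The resulting text can be saved to a file and supplied to Phaser as keyword input on standard input, which is how CCP4 programs are conventionally driven.
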
Step 12: Over to you
• You now need to refine your structure
• WS-MR only gets you as far as attempting to identify promising MR candidates if you haven’t had success with conventional model identification methods
• Some further MR options that exist:
   • Second domain search with first domain fixed
   • homo-dimer/homo-trimer searches
   • Custom PDB search library: you give us the PDBs, we can run WS-MR over the set

Conclusion and Thanks
• We welcome ideas for improvements
• Special processing requirements?
   • We may be able to do this from the command line interface
• Please contact us if you have any questions
   • hpc@sbgrid.org
• Open Science Grid is a big enabler here!
   • http://opensciencegrid.org
• Thanks to the SBGrid team:
   • http://www.sbgrid.org
• Thanks to the Sliz Lab at Harvard Medical School:
   • http://hkl.hms.harvard.edu

Wide Search Molecular Replacement and the NEBioGrid portal interface