Data Processing with Ruby
Brian Chapados
http://chapados.org

SDRuby
April 3, 2008
Understanding Proteins
Sequence: 1-D linear chain
     > Archaeoglobus PCNA
     MIDVIMTGELLKTVTRAIVALVSEARIHFLEKGLHSRAVDPANVAMVIVDIPK
     DSFEVYNIDEEKTIGVDMDRIFDISKSISTKDLVELIVEDESTLKVKFGSVEYK
     VALIDPSAIRKEPRIPELELPAKIVMDAGEFKKAIAAADKISDQVIFRSDKEGF
     RIEAKGDVDSIVFHMTETELIEFNGGEARSMFSVDYLKEFCKVAGSGDLLTI
     HLGTNYPVRLVFELVGGRAKVEYILAPRIESE
Structure: 3-D shape after folding
Structures with several components are hard to determine
X-ray scattering
    C. Trame, personal communication.
    Sousa et al. 2000. Cell 103: 633-643.
Raw Data
    Distance distribution function, P(R), of the particle

       R        P(R)      ERROR
0.0000E+00   0.0000E+00   0.0000E+00
0.5000E+00   0.3157E-02   0.0000E+00
0.1000E+01   0.6069E-02   0.0000E+00
0.1500E+01   0.8740E-02   0.0000E+00
0.2000E+01   0.1118E-01   0.0000E+00
0.2500E+01   0.1339E-01   0.0000E+00
0.3000E+01   0.1538E-01   0.0000E+00
0.3500E+01   0.1718E-01   0.0000E+00
0.4000E+01   0.1879E-01   0.0000E+00
0.4500E+01   0.2023E-01   0.0000E+00
0.5000E+01   0.2153E-01   0.0000E+00
0.5500E+01   0.2269E-01   0.0000E+00
0.6000E+01   0.2374E-01   0.0000E+00
0.6500E+01   0.2471E-01   0.0000E+00
0.7000E+01   0.2560E-01   0.0000E+00
0.7500E+01   0.2645E-01   0.0000E+00
0.8000E+01   0.2727E-01   0.0000E+00
0.8500E+01   0.2809E-01   0.0000E+00
0.9000E+01   0.2891E-01   0.0000E+00
0.9500E+01   0.2976E-01   0.0000E+00
0.1000E+02   0.3065E-01   0.0000E+00
0.1050E+02   0.3160E-01   0.0000E+00
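The P(R) table above is plain whitespace-separated text in Fortran-style E notation, which Ruby's built-in Float() parses directly. A minimal parsing sketch follows, assuming exactly the three-column layout shown (R, P(R), ERROR) preceded by a header line; the names PrPoint and parse_pr are hypothetical, and the sample rows are copied from the table.

```ruby
# One row of the distance distribution table.
PrPoint = Struct.new(:r, :p, :error)

# Parse the three-column P(R) text into PrPoint structs.
# Assumes whitespace-separated columns; blank lines and the
# "R P(R) ERROR" header are skipped.
def parse_pr(text)
  rows = text.each_line.map do |line|
    fields = line.split
    next nil unless fields.size == 3
    begin
      # Float() accepts E notation such as "0.3157E-02".
      PrPoint.new(*fields.map { |f| Float(f) })
    rescue ArgumentError
      nil # non-numeric line, e.g. the column header
    end
  end
  rows.compact
end

SAMPLE = <<~DATA
         R        P(R)      ERROR

  0.0000E+00   0.0000E+00   0.0000E+00
  0.5000E+00   0.3157E-02   0.0000E+00
  0.1000E+01   0.6069E-02   0.0000E+00
  0.1500E+01   0.8740E-02   0.0000E+00
DATA

points = parse_pr(SAMPLE)
peak = points.max_by(&:p)
puts "#{points.size} points, largest P(R) = #{peak.p} at R = #{peak.r}"
```

Using Float() rather than String#to_f is deliberate: to_f silently returns 0.0 for text like the header, while Float() raises, so malformed lines are easy to detect.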
Existing Software
Svergun group @ EMBL
http://www.embl-hamburg.de/ExternalInfo/Research/Sax/software.html

Works well, but...
    requires running each program multiple times
    “interactive” interfaces
    not easily scriptable
    no, really... you have to see it to believe it
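One common workaround for prompt-driven programs is to answer every prompt up front by piping prepared input to standard input. The sketch below assumes the program reads its answers line by line from stdin in a fixed order. No specific EMBL program is shown here: a small Ruby script stands in for one, and run_with_answers is a hypothetical helper built on the standard library's Open3.capture3.

```ruby
require "open3"
require "rbconfig"

# Stand-in for an interactive analysis program (hypothetical): it prompts
# for two values on stdin and prints a result line.
FAKE_PROGRAM = <<~'RUBY'
  print "Input file name: "
  name = $stdin.gets.to_s.strip
  print "Maximum particle dimension: "
  dmax = Float($stdin.gets.to_s.strip)
  puts
  puts "RESULT #{name} #{dmax * 2}"
RUBY

# Run a prompt-driven program non-interactively: feed every answer at once,
# one per line, and return its standard output. Raises if the program fails.
def run_with_answers(cmd, answers)
  out, err, status = Open3.capture3(*cmd, stdin_data: answers.join("\n") + "\n")
  raise "#{cmd.first} failed (#{status.exitstatus}): #{err}" unless status.success?
  out
end

output = run_with_answers([RbConfig.ruby, "-e", FAKE_PROGRAM], ["sample.dat", "105.0"])
# Keep only the line we care about; the prompts end up in stdout too.
result = output.lines.grep(/\ARESULT/).first.split
p result
```

Passing the command as an argument array avoids the shell entirely, so file names with spaces need no quoting. If a program insists on a real terminal instead of reading stdin, Ruby's standard pty and expect libraries are the usual next step.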
Help from Ruby
We want to use Linux clusters with hundreds of CPUs

Ruby
    wrap external programs
    write shell scripts to run external programs
Rake
    define relationships between the inputs and outputs of different programs
    launch external programs once their dependencies are satisfied
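Rake's file tasks express this kind of input/output relationship directly: a target is rebuilt only when it is missing or older than its prerequisites, and prerequisites are built first. A minimal sketch, assuming a hypothetical two-stage pipeline (raw data, then P(R), then a shape model) with made-up file names; each stage writes a placeholder file where a real pipeline would call an external program.

```ruby
require "rake"
require "tmpdir"

extend Rake::DSL # make Rake's file/task DSL available in this plain script

# Hypothetical file names in a throwaway working directory.
work  = Dir.mktmpdir("saxs")
raw   = File.join(work, "sample.dat")   # raw scattering data (input)
pr    = File.join(work, "sample.pr")    # distance distribution (stage 1)
model = File.join(work, "sample.model") # shape model (stage 2)
File.write(raw, "0.5000E+00   0.3157E-02   0.0000E+00\n")

ran = [] # records which stages actually executed

# Stage 1: runs only if sample.pr is missing or older than sample.dat.
file pr => raw do |t|
  ran << :pr
  # Placeholder for running an external program, e.g. sh "prog", t.source, t.name
  File.write(t.name, File.read(t.source))
end

# Stage 2: depends on stage 1's output, so Rake invokes stage 1 first.
file model => pr do |t|
  ran << :model
  File.write(t.name, "model built from #{File.basename(t.source)}\n")
end

Rake::Task[model].invoke
p ran
```

In a persistent Rakefile (rather than this temporary directory), rerunning rake skips any stage whose output is already newer than its inputs. For independent prerequisites, Rake's multitask runs them in parallel threads.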
Do more with Ruby
Quick and dirty...
    Define input parameters in a script
    Define common tasks in a library

More robust...
    Ruby API for running commands
    More sophisticated information processing
    Evolve towards a micro-framework
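As one possible step toward a Ruby API for running commands, the sketch below pairs input parameters defined in a script with a small reusable runner that executes the resulting commands with bounded concurrency. Job, Result, and run_jobs are hypothetical names, the parameter values are made up for illustration, and a trivial Ruby one-liner stands in for the real analysis program. Threads only spread work across one machine's CPUs; on a cluster, the same command lines would instead be handed to its batch scheduler.

```ruby
require "open3"
require "rbconfig"

# Hypothetical micro-API: a Job is a name plus an argv array.
Job    = Struct.new(:name, :argv)
Result = Struct.new(:name, :stdout, :success)

# Run jobs with at most `workers` commands executing at once.
def run_jobs(jobs, workers: 4)
  queue = Queue.new
  jobs.each { |job| queue << job }
  results = Queue.new # thread-safe collection of finished jobs

  threads = Array.new(workers) do
    Thread.new do
      loop do
        job = begin
          queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break
        end
        out, _err, status = Open3.capture3(*job.argv)
        results << Result.new(job.name, out, status.success?)
      end
    end
  end
  threads.each(&:join)

  Array.new(results.size) { results.pop }.sort_by(&:name)
end

# Input parameters defined in the script (hypothetical values).
dmax_values = [80, 100, 120]
jobs = dmax_values.map do |dmax|
  # Stand-in command; a real pipeline would call the analysis program here.
  Job.new("dmax_#{dmax}", [RbConfig.ruby, "-e", "puts #{dmax} * 2"])
end

results = run_jobs(jobs, workers: 2)
results.each { |r| puts "#{r.name}: #{r.stdout.strip} (ok=#{r.success})" }
```

Keeping the parameter list in the script and the runner in a separate method mirrors the split between "input parameters in a script" and "common tasks in a library"; the runner can move into a shared file unchanged.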
Acknowledgements
Lab (Scripps Research Institute)
    John Tainer
    Scott Williams
    Chris Putnam

Data Collection
    Beamline 12.3.1
    The Advanced Light Source (ALS, LBNL)
Funding
    NIH, DOE, NCI
