From Sequencer to Clinic:
Managing Science and Scale

Sultan Meghji, Vice President of Product Strategy

World Genome Data Analysis Summit
November 28, 2012

Challenges Along the Path from Genomics Research to Personalized Medicine
    Implementing technology
    Implementing science
    Scaling from research to clinic

The Problem Restated…
    What’s the most efficient, reliable and robust way to capture my genetic data, analyze it and secure it for re-analysis and deeper interpretation in a clinical setting?

Enabling Science at Scale
    Platform for big data
    Analytics framework for implementing science
    Flexible deployment

AGENDA
Accelerating the Science of Genetic Discovery for Researchers, Bioinformatics Specialists & Tool Development:
    Mega-scale data management and data analysis.
    Complex Pipeline Development, Test & Deployment.
    Infrastructure costs, complexity, security and compliance.

Target: Clinicians and Patients leveraging a dynamically expanding field of science.

Government Funding
3rd Party Payers

CUSTOMER NEEDS
“We can sequence the genome for dirt cheap, but we don’t know how to deal with the data.”
    Eric Green M.D., Ph.D., Director, NHGRI

“How do we avoid the pitfall of having cheap human genome sequencing but complex and expensive manual analysis to make clinical sense out of the data?”
    Elaine Mardis Ph.D., Director of Technology Development

Source: WSJ, NYT, Genome Medicine

THE GENOMICS DATA PROBLEM
“Big Data” is essentially large amounts of data
    Multiple sources or data formats
    Unstructured or semi-structured
    Difficult to put into databases and analyze

Seen in other industry areas:
    Telecom

THE BIG DATA CHALLENGE IN GENOMICS
STORAGE
    “Moving data around and storing the data is painful. It’s a huge problem for us. We’re looking at the cloud for processing options.”
    - Carol Rohl Ph.D., Director, Merck Research Labs

COMPUTATION
    “Datasets are so large, you have to analyze them at the same site where the data is or using mirrors. You do not want to be writing it onto a remote hard drive and move the data each time you want to analyze it.”
    - John Monahan, Novartis Institutes for Biomedical Research

APPLICATIONS
    “Bioinformatics tools and reference datasets change monthly, weekly and in some cases daily. This requires easy to manage application and data management platforms to keep up to date with all the changes.”
    - Sultan Meghji, Appistry, GigaOM 2012

Source: Appistry proprietary market research by CBT Advisors

THE BIG DATA CHALLENGE IN GENOMICS
STORAGE → CLOUD STORAGE
COMPUTATION → ANALYTICS
APPLICATIONS → USER-FOCUSED TOOLS

APPROACHES
CLOUD STORAGE
ANALYTICS
USER-FOCUSED TOOLS

WHY APPISTRY?
Capabilities needed
    Private Cloud Genomics Services (HIPAA Compliance)
    Automated Data Management and Storage Tightly Coupled to Analysis
    Massively Scalable/Reliable Fabric for Algorithms, Tools and Applications
    Analytics Layer Simplifies the Build, Test and Deployment of Analytic Pipelines

Industry Tools, Data Sets and YOUR Science

APPISTRY’S GENOMIC SOLUTION
[Diagram: a manual analysis workflow in ten numbered steps connecting Data from Sequencers, the User, Public Cloud, Open-Source Algorithms and Public Gene Databases. The step captions overlap in the extracted text and cannot be read reliably; legible fragments mention accessing data by FTP or FedEx, downloading data from the sequencer to storage, reprogramming and uploading algorithms, open-source algorithm and gene database updates, repeating steps, and run times of "5+ Days" and "Months".]

Source: Appistry survey

AYRRIS PRODUCT
[Diagram: Data from Sequencers → SFTP Transfer or Appistry Courier Over HTTPS → Appistry Private Cloud (HIPAA Compliant Genomics Cloud, Ayrris Pipelines, Your Science) → Annotated Results & Visualizations (SNPs, Indels, Rare Variants, etc) → Appistry Courier Over HTTPS → Consumption of Results by internal Bioinformaticians and Clinicians (Data Center and Researcher)]

CLOUD WORKFLOW
APPISTRY CLOUD (via INTERNET)
    Cloud-based genomic data analysis and storage
    Subscription to Appistry’s secure, HIPAA compliant cloud storage

APPISTRY APPLIANCE (INSTITUTION)
    On-site modular turn-key hardware and software
    Enterprise-level implementation of private network HIPAA-enabled storage

Both:
    Same access to pipeline analysis algorithms & annotations (Same Science)
    Same underlying technology and efficiency

BUSINESS MODEL
[Diagram: Data from Sequencers and Data from other instruments → Appistry Private Cloud (Regulatory Compliant Genomics Cloud, Ayrris Pipelines, Your Science) → Annotated Results & Visualizations]

Secured, Integrated Workflows, Data Management and Analysis
Integrated with Research Data Systems (Genomics, Pharma)
Integrated with Medical Data – EMR, Biller/Payer

CLOUD WORKFLOW
iTunes

“DEMOCRATIZATION” OF DATA

APPROACHES
Genomic Information

Decisions for prevention or early treatment
    Breast cancer
    Osteoporosis
    Lung cancer
    Heart disease

    Autism
    Leukemia
    ADHD
    Genetic disorders

Thanks for Your Attention


main:   314.450.5720
fax:    314.450.5722
sultan@appistry.com

appistry.com
1141 South 7th St., Suite 300
St. Louis, MO 63141

Appistry WGDAS Presentation

  • 1. From Sequencer to Clinic: Managing Science and Scale. Sultan Meghji, Vice President of Product Strategy. World Genome Data Analysis Summit, November 28, 2012.
  • 2. Agenda. Challenges along the path from genomics research to personalized medicine: implementing technology; implementing science; scaling from research to clinic. The problem restated: What’s the most efficient, reliable and robust way to capture my genetic data, analyze it and secure it for re-analysis and deeper interpretation in a clinical setting? Enabling science at scale: platform for big data; analytics framework for implementing science; flexible deployment.
  • 3. Customer needs. Accelerating the science of genetic discovery for researchers, bioinformatics specialists and tool development: mega-scale data management and data analysis; complex pipeline development, test and deployment; infrastructure costs, complexity, security and compliance. Target: clinicians and patients leveraging a dynamically expanding field of science. Government funding. 3rd party payers.
  • 4. The genomics data problem. “We can sequence the genome for dirt cheap, but we don’t know how to deal with the data.” - Eric Green, M.D., Ph.D., Director, NHGRI. “How do we avoid the pitfall of having cheap human genome sequencing but complex and expensive manual analysis to make clinical sense out of the data?” - Elaine Mardis, Ph.D., Director of Technology Development. Source: WSJ, NYT, Genome Medicine.
  • 5. The big data challenge in genomics. “Big Data” is essentially large amounts of data: multiple sources or data formats; unstructured or semi-structured; difficult to put into databases and analyze. Seen in other industry areas: telecom.
  • 6. The big data challenge in genomics. Storage: “Moving data around and storing the data is painful. It’s a huge problem for us. We’re looking at the cloud for processing options.” - Carol Rohl, Ph.D., Director, Merck Research Labs. Computation: “Datasets are so large, you have to analyze them at the same site where the data is or using mirrors. You do not want to be writing it onto a remote hard drive and move the data each time you want to analyze it.” - John Monahan, Novartis Institutes for Biomedical Research. Applications: “Bioinformatics tools and reference datasets change monthly, weekly and in some cases daily. This requires easy to manage application and data management platforms to keep up to date with all the changes.” - Sultan Meghji, Appistry, GigaOM 2012. Source: Appistry proprietary market research by CBT Advisors.
  • 7. Approaches. Storage: cloud storage. Computation: analytics. Applications: user-focused tools.
  • 9. Why Appistry? Cloud storage; analytics; user-focused tools.
  • 10. Appistry’s genomic solution. Capabilities needed: automated data management and storage tightly coupled to analysis; private cloud genomics services (HIPAA compliance); industry tools, data sets and YOUR science; massively scalable/reliable fabric for algorithms, tools and applications; analytics layer simplifies the build, test and deployment of analytic pipelines.
  • 11. Ayrris product. [Diagram: a ten-step manual workflow connecting data from sequencers, the public cloud, open-source algorithms, public gene databases, storage infrastructure and the user; the individual step captions overlap in the slide text.] Source: Appistry survey.
  • 12. Cloud workflow. [Diagram labels: data from sequencers (ATCGTA TCGGCA CTAATC GCTCGG CTATAG); data center and researcher; SFTP transfer; Appistry Courier over HTTPS; HIPAA-compliant Genomics Cloud (Appistry Private Cloud); Ayrris pipelines; your science; annotated results and visualizations (SNPs, indels, rare variants, etc.); consumption of results by internal bioinformaticians and clinicians.]
  • 13. Business model. Appistry Cloud (institution via Internet): cloud-based genomic data analysis and storage; subscription to Appistry’s secure, HIPAA-compliant cloud storage. Appistry Appliance: on-site modular turn-key hardware and software; enterprise-level implementation of private network HIPAA-enabled storage. Both: same access to pipeline analysis algorithms and annotations (same science); same underlying technology and efficiency.
  • 14. Cloud workflow. [Diagram labels: data from sequencers (ATCGTA TCGGCA CTAATC GCTCGG CTATAG); data from other instruments; regulatory-compliant Genomics Cloud (Appistry Private Cloud); Ayrris pipelines; your science; annotated results and visualizations; integrated with research data systems; integrated with medical data (genomics, EMR, biller/payer, pharma); secured, integrated workflows, data management and analysis.]
  • 16. Genomic information. Decisions for prevention or early treatment: breast cancer, osteoporosis, lung cancer, heart disease, autism, leukemia, ADHD, genetic disorders.
  • 17. Thanks for Your Attention. Main: 314.450.5720. Fax: 314.450.5722. sultan@appistry.com. appistry.com. 1141 South 7th St., Suite 300, St. Louis, MO 63141.

Editor's notes

  1. Voice-over points: Next Gen Sequencing (NGS) has provided a powerful new tool for the investigation of genomic information. The growing number of sequencers on the market is generating a huge demand for tools for data analysis. There is a lack of fast, affordable, easy-to-use, and comprehensive bioinformatics tools in the market. Another quote (backup): “Data handling is now the bottleneck. It costs more to analyze a genome than to sequence a genome.” - David Haussler, Director of the Center for Biomolecular Science & Engineering at the University of California, Santa Cruz, in NYT
  2. Seen years ago in finance, logistics, geospatial & defense areas. 5 universal issues when dealing with “big data”: storage; computation; network bandwidth (movement of the data); operational complexity; complex programming tasks.