Calpont InfiniDB®
Accelerating Data Insights

Huge Data Analytics: Calpont InfiniDB
Columnar DBMS Empowers New Research
with The World’s First Searchable Genotype
Database
Strata Conference 2012


                     Calpont Proprietary and Confidential
Today’s Agenda
  • Introduction of today’s speakers
  • What is InfiniDB?
  • Announced today: InfiniDB 3
  • Huge Data Analytics: InfiniDB Empowers New
    Research with The World’s First Searchable
    Genotype Database
  • Questions
  • More information and resources


InfiniDB® Scalable. Fast. Simple.   2   Copyright © 2012 Calpont. All Rights Reserved.
Today’s Presenters

                                    Fernanda Foertter
                                    HPC Administrator / Scientific Programmer
                                    Genus plc



                                    Jim Tommaney
                                    Chief Technology Officer
                                    Calpont Corporation

What is InfiniDB?
Calpont Corporation
  • Company
          o Privately held and backed
          o Offices
                  Dallas (Headquarters)
                  Silicon Valley
  • Business
          o Scale-out MPP analytic database
          o MySQL Columnar + Map-Reduce
          o Commercial Open Core model
  • Products
          o InfiniDB Enterprise
                  Forthcoming 4th major release
          o InfiniDB Community
                  Modified Open Source license
  Calpont Mission: To provide a highly scalable data platform that enables
  analytic business decisions as timely as customers and markets dictate.

Innovative Companies Turning to InfiniDB

What is InfiniDB?
                Scalable            Fast               Simple

What is InfiniDB?
  • Big Data Analytics Engine
  • Full-Featured SQL
  • Familiar MySQL Look and Feel
  • Game-Changing Performance
Focus on Analytics Workloads
  InfiniDB is …
        Engineered for large queries
        Engineered for ad-hoc flexibility
        Analytics, not OLTP
  Unique combination of columnar + map-reduce

InfiniDB – Two Tier Architecture
 Purpose-built for big data analytics.
    • User Module (UM)
                Understands SQL.
    • Performance Module (PM)
                Operates on data blocks.
 UM and PM can run on separate servers or together on a single server.

InfiniDB Performance Foundations
                The Power and Scale of Map-Reduce
                                plus
                   Transformational I/O Efficiency

Power and Scalability of Map-Reduce


                                    Map ↓↓↓↓↓   Reduce ↑↑↑↑↑




  SQL Operations are mapped to Performance Module threads
     • Parallel/Distributed Data Access
     • Parallel/Distributed Joins (Inner, Outer)
     • Parallel/Distributed Sub-queries (From, Where, Select)
     • Parallel/Distributed Group By, Distinct, and Aggregation
     • Extensible with Parallel/Distributed User Defined Functions
  Results are returned to the User Module in the reduce phase
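The map/reduce split above can be illustrated with a minimal Python sketch of a parallel GROUP BY: each worker (standing in for a Performance Module thread) aggregates its own block of rows into partial sums and counts, and a final step (standing in for the User Module) merges them. The block layout and the names `map_block`, `reduce_partials` and `parallel_group_by` are hypothetical; this is not InfiniDB's code.

```python
from concurrent.futures import ThreadPoolExecutor


def map_block(block):
    """Map phase: aggregate one block of (key, value) rows into per-key (sum, count)."""
    partial = {}
    for key, value in block:
        total, count = partial.get(key, (0, 0))
        partial[key] = (total + value, count + 1)
    return partial


def reduce_partials(partials):
    """Reduce phase: merge per-block partials into final (SUM, COUNT, AVG) per key."""
    merged = {}
    for partial in partials:
        for key, (total, count) in partial.items():
            m_total, m_count = merged.get(key, (0, 0))
            merged[key] = (m_total + total, m_count + count)
    return {key: (total, count, total / count) for key, (total, count) in merged.items()}


def parallel_group_by(blocks, workers=4):
    """Run map_block on every block in a thread pool, then reduce the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_block, blocks))
    return reduce_partials(partials)


if __name__ == "__main__":
    # Three hypothetical blocks, roughly: SELECT k, SUM(v), COUNT(*), AVG(v) ... GROUP BY k
    blocks = [[("a", 1), ("b", 2)], [("a", 3)], [("b", 4), ("c", 5)]]
    print(parallel_group_by(blocks))  # {'a': (4, 2, 2.0), 'b': (6, 2, 3.0), 'c': (5, 1, 5.0)}
```

Partial aggregation is what lets GROUP BY scale: only one small (sum, count) pair per key per block travels to the reduce step, not every row. Python threads share the GIL, so the sketch shows the structure rather than real CPU parallelism.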

      InfiniDB is not:
                … a Hadoop-style map-reduce framework.

      InfiniDB is:
           … a custom-built, highly optimized map-reduce
           framework for queries.

Transformational I/O Efficiency

          Techniques to Avoid Unnecessary I/O

          o Vertical Partitioning: read only the columns required
          o Horizontal Partitioning: focus on the rows required
          o Just-in-time materialization
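The three techniques above can be sketched in Python for a single `WHERE lo <= col <= hi` filter: only the filter column is read, blocks whose stored min/max cannot match are skipped, and the output columns are fetched only for rows that pass. The block structure, the per-block min/max metadata and all names here are assumptions for illustration, a generic approach rather than InfiniDB's internals.

```python
from dataclasses import dataclass

BLOCK_ROWS = 4  # hypothetical, tiny block size so the example is easy to follow


@dataclass
class ColumnBlock:
    values: list
    lo: int  # minimum value stored in this block
    hi: int  # maximum value stored in this block


def build_column(values, block_rows=BLOCK_ROWS):
    """Split one column into fixed-size blocks, recording min/max per block."""
    blocks = []
    for start in range(0, len(values), block_rows):
        chunk = values[start:start + block_rows]
        blocks.append(ColumnBlock(chunk, min(chunk), max(chunk)))
    return blocks


def select_between(table, filter_col, lo, hi, output_cols, block_rows=BLOCK_ROWS):
    """SELECT output_cols FROM table WHERE lo <= filter_col <= hi."""
    stats = {"scanned": 0, "skipped": 0}
    matching_rows = []
    # Vertical partitioning: only the filter column is read in this pass.
    for b, block in enumerate(table[filter_col]):
        # Horizontal partitioning: skip blocks whose min/max rule out any match.
        if block.hi < lo or block.lo > hi:
            stats["skipped"] += 1
            continue
        stats["scanned"] += 1
        base = b * block_rows
        matching_rows.extend(base + i for i, v in enumerate(block.values) if lo <= v <= hi)
    # Just-in-time materialization: fetch output columns only for matching rows.
    rows = []
    for row in matching_rows:
        b, i = divmod(row, block_rows)
        rows.append(tuple(table[col][b].values[i] for col in output_cols))
    return rows, stats


if __name__ == "__main__":
    table = {
        "animal_id": build_column(list(range(100, 112))),
        "score": build_column([1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23]),
    }
    print(select_between(table, "score", 11, 12, ["animal_id"]))
```

On the example data the query scans one of three blocks and reads `animal_id` for only the two matching rows.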

Transformational I/O Efficiency

          Techniques for Efficient I/O

          o Columnar compression reduces I/O from disk
          o Global data buffer cache can reduce disk I/O
          o Real-time decompression accelerates reads from disk
          o Avoidance of random I/O
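To illustrate why compressing a column cuts the bytes read from disk, the sketch below stores a column of small integers as fixed-width values, compresses it, and decompresses it on read. It uses `zlib` from the Python standard library purely as a stand-in; the slides do not say which codec InfiniDB uses, and the function names are hypothetical.

```python
import array
import zlib


def compress_column(values):
    """Serialize a column of ints as fixed-width C ints (typically 4 bytes), then compress."""
    raw = array.array("i", values).tobytes()
    return raw, zlib.compress(raw, 6)


def read_column(blob):
    """Decompress on read and rebuild the column of ints."""
    column = array.array("i")
    column.frombytes(zlib.decompress(blob))
    return column.tolist()


if __name__ == "__main__":
    # A low-cardinality column (like a genotype column) compresses very well.
    column = [0, 1, 2, 2, 1, 0] * 5_000
    raw, blob = compress_column(column)
    print(len(raw), len(blob))
    assert read_column(blob) == column
```

Storing each column separately keeps similar values adjacent, which is what lets a general-purpose codec find long repeats; the trade-off is CPU time spent decompressing on every read.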

Simple - Automatic Everything


  • Vertical Partitioning
  • Horizontal Partitioning
  • Compression
  • Compression Algorithm Selection
  • Distribution of data across disk resources
  • Distribution of work across CPU resources

InfiniDB
                Scalable            Fast               Simple

InfiniDB 3 Announced Today
InfiniDB 3: It is Now Possible...

Where I Work

Genetic Evaluation

                                    Breeding Values
Phenotype: Meat Quality

Selection for Lean Growth

                             1980   2005

Halothane Gene (1991)
  • Gene is associated with:
          o High carcass yield
          o Stress triggers hyperthermia
          o Poor meat quality
  Image labels: NN; Nn/nn (marked X)

DNA Marker Use
  Timeline of DNA marker use, 1990 to 2009:
          o 1991: HAL
          o 1994: ESR
          o 1998: RN & MC4R
          o 1999: FUT1 & PRKAG3
          o 2003: MIS
          o 2004: Large-scale SNP discovery
  Eras:
          o 1991 to 2002: Single genes, QTL, candidate genes
          o Then, through 2009: Large-scale SNP discovery, genome scans, sequencing

Sudden Data Growth
  [Chart: Porcine SNP Panel Density. Number of SNPs by year, 2004 to 2009; y-axis 0 to 70,000]

Sudden Data Growth
  [Charts: Sample Collection, 1991 to 2011. Left: Animals (cumulative), y-axis 0 to 16,000,000.
   Right: Tissue (cumulative), y-axis 0 to 3,500,000. x-axis: Year]

Genetic Evaluation
  Estimated breeding values (EBVs) are combined using economic weights
  for traits such as:
          o Lean Yield
          o Meat Quality
          o Robustness
          o Feed efficiency
          o Etc.
                   Index = a1 × EBV1 + a2 × EBV2 + . . .
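The index is a weighted sum, so ranking candidates is a single pass over each animal's EBVs. A minimal Python sketch; the trait names, weights and EBV values are made-up illustrations, not Genus data, and the function names are hypothetical.

```python
def selection_index(ebvs, weights):
    """Index = a1 * EBV1 + a2 * EBV2 + ... for one animal.

    ebvs:    trait name -> estimated breeding value for this animal
    weights: trait name -> economic weight (a1, a2, ...)
    """
    missing = set(weights) - set(ebvs)
    if missing:
        raise ValueError(f"missing EBVs for traits: {sorted(missing)}")
    return sum(weights[trait] * ebvs[trait] for trait in weights)


def rank_animals(animals, weights):
    """Return (animal_id, index) pairs sorted from highest to lowest index."""
    scored = [(animal_id, selection_index(ebvs, weights)) for animal_id, ebvs in animals.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    weights = {"lean_yield": 2.0, "meat_quality": 1.5, "robustness": 1.0}  # hypothetical a_i
    animals = {
        "pig_A": {"lean_yield": 1.2, "meat_quality": 0.4, "robustness": 0.1},
        "pig_B": {"lean_yield": 0.5, "meat_quality": 1.0, "robustness": 0.8},
    }
    for animal_id, index in rank_animals(animals, weights):
        print(animal_id, round(index, 2))
```

Here pig_B (index 3.3) ranks above pig_A (3.1) even though pig_A has the higher lean-yield EBV, because the weights reward the other traits too.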

Data Pipeline

Genomic Data Deluge

Project: Genotyping DB

      The Need
      • Accumulating SNP chip data
      • Difficulty searching through the data
      • Next Gen Sequencing
      • Cheaper SNP chips
      • LOTS of animals
      • Other projects needed the data

      Other Considerations
      • Store large data…BIG data
      • Scalable
      • Alternative to Oracle
      • Minimal impact on infrastructure
      • Easy for scientists to use

What Do Vendors Provide for Genotype Data?

                                          nothing

Think Outside the (Vendor’s) Box…

All Databases are Not Created Equal

All Vehicles are Not Created Equal

Genomic Data

SNP Data

          Animal ID   SNP1   SNP2   SNP3   …   SNP65K
          1           0      1      2      1   2
          2           1      1      0      0   0
          3
          4
          5           1      2      2      0   2
          …
          XXXX
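Each call in the table is 0, 1 or 2. A common convention, assumed here because the slide does not say, is that this counts copies of one allele. With only three valid values plus a marker for the blank (uncalled) cells, each call fits in 2 bits. The sketch below packs a row four calls per byte; `MISSING = 3` and the function names are hypothetical.

```python
MISSING = 3  # hypothetical code for an uncalled genotype (the blank cells above)
VALID_CALLS = (0, 1, 2, MISSING)


def pack_genotypes(calls):
    """Pack one animal's genotype calls into 2 bits each, four calls per byte."""
    packed = bytearray((len(calls) + 3) // 4)
    for i, g in enumerate(calls):
        if g not in VALID_CALLS:
            raise ValueError(f"invalid genotype {g!r} at SNP index {i}")
        # Call i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n_calls):
    """Recover the first n_calls genotype calls from a packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n_calls)]


if __name__ == "__main__":
    row = [0, 1, 2, 1, 2]  # animal 1 from the table, SNP1..SNP3, one "…" column, SNP65K
    packed = pack_genotypes(row)
    print(len(packed), unpack_genotypes(packed, len(row)))
```

Taking the slide's SNP65K as 65,000 calls, one animal's row packs into 16,250 bytes instead of 65,000 at one byte per call.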

Single Research Cohort
       What about selection and cohort comparisons?
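Selection and cohort comparisons typically ask one question of one SNP across many animals; in a columnar layout that means reading a single column. As an illustration, assuming the 0/1/2 allele-count coding and `None` for a missing call (both assumptions, as is the function name), the frequency of the counted allele per cohort is:

```python
from collections import defaultdict


def allele_frequency_by_cohort(snp_column, cohorts):
    """Frequency of the counted allele in each cohort, from a single SNP column.

    snp_column: genotype calls (0, 1 or 2 copies of the counted allele; None = missing)
    cohorts:    cohort label for each animal, aligned with snp_column
    """
    if len(snp_column) != len(cohorts):
        raise ValueError("snp_column and cohorts must have the same length")
    allele_copies = defaultdict(int)
    called_animals = defaultdict(int)
    for call, cohort in zip(snp_column, cohorts):
        if call is None:  # skip uncalled genotypes
            continue
        allele_copies[cohort] += call
        called_animals[cohort] += 1
    # Each diploid animal carries two alleles at the SNP.
    return {c: allele_copies[c] / (2 * called_animals[c]) for c in called_animals}


if __name__ == "__main__":
    snp = [0, 1, 2, 2, None, 1]
    cohorts = ["A", "A", "A", "B", "B", "B"]
    print(allele_frequency_by_cohort(snp, cohorts))  # {'A': 0.5, 'B': 0.75}
```

A row-oriented table would have to read every animal's full 65K-call row to answer this, even though only one value per row is needed.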

Column Bases Make More Sense

InfiniDB: Parallel Columnar DB

Complicated Searches are Faster!

Scales for a Fraction of the Cost


      Compression                   Up to 75%
      Speed vs. RDBMS               15× faster
      Scalability                   Hundreds of TB; parallel queries and ingest
      Cost vs. Oracle               25%

Future Projects: Imputation

  [Figure: genotyping chip costs labelled $150, $150, $15, $15]

Caution: Data multiplies in a BIG way

Conclusions
  • It helps to have a deep understanding of the scientific
    problems being solved
  • Have a good understanding of the data access patterns
  • The tool should cover 80% of the most heavily used access
    patterns
  • Use a combination of software and hardware knowledge to
    improve performance
  • Think “out of the vendor box”, especially where
    research is cutting edge
  • Take the lead in showing users new tools they may not even
    know they want or need

Questions

More Information on InfiniDB
  Visit us at:
     o www.Calpont.com
     o www.InfiniDB.org
     o Visit Booth #414 to register to win an iPad 3

                                    Enter for a Chance to Win an iPad 3
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 

Huge Data Analytics: Calpont InfiniDB Columnar DBMS Empowers New Research with The World’s First Searchable Genotype Database

  • 1. Calpont InfiniDB®: Accelerating Data Insights. Huge Data Analytics: Calpont InfiniDB Columnar DBMS Empowers New Research with The World’s First Searchable Genotype Database. Strata Conference 2012. Calpont Proprietary and Confidential
  • 2. Today’s Agenda
      o Introduction of today’s speakers
      o What is InfiniDB?
      o Announced today: InfiniDB 3
      o Huge Data Analytics: InfiniDB Empowers New Research with The World’s First Searchable Genotype Database
      o Questions
      o More information and resources
    InfiniDB® Scalable. Fast. Simple. Copyright © 2012 Calpont. All Rights Reserved.
  • 3. Today’s Presenters
      o Fernanda Foertter, HPC Administrator / Scientific Programmer, Genus plc
      o Jim Tommaney, Chief Technology Officer, Calpont Corporation
  • 5. Calpont Corporation
      o Company: privately held and backed; offices in Dallas (headquarters) and Silicon Valley
      o Business: scale-out MPP analytic database; MySQL columnar + map reduction; commercial open core model
      o Products: InfiniDB Enterprise (forthcoming 4th major release); InfiniDB Community (modified open source license)
      o Calpont Mission: To provide a highly scalable data platform that enables analytic business decisions as timely as customers and markets dictate.
  • 6. Innovative Companies Turning to InfiniDB
  • 7. What is InfiniDB®? Scalable. Fast. Simple.
  • 8. What is InfiniDB?
      o Big data analytics engine
      o Full-featured
      o Familiar MySQL SQL look and feel
      o Game-changing performance
  • 9. Focus on Analytics Workloads. InfiniDB is:
      o Engineered for large queries
      o Engineered for ad-hoc flexibility
      o Analytics, not OLTP
      o A unique combination of columnar + map-reduce
  • 10. What is InfiniDB®? Scalable. Fast. Simple.
  • 11. InfiniDB: Two-Tier Architecture, or … Single Server. Purpose-built for big data analytics.
      o User Module (UM): understands SQL
      o Performance Module (PM): operates on data blocks
  • 12. InfiniDB® Performance Foundations: the power and scale of map-reduce plus transformational I/O efficiency
  • 13. Power and Scalability of Map-Reduce (Map ↓ / Reduce ↑). SQL operations are mapped to Performance Module threads:
      o Parallel/distributed data access
      o Parallel/distributed joins (inner, outer)
      o Parallel/distributed sub-queries (FROM, WHERE, SELECT)
      o Parallel/distributed GROUP BY, DISTINCT, and aggregation
      o Extensible with parallel/distributed user-defined functions
    Results are returned to the User Module in the reduce phase.
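The map/reduce split on this slide can be illustrated with a minimal Python sketch of a distributed GROUP BY with AVG. It assumes rows are already split into partitions (standing in for the data held by each Performance Module) and uses a thread pool for the map phase; the function names are hypothetical, and this is the general pattern, not InfiniDB's implementation.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_partition(rows):
    """Map phase: partial SUM and COUNT per group key for one partition."""
    sums = defaultdict(float)
    counts = Counter()
    for key, value in rows:
        sums[key] += value
        counts[key] += 1
    return sums, counts


def reduce_partials(partials):
    """Reduce phase: merge partial SUM/COUNT pairs, then compute AVG per key."""
    total_sums = defaultdict(float)
    total_counts = Counter()
    for sums, counts in partials:
        for key, partial_sum in sums.items():
            total_sums[key] += partial_sum
        total_counts.update(counts)
    return {key: total_sums[key] / total_counts[key] for key in total_sums}


def group_by_avg(partitions, workers=4):
    """Roughly: SELECT key, AVG(value) FROM t GROUP BY key, over partitioned rows."""
    # Each partition stands in for the data held by one Performance Module.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    # The merge stands in for the reduce work returned to the User Module.
    return reduce_partials(partials)


if __name__ == "__main__":
    partitions = [
        [("A", 1.0), ("B", 4.0)],
        [("A", 3.0)],
        [("B", 6.0), ("C", 2.0)],
    ]
    print(group_by_avg(partitions))  # {'A': 2.0, 'B': 5.0, 'C': 2.0}
```

The map phase returns SUM and COUNT rather than a per-partition average because averaging averages gives the wrong answer when partitions hold different numbers of rows for a key.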
  • 14. Power and Scalability of Map-Reduce (Map ↓ / Reduce ↑). InfiniDB is not a Hadoop-style map-reduce framework.
  • 15. Power and Scalability of Map-Reduce (Map ↓ / Reduce ↑). InfiniDB is a custom-built, highly optimized map-reduce framework for queries.
  • 16. Transformational I/O Efficiency: techniques to avoid unnecessary I/O
      o Vertical partitioning: read only the columns required
      o Horizontal partitioning: focus on the rows required
      o Just-in-time materialization
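The three techniques on this slide can be combined in one small Python sketch. It assumes a hypothetical layout in which each column is stored separately (vertical partitioning) and split into fixed-size extents carrying min/max metadata, so a filter can skip extents that cannot match (horizontal partitioning); output rows are assembled only for matching row ids (just-in-time materialization). `Extent`, `build_column`, `scan_between`, and `materialize` are illustrative names, not InfiniDB internals.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """A horizontal slice of one column plus min/max metadata used for pruning."""
    start_row: int
    values: list

    @property
    def lo(self):
        return min(self.values)

    @property
    def hi(self):
        return max(self.values)


def build_column(values, extent_size):
    """Split one column's values into fixed-size extents (hypothetical layout)."""
    return [Extent(i, values[i:i + extent_size])
            for i in range(0, len(values), extent_size)]


def scan_between(column, low, high):
    """Filter one column; return matching row ids and how many extents were read."""
    row_ids, extents_read = [], 0
    for ext in column:
        if ext.hi < low or ext.lo > high:
            continue  # horizontal pruning: min/max says no row here can match
        extents_read += 1
        row_ids.extend(ext.start_row + i
                       for i, v in enumerate(ext.values) if low <= v <= high)
    return row_ids, extents_read


def materialize(table, columns, row_ids, extent_size):
    """Just-in-time materialization: build output rows only for matching ids."""
    return [tuple(table[c][r // extent_size].values[r % extent_size] for c in columns)
            for r in row_ids]


if __name__ == "__main__":
    size = 4
    # Vertical partitioning: every column is stored (and read) separately.
    table = {
        "animal_id": build_column(list(range(101, 109)), size),
        "weight": build_column([10, 12, 11, 13, 50, 52, 51, 49], size),
        "breed": build_column(list("LLDDLLDD"), size),
    }
    ids, read = scan_between(table["weight"], 50, 51)  # touches only "weight"
    print(ids, read)                                   # [4, 6] 1
    print(materialize(table, ["animal_id"], ids, size))  # [(105,), (107,)]
```

In the usage example the query reads one of the two "weight" extents, never touches "breed", and fetches "animal_id" for just two rows.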
  • 17. Transformational I/O Efficiency: techniques for efficient I/O
      o Columnar compression reduces I/O from disk
      o Global data buffer cache can reduce disk I/O
      o Real-time decompression accelerates reads from disk
      o Avoidance of random I/O
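Why columnar compression cuts disk I/O can be seen in a minimal Python sketch: a single column holds values of one type drawn from a small range, so a general-purpose compressor shrinks it substantially, and fewer bytes have to be read back. It assumes one byte per value and uses the standard-library `zlib` codec purely for illustration; the column contents are hypothetical, and this does not claim to be the scheme InfiniDB uses.

```python
import random
import zlib


def compress_column(codes):
    """Pack a column of small integer codes (0-255) one per byte, then zlib it."""
    raw = bytes(codes)
    return raw, zlib.compress(raw, 6)


def decompress_column(blob):
    """Inverse of compress_column: back to a list of ints."""
    return list(zlib.decompress(blob))


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical column: 100,000 genotype-like codes, each 0, 1, or 2.
    column = [random.choice((0, 1, 2)) for _ in range(100_000)]
    raw, packed = compress_column(column)
    assert decompress_column(packed) == column
    print(f"raw={len(raw)} bytes, compressed={len(packed)} bytes")
```

The actual ratio depends entirely on the data; the sketch only shows the round trip and that a low-cardinality column ends up much smaller than its raw form.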
  • 18. Simple: Automatic Everything
      o Vertical partitioning
      o Horizontal partitioning
      o Compression
      o Compression algorithm selection
      o Distribution of data across disk resources
      o Distribution of work across CPU resources
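One simple way to automate "compression algorithm selection" is to try several codecs on a block and keep the smallest output, recording which codec was used so the block can be decoded later. The Python sketch below assumes that size-only rule and uses three standard-library codecs as hypothetical candidates; it is not a description of how InfiniDB chooses its algorithm.

```python
import bz2
import lzma
import zlib

# Candidate codecs: name -> (compress, decompress). The choice of these three
# stdlib codecs is an illustrative assumption.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compress_best(raw):
    """Compress raw bytes with every codec and keep the smallest result."""
    best_name, best_blob = None, None
    for name, (compress, _) in CODECS.items():
        blob = compress(raw)
        if best_blob is None or len(blob) < len(best_blob):
            best_name, best_blob = name, blob
    return best_name, best_blob


def decompress_best(name, blob):
    """Decode with whichever codec compress_best recorded."""
    return CODECS[name][1](blob)


if __name__ == "__main__":
    column = bytes([0, 1, 2, 2, 1, 0] * 5000)  # hypothetical column bytes
    name, blob = compress_best(column)
    assert decompress_best(name, blob) == column
    print(name, len(column), "->", len(blob))
```

A real engine would likely also weigh decompression speed, since the slide above stresses fast real-time decompression; picking purely by size is the simplest possible rule.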
  • 19. InfiniDB®: Scalable. Fast. Simple.
  • 21. InfiniDB 3: It Is Now Possible…
  • 22. Today’s Presenters
      o Fernanda Foertter, HPC Administrator / Scientific Programmer, Genus plc
      o Jim Tommaney, Chief Technology Officer, Calpont Corporation
  • 23. Where I Work
  • 24. Genetic Evaluation: Breeding Values
  • 25. Phenotype: Meat Quality
  • 26. Selection for Lean Growth: 1980 and 2005
  • 27. Selection for Lean Growth: 1980 and 2005
  • 28. Halothane Gene (1991). The gene is associated with:
      o High carcass yield (NN)
      o Stress-triggered hyperthermia
      o Poor meat quality (Nn/nn)
  • 29. DNA Marker Use [timeline, 1990-2009]
      o Milestones: HAL (1991), ESR (1994), RN & MC4R (1998), FUT1 & PRKAG3 (1999), MIS (2003), large-scale SNP discovery (2004)
      o Eras: single genes, QTL, and candidate genes (1991-2002); large-scale SNP discovery, genome scans, and sequencing (through 2009)
  • 30. Sudden Data Growth [chart: Porcine SNP Panel Density; y-axis: number of SNPs, 0 to 70,000; x-axis: 2004 to 2009]
  • 31. Sudden Data Growth [charts: Sample Collection; animals (cumulative), axis up to 16,000,000; tissue (cumulative), axis up to 3,500,000; x-axis: year, 1991 to 2011]
  • 32. Genetic Evaluation: estimated breeding values (EBVs) for traits such as lean yield, meat quality, robustness, and feed efficiency are combined with economic weights into a single index, where each weight ai applies to EBVi:
    Index = a1 × EBV1 + a2 × EBV2 + …
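The index formula on this slide is a weighted sum, and a minimal Python sketch shows how it turns several EBVs into one ranking. The trait names, weights, and EBV values below are hypothetical, chosen only to exercise the formula; they are not Genus data or real economic weights.

```python
def selection_index(ebvs, weights):
    """Index = a1 * EBV1 + a2 * EBV2 + ... for one animal."""
    missing = set(weights) - set(ebvs)
    if missing:
        raise KeyError(f"missing EBVs for traits: {sorted(missing)}")
    return sum(a * ebvs[trait] for trait, a in weights.items())


def rank_animals(animals, weights):
    """Animal ids ordered from highest to lowest index."""
    return sorted(animals,
                  key=lambda animal: selection_index(animals[animal], weights),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical economic weights and EBVs, for illustration only.
    weights = {"lean_yield": 2.0, "meat_quality": 1.0, "robustness": 0.5}
    animals = {
        "pig_a": {"lean_yield": 1.0, "meat_quality": 0.5, "robustness": 0.2},
        "pig_b": {"lean_yield": 0.4, "meat_quality": 1.5, "robustness": 1.0},
    }
    for animal in rank_animals(animals, weights):
        print(animal, round(selection_index(animals[animal], weights), 2))
```

With these made-up numbers, pig_b outranks pig_a (index 2.8 vs. 2.6) despite a lower lean-yield EBV, which is the point of combining traits: no single EBV decides the ranking.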
  • 33. Data Pipeline
  • 34. Genomic Data Deluge
  • 35. Project: Genotyping DB
      o The need: accumulating SNP chip data; difficulty searching through it; next-gen sequencing; cheaper SNP chips; LOTS of animals; other projects needed the data
      o Other considerations: store large data…BIG data; scalable; alternative to Oracle; minimal impact on infrastructure; easy for scientists to use
  • 36. What Do Vendors Provide for Genotype Data? Nothing.
  • 37. Think Outside the (Vendor’s) Box…
  • 38. All Databases Are Not Created Equal
  • 39. All Vehicles Are Not Created Equal
  • 40. Genomic Data
  • 41. SNP Data [table: one row per animal (Animal ID) and one column per SNP (SNP1, SNP2, SNP3, … SNP65K); the cells shown hold genotype values 0, 1, and 2]
  • 42. Single Research Cohort. What about selection and cohort comparisons?
  • 43. Column-Based Databases Make More Sense
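The argument of slides 41 to 43 can be illustrated with a minimal Python sketch: once the animal-by-SNP matrix is stored one column per SNP, comparing two cohorts at one SNP reads only that SNP's column instead of every animal's full 65K-value row. It assumes genotypes are coded 0, 1, 2 as a count of one allele, so allele frequency is the mean code divided by 2; the animals, cohorts, and function names are hypothetical.

```python
from array import array


def to_columns(rows, n_snps):
    """Pivot row-oriented genotypes {animal: [code per SNP]} into one array per SNP."""
    animal_ids = list(rows)
    columns = [array("b", (rows[a][j] for a in animal_ids)) for j in range(n_snps)]
    return animal_ids, columns


def allele_frequency(column, positions):
    """Mean allele dosage / 2 over the selected animals (codes assumed 0, 1, 2)."""
    if not positions:
        raise ValueError("empty cohort")
    return sum(column[p] for p in positions) / (2 * len(positions))


def compare_cohorts(animal_ids, columns, snp, cohort_a, cohort_b):
    """Compare one SNP between two cohorts; only that SNP's column is read."""
    position = {a: i for i, a in enumerate(animal_ids)}
    column = columns[snp]
    return (allele_frequency(column, [position[a] for a in cohort_a]),
            allele_frequency(column, [position[a] for a in cohort_b]))


if __name__ == "__main__":
    # Hypothetical genotypes for 4 animals x 3 SNPs.
    rows = {"a1": [0, 1, 2], "a2": [1, 1, 0], "a3": [2, 0, 1], "a4": [2, 2, 2]}
    ids, cols = to_columns(rows, n_snps=3)
    print(compare_cohorts(ids, cols, 0, ["a1", "a2"], ["a3", "a4"]))  # (0.25, 1.0)
```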
  • 44. InfiniDB: Parallel Columnar DB
  • 45. Complicated Searches Are Faster!
  • 46. Scales for a Fraction of the Cost
      o Compression: up to 75%
      o Speed vs. RDBMS: 15X faster
      o Scalability: 100s of TB, parallel queries/ingest
      o Cost vs. Oracle: 25%
  • 47. Future Projects: Imputation [figure labels: $150, $150, $15, $15]
  • 48. Caution: Data Multiplies in a BIG Way
  • 49. Conclusions
      o It helps to have a deep understanding of the scientific problems being solved
      o Have a good understanding of the data access patterns
      o The tool should solve 80% of the highest-use patterns
      o Use a combination of software and hardware knowledge to improve performance
      o Think “out of the vendor box”, especially where research is cutting edge
      o Take the lead in showing users new tools they may not even know they want or need
  • 50. Questions
  • 51. More Information on InfiniDB
      o Visit us at: www.Calpont.com and www.InfiniDB.org
      o Visit Booth #414 to register to win an iPad 3
  • 52. Enter for a Chance to Win an iPad 3