VoltDB presents

Stonebraker Live!
Navigating the Database Universe

Scott Jarr, Co-founder and Chief Strategy Officer
Agenda
• The (proper) design of DBMSs
   – Presented by Dr. Michael Stonebraker, Co-founder

• The database universe
   – Presented by Scott Jarr, Co-founder and Chief Strategy Officer

• Introducing VoltDB 3.0
   – Presented by Mark Hydar, VP of Market Technology and Strategy
We Believe…

• “Big Data” is a rare, transformative market
• Velocity is becoming the cornerstone
• Specialized databases (working together) are
  the answer
• Products must provide tangible customer
  value... Fast
Dr. Michael Stonebraker

THE (PROPER) DESIGN OF THE DBMS
Lessons from 40 Years of Database Design
1.   Get the user interaction right
     – Bet on a small number of easy-to-understand constructs
     – Plus standards
2.   Get the implementation right
     – Bet on a small number of easy-to-understand constructs
3.   One size does not fit all
     – At least not if you want fast, big or complex

“Those who don’t learn from history are destined to repeat it.”
     -Winston Churchill
#1: Get the User Interaction Right

Historical Lesson: RDBMS vs. CODASYL vs. OODB

Winner: RDBMS
• Simple data model (tables)
• Simple access language (SQL)
• ACID (transactions)
• Standards (SQL)

Loser: CODASYL
• Complicated data model (records participate in “sets”; a set has one owner and, perhaps, many members, etc.)
• Messy access language (a sea of “cursors”, some but not all of which move on every command; navigation programming)

Loser: OODBs
• Complex data model (hierarchical records, pointers, sets, arrays, etc.)
• Complex access language (navigation through this sea)
• No standards
Interaction Take Away: Simple Is Good

• ACID was easy for people to understand

• SQL provided a standard, high-level language and
  made people productive (transportable skills)
#2: Get the Implementation Right (Historical Winners)
• Leverage a few simple ideas: early relational implementations
    – System R storage system dropped links
    – Views (protection, schema modification, performance)
    – Cost-based optimizer
• Leverage a few simple ideas: Postgres
    – User-defined data types and functions (adopted by almost everybody)
    – Rules/triggers
    – No-overwrite storage
• Leverage a few simple ideas: Vertica
    – Store data by column
    – Heavy compression
    – Parallel load without compromising ACID
#3: One Size Does NOT Fit All
• One size fits all (OSFA) is an old technology with hundreds of bags hanging off it
• It breaks 100% of the time when under load
• Load = size or speed or complexity
• Load is increasing at a startling rate
• Purpose-built systems will outperform it by 10x to 100x
• History has not been completely written yet…but let’s look at VoltDB as an example

“…specialized systems can each be a factor of 50 faster than the single ‘one size fits all’ system…A factor of 50 is nothing to sneeze at.”
     -My Top 10 Assertions About Data Warehouses, 2010
Example: VoltDB
• Get the interface right
   – SQL
   – ACID

• Implementation: Leverage a few simple ideas
   – Main memory
   – Stored procedures
   – Deterministic scheduling

• Specialization
   – OLTP focus allowed for above implementation choices
Proving the Theory
• Challenge: OLTP performance
   – TPC-C CPU cycles
   – On the Shore DBMS prototype
   – Elephants (the large legacy commercial DBMSs) should be similar
[Pie chart: where the CPU cycles go. Useful Work 4%, Recovery 24%, Latching 24%, Buffer Pool 24%, Locking 24%]
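Reading the cycle breakdown above as a rough budget gives a quick bound. This is an Amdahl-style back-of-envelope estimate, not a figure from the talk, and it assumes the four overheads are independent and fully removable: with useful work at about 4% of cycles, stripping all four overheads leaves roughly 4% of the original work (about 25x), while removing only latching and locking leaves 52% of it (under 2x).

```latex
\begin{aligned}
f_{\text{useful}} &\approx 0.04 \\
S_{\text{all overheads removed}} &\approx \frac{1}{f_{\text{useful}}} = \frac{1}{0.04} = 25 \\
S_{\text{latching and locking removed}} &\approx \frac{1}{1 - (0.24 + 0.24)} = \frac{1}{0.52} \approx 1.9
\end{aligned}
```

The gap between the two lines is the point: a large gain requires attacking all four overheads (latching, buffer pool, locking, recovery), which is what the following slides do one by one.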
Single Threaded
• Gets rid of the latching problem
• What about Multicore?
   – Divide the memory on an N-core node so it looks like N single-core nodes
   – Which are single threaded…
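A minimal sketch of “N single-threaded partitions per N-core node”, assuming hash partitioning on a key (the slide does not say how data is split). Each partition owns its slice of memory and exactly one worker thread, so its data never needs latches or locks. `Partition`, `Node`, and `partition_for` are hypothetical names for illustration, not VoltDB code.

```python
import queue
import threading
import zlib


class Partition:
    """One single-threaded "virtual node": it alone owns its slice of the data."""

    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.data: dict = {}
        self.inbox: queue.Queue = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self) -> None:
        # Only this thread ever touches self.data, so no latches or locks are needed.
        while True:
            work = self.inbox.get()
            if work is None:  # shutdown signal
                return
            key, delta = work
            self.data[key] = self.data.get(key, 0) + delta


def partition_for(key: str, n_partitions: int) -> int:
    # crc32 is stable across runs, so a key always routes to the same partition.
    return zlib.crc32(key.encode("utf-8")) % n_partitions


class Node:
    """An N-core node presented as N single-core, single-threaded partitions."""

    def __init__(self, n_cores: int) -> None:
        self.partitions = [Partition(i) for i in range(n_cores)]

    def submit(self, key: str, delta: int) -> None:
        target = self.partitions[partition_for(key, len(self.partitions))]
        target.inbox.put((key, delta))

    def shutdown(self) -> None:
        for p in self.partitions:
            p.inbox.put(None)
        for p in self.partitions:
            p.worker.join()
```

For example, submitting `node.submit(f"user{i % 10}", 1)` for 1,000 values of `i` on `Node(4)` leaves each of the 10 keys with a count of 100 on exactly one partition. Work that spans several partitions needs extra coordination, which this sketch leaves out; and under CPython's GIL these threads do not truly run in parallel, so the sketch shows data ownership, not speed.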
Implementation Construct #1: Main Memory
• Main memory format for data
    – Disk format gets you buffer pool overhead
• What happens if data doesn’t fit?
    – Return to disk-buffer pool architecture (slow)
    – Anti-caching
        • Main memory format for data
        • When memory fills up, bundle together the coldest (least recently used) tuples and write them out to disk
        • Run a transaction in “sleuth mode” to find the records it needs, move them to main memory, and pin them
        • Run the transaction normally
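A toy sketch of the anti-caching steps above, under stated assumptions: least-recently-used order decides which tuples count as “elderly”, a Python dict stands in for disk, and a transaction is a Python function that reads and writes through an accessor. In this toy, sleuth mode peeks at evicted values directly so the dry run follows the same code path, and it discards writes; every key a transaction reads is assumed to exist. `AntiCache`, `run`, and the accessor classes are hypothetical names, not VoltDB or H-Store code.

```python
from collections import OrderedDict


class AntiCache:
    """Toy anti-cache: main memory is the primary store; cold tuples are evicted in blocks."""

    def __init__(self, capacity: int, block_size: int) -> None:
        self.capacity = capacity
        self.block_size = block_size
        self.memory: OrderedDict = OrderedDict()  # key -> value, least recently used first
        self.disk: dict = {}                      # block id -> {key: value}, the "written out" blocks
        self.evicted: dict = {}                   # key -> id of the block holding it
        self.pinned: set = set()
        self._next_block = 0

    def put(self, key, value) -> None:
        if key in self.evicted:  # a fresh write supersedes the evicted copy
            del self.disk[self.evicted.pop(key)][key]
        self.memory[key] = value
        self.memory.move_to_end(key)
        self._evict_if_full()

    def _evict_if_full(self) -> None:
        while len(self.memory) > self.capacity:
            # Bundle the coldest unpinned tuples into one block and write it out.
            block = {}
            for key in list(self.memory):
                if len(block) == self.block_size:
                    break
                if key not in self.pinned:
                    block[key] = self.memory.pop(key)
            if not block:
                return  # everything left is pinned; tolerate a temporary overflow
            self.disk[self._next_block] = block
            for key in block:
                self.evicted[key] = self._next_block
            self._next_block += 1

    def _peek(self, key):
        if key in self.memory:
            return self.memory[key]
        return self.disk[self.evicted[key]][key]

    def _fetch_and_pin(self, key) -> None:
        if key in self.evicted:
            block = self.disk[self.evicted.pop(key)]
            self.memory[key] = block.pop(key)  # bring the tuple back into memory
        if key in self.memory:
            self.memory.move_to_end(key)  # now the most recently used
        self.pinned.add(key)

    def run(self, txn):
        # 1. Sleuth mode: dry-run the transaction to learn which records it needs.
        sleuth = _Sleuth(self)
        txn(sleuth)
        # 2. Move those records into main memory and pin them.
        for key in sleuth.touched:
            self._fetch_and_pin(key)
        self._evict_if_full()  # may evict other, unpinned tuples
        try:
            # 3. Run the transaction normally, entirely against main memory.
            return txn(_Live(self))
        finally:
            self.pinned.clear()
            self._evict_if_full()


class _Sleuth:
    """Dry-run accessor: records the keys a transaction touches, changes nothing."""

    def __init__(self, cache: AntiCache) -> None:
        self.cache = cache
        self.touched: list = []

    def get(self, key):
        self.touched.append(key)
        return self.cache._peek(key)

    def set(self, key, value) -> None:
        self.touched.append(key)  # writes are discarded in sleuth mode


class _Live:
    """Normal accessor: reads and writes main memory only."""

    def __init__(self, cache: AntiCache) -> None:
        self.cache = cache

    def get(self, key):
        return self.cache.memory[key]

    def set(self, key, value) -> None:
        self.cache.memory[key] = value
```

With `AntiCache(capacity=4, block_size=2)` and eight puts, the first four keys end up evicted in two blocks; a transaction that touches `k0` then pulls it back into memory, pins it for the duration, and evicts other cold tuples to make room. The published anti-caching design handles a miss differently (it aborts and restarts the transaction after fetching), so treat the dry run here as a simplification.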
Implementation Construct #2: Stored Procedures

• Round trip to the DBMS is expensive
   – Do it once per transaction
   – Not once per command
   – Or even once per cursor move
• Ad-hoc queries supported
   – Turn them into dynamic stored procedures
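A toy cost model of why round trips matter, counting one network round trip per client call. It is not VoltDB's API (VoltDB stored procedures are written in Java and invoked by name from the client); `ToyServer`, `execute`, `call_procedure`, and `run_ad_hoc` are hypothetical names, and `run_ad_hoc` only stands in for the idea of wrapping ad hoc work as a one-off dynamic procedure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToyServer:
    """Toy DBMS endpoint that counts client/server round trips."""
    data: Dict[str, int] = field(default_factory=dict)
    procedures: Dict[str, Callable] = field(default_factory=dict)
    round_trips: int = 0

    def execute(self, op: str, key: str, value: int = 0) -> int:
        # One command per call, so one network round trip per command.
        self.round_trips += 1
        if op == "get":
            return self.data.get(key, 0)
        self.data[key] = value
        return value

    def call_procedure(self, name: str, *args):
        # The whole transaction runs server-side: one round trip in total.
        self.round_trips += 1
        return self.procedures[name](self.data, *args)

    def run_ad_hoc(self, body: Callable, *args):
        # Ad hoc work wrapped as a one-off ("dynamic") procedure: still one round trip.
        self.round_trips += 1
        return body(self.data, *args)


def transfer_client_side(server: ToyServer, src: str, dst: str, amount: int) -> None:
    # Four separate commands, so four round trips.
    a = server.execute("get", src)
    b = server.execute("get", dst)
    server.execute("set", src, a - amount)
    server.execute("set", dst, b + amount)


def transfer_procedure(data: Dict[str, int], src: str, dst: str, amount: int) -> None:
    # Same logic, but executed next to the data.
    data[src] = data.get(src, 0) - amount
    data[dst] = data.get(dst, 0) + amount
```

The same transfer costs four round trips when driven command by command from the client and one when registered as a procedure; with a cursor-per-row interface the gap grows with every row touched.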
Implementation Construct #3: Deterministic Scheduling

• Transactions are ordered and run to completion
   – No locking
• Active-active replication (HA)
   – Run transaction at all replicas – in the same pre-determined order
• What about a cluster-wide power failure?
   – Asynchronous checkpointing
   – With a command log (logs the transaction invocations, not the changed data)
   – Wildly faster than data logging
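A minimal sketch of deterministic scheduling with active-active replicas and command-log recovery, under stated assumptions: a single in-memory list plays both sequencer and command log (a real log is written to disk), checkpoints are taken synchronously for brevity, and every procedure is deterministic (no clocks or random numbers), which is what lets replicas and replay agree. `Replica`, `Cluster`, `deposit`, and `transfer` are hypothetical names, not VoltDB code.

```python
import copy
from typing import Callable, Dict, List, Tuple


def deposit(db: Dict[str, int], acct: str, amount: int) -> None:
    db[acct] = db.get(acct, 0) + amount


def transfer(db: Dict[str, int], src: str, dst: str, amount: int) -> None:
    # Deterministic: the outcome depends only on the state and the arguments.
    if db.get(src, 0) >= amount:
        db[src] -= amount
        db[dst] = db.get(dst, 0) + amount


PROCEDURES: Dict[str, Callable[..., None]] = {"deposit": deposit, "transfer": transfer}


class Replica:
    def __init__(self) -> None:
        self.db: Dict[str, int] = {}

    def apply(self, name: str, args: tuple) -> None:
        # Run one transaction to completion; nothing else runs meanwhile, so no locks.
        PROCEDURES[name](self.db, *args)


class Cluster:
    def __init__(self, n_replicas: int) -> None:
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.command_log: List[Tuple[str, tuple]] = []
        self.checkpoint_state: Dict[str, int] = {}
        self.checkpoint_pos = 0

    def submit(self, name: str, *args) -> None:
        # The log position is the pre-determined global order.
        self.command_log.append((name, args))  # log the command, not the changed data
        for replica in self.replicas:          # every replica, same order (active-active)
            replica.apply(name, args)

    def checkpoint(self) -> None:
        # Synchronous here for brevity; a real system would snapshot asynchronously.
        self.checkpoint_state = copy.deepcopy(self.replicas[0].db)
        self.checkpoint_pos = len(self.command_log)

    def recover(self) -> Dict[str, int]:
        # After a cluster-wide failure: load the checkpoint, then replay the log tail.
        restored = Replica()
        restored.db = copy.deepcopy(self.checkpoint_state)
        for name, args in self.command_log[self.checkpoint_pos:]:
            restored.apply(name, args)
        return restored.db
```

Because each log entry is just a procedure name plus its arguments, it is far smaller than the before/after images a data log would record, which is where the speed advantage over data logging comes from.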
Result of Design Principles: VoltDB Example

• Good interface decisions – made developers more productive
   – SQL & ACID

• Leveraging a few simple implementation ideas – made
  VoltDB wicked fast
   – Main memory
   – Stored procedures
   – Deterministic scheduling
Proving the Theory

• Answer: OLTP performance
  – 3 million transactions per second
  – 7x Cassandra
  – 15 million SQL statements per second
  – 100,000+ transactions per commodity server

“…we are heading toward a world with at least 5 (and probably more) specialized engines and the death of the ‘one size fits all’ legacy systems.”
     -The End of an Architectural Era (It’s Time for a Complete Rewrite), 2007
Scott Jarr

THE DATABASE UNIVERSE
Technology Meets the Market
Believe
   –   “Big Data” is a rare, transformative market
   –   Velocity is becoming the cornerstone
   –   Specialized databases (working together) are the answer
   –   Products must provide tangible customer value… Fast

Observations
   – Noisy, crowded and new – kinda like Christmas shopping at the mall
   – Everyone wants to understand where the pieces fit
   –   Analysts build maps around technology, NOT use cases

What we need is…
Data Value Chain (by age of data)
•   Interactive (milliseconds): place trade, serve ad, enrich stream, examine packet, approve transaction
•   Real-time Analytics (hundredths of seconds): calculate risk, leaderboard, aggregate, count
•   Record Lookup (second(s)): retrieve click stream, show orders
•   Historical Analytics (minutes): backtest algorithm, BI, daily reports
•   Exploratory Analytics (hours): algorithm discovery, log analysis, fraud pattern match
Data Value Chain (continued)
[Chart: the same five stages plotted against Age of Data with a Data Value axis; “Value of Individual Data Item” is labeled at the young-data (interactive) end and “Aggregate Data Value” at the old-data (exploratory) end]
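The Database Universe
[Chart: vertical axis Application Complexity, from Simple / Slow / Small up to Fast / Complex / Large; horizontal axis Data Value, running from Transactional to Analytic across Interactive, Real-time Analytics, Record Lookup, Historical Analytics, and Exploratory Analytics, with Value of Individual Data Item at the transactional end and Aggregate Data Value at the analytic end; only Traditional RDBMS is placed]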
The Database Universe (populated)
[Same chart with product categories placed: Velocity and NewSQL toward the interactive/real-time end, Traditional RDBMS in the middle, NoSQL around record lookup, Data Warehouse around historical analytics, and Hadoop, etc. at the exploratory end]
Closed-loop Big Data
[Diagram: event streams (logins, trades, authorizations, clicks, sensors, orders, impressions) flow into Interactive & Real-time Analytics, then Historical Reports & Analytics, then Exploratory Analytics; Knowledge flows back to the real-time tier]
• Make the most informed decision every time there is an interaction
• Real-time decisions are informed by operational analytics and past knowledge
The Velocity Use Case
What’s it look like?
    –   High throughput, relentless data feeds
    –   Fast decisions on high-value data
    –   Real-time, operational analytics provide immediate visibility

What’s the big deal?
    –   Batch visibility converts to real time = immediate business impact
    –   Decisions made at time of event = higher impact decisions with immediate returns

    –   Ability to ingest and manage massive amounts of data = business differentiation and disruption
Mark Hydar

HELLO 3.0!
Introducing VoltDB 3.0

• Available now!
   – Both commercial and open source offerings
   – www.voltdb.com/downloads
Introducing VoltDB 3.0
• Key improvements
   – Even faster
   – Easier to build high-velocity applications
   – Expanded reach across developers and applications
   – Extensible to integrate with existing data infrastructure
Latency and Throughput, 50-50 Read/Write Workload
[Chart: VoltDB 3.0 vs. v2.8.4.1 on a key/value 50/50 read/write workload, 3-node K=1 cluster; latency (ms) vs. throughput (TPS)]
Read/Write Workload Latency/Throughput
[Chart: VoltDB 3.0 on key/value workloads at 10% read/90% write, 50% read/50% write, and 90% read/10% write, 3-node K=1 cluster; average latency (ms) vs. throughput (TPS)]
Faster: Ad Hoc SQL Performance

• Conversational SQL
• Thousands to 10,000+ ad hoc SQL transactions/second
• Single or multiple (batch) SQL statement transaction
Easier Development: New SQL Support

• SQL LIKE and NOT LIKE
• UNION
• Column Functions
• Counting function (leaderboard ranking queries)
• Ability to define index using column functions
Easier Development: JSON Support

• JSON values stored in a varchar column
• Field() column function
• Indexing on JSON elements

   CREATE INDEX session_site_moderator
       ON user_session_table (field(json_data, 'site'),
                   field(json_data, 'moderator'), username);

• New JSON sample in kit
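The index definition above keys each row on two values pulled out of a JSON document plus a plain column. As a rough Python analogue of the key each row contributes, assuming FIELD returns a top-level value as text, or NULL when the field is missing (check the VoltDB SQL reference for the exact semantics): `field`, `session_index_key`, and `build_index` are hypothetical names, and a dict of lists stands in for the real, ordered index.

```python
import json
from typing import Optional


def field(json_text: str, name: str) -> Optional[str]:
    """Hypothetical analogue of FIELD(): a top-level value as text, or None if absent."""
    value = json.loads(json_text).get(name)
    if value is None:
        return None
    # Strings are returned as-is; other JSON values are re-serialized as text.
    return value if isinstance(value, str) else json.dumps(value)


def session_index_key(row: dict) -> tuple:
    # Same column list as session_site_moderator: site, moderator, username.
    doc = row["json_data"]
    return (field(doc, "site"), field(doc, "moderator"), row["username"])


def build_index(rows: list) -> dict:
    """Group usernames by their (site, moderator) prefix, like a lookup on the index."""
    index: dict = {}
    for row in rows:
        site, moderator, username = session_index_key(row)
        index.setdefault((site, moderator), []).append(username)
    return index
```

A query filtering on both JSON fields can then be answered from the index prefix instead of parsing every row's `json_data`.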
Easier Development: Online Operations

• Ability to re-join a failed node to the cluster with no impact on existing operations
• Online schema update
• No service window
Easier Development: Streamlined Development

• Elimination of project.xml
• VoltDB-specific configuration now defined in DDL
• Defaulting of deployment.xml
• New Volt Compiler CLI:
      voltdb compile
Expanded Reach: Cloud-Friendly

• Reduces the impact of variable node performance and latency
• Elimination of strict NTP configuration
• Scales to a large number of nodes
Integration: High-Performance Export

• Parallelized export
• New connectors: JDBC, Netezza, Vertica
Integration: Client Library Updates

• New PHP Client
• Node.js client v1.0
• Go Client (http://golang.org)
• Coming soon: updated Erlang client
Other Notable New Features
• Explain command
• CSV loader utility
• CSV snapshots
• New Administration CLI: voltadmin
   – voltadmin save
   – voltadmin restore
   – voltadmin pause
   – voltadmin resume
   – voltadmin shutdown
More Samples Available for Download
http://voltdb.com/community/volt-labs.php
Volt University
• Portfolio of instructional content, classes, tools, and other resources to help developers build applications quickly
• Curriculum and supporting material range from beginner to advanced
• Three types of instruction:
   – Volt University Online
   – Volt University Classroom
   – Volt Vanguard Certification
Summary: VoltDB v3.0 Features
• Even faster
• Easier to build high-velocity applications
• Expanded reach across developers and applications
• Extensible to integrate with existing data infrastructure
• Volt Labs
• Volt University
DOWNLOAD 3.0 at www.voltdb.com
Imagine the Possibilities
More Information?
• E-mail: info@voltdb.com
• Visit our forums: http://community.voltdb.com/forum
• Read the VoltDB “Getting Started Guide”: http://community.voltdb.com/docs/GettingStarted/index
• Follow @VoltDB on Twitter
QUESTIONS?
THANK YOU

Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB

 
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentationBigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
 
BigDataCloud Sept 8 2011 Meetup - Fail-Proofing Hadoop Clusters with Automati...
BigDataCloud Sept 8 2011 Meetup - Fail-Proofing Hadoop Clusters with Automati...BigDataCloud Sept 8 2011 Meetup - Fail-Proofing Hadoop Clusters with Automati...
BigDataCloud Sept 8 2011 Meetup - Fail-Proofing Hadoop Clusters with Automati...
 
BigDataCloud Sept 8 2011 Meetup - Big Data Analytics for DoddFrank Regulation...
BigDataCloud Sept 8 2011 Meetup - Big Data Analytics for DoddFrank Regulation...BigDataCloud Sept 8 2011 Meetup - Big Data Analytics for DoddFrank Regulation...
BigDataCloud Sept 8 2011 Meetup - Big Data Analytics for DoddFrank Regulation...
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 

Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB

  • 1. VoltDB presents Stonebraker Live! Navigating the Database Universe
  • 2. Scott Jarr, Co-founder and Chief Strategy Officer
  • 3. Agenda • The (proper) design of DBMSs – Presented by Dr. Michael Stonebraker, Co-founder • The database universe – Presented by Scott Jarr, Co-founder and Chief Strategy Officer • Introducing VoltDB 3.0 – Presented by Mark Hydar, VP of Market Technology and Strategy
  • 4. We Believe… • “Big Data” is a rare, transformative market • Velocity is becoming the cornerstone • Specialized databases (working together) are the answer • Products must provide tangible customer value... Fast
  • 5. Dr. Michael Stonebraker THE (PROPER) DESIGN OF THE DBMS
  • 6. Lessons from 40 Years of Database Design 1. Get the user interaction right – Bet on a small number of easy-to-understand constructs – Plus standards 2. Get the implementation right – Bet on a small number of easy-to-understand constructs 3. One size does not fit all – At least not if you want fast, big or complex. “Those who don’t learn from history are destined to repeat it.” -Winston Churchill
  • 7. #1: Get the User Interaction Right. Historical lesson: RDBMS vs. CODASYL vs. OODB. Winner: RDBMS • Simple data model (tables) • Simple access language (SQL) • ACID (transactions) • Standards (SQL). Loser: CODASYL • Complicated data model (records participate in “sets”; a set has one owner and, perhaps, many members, etc.) • Complex access language (a sea of “cursors”; some, but not all, move on every command; navigation programming). Loser: OODBs • Complex data model (hierarchical records, pointers, sets, arrays, etc.) • Messy access language (navigation through this sea) • No standards
  • 8. Interaction Takeaway: Simple Is Good • ACID was easy for people to understand • SQL provided a standard, high-level language and made people productive (transportable skills)
  • 9. #2: Get the Implementation Right. Historical winners leveraged a few simple ideas: • Early relational implementations – System R storage system dropped links – Views (protection, schema modification, performance) – Cost-based optimizer • Postgres – User-defined data types and functions (adopted by most everybody) – Rules/triggers – No-overwrite storage • Vertica – Store data by column – Compressed up the ging gong – Parallel load without compromising ACID
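Vertica's first two ideas work together: storing data by column puts equal values next to each other, which makes the column easy to compress. A minimal Python sketch, assuming run-length encoding as the compression scheme (one common column-store technique; the slide does not say which schemes Vertica uses), with hypothetical helpers `to_columns`, `rle_encode` and `rle_decode`:

```python
from itertools import groupby
from typing import Any, Dict, List, Sequence, Tuple


def to_columns(rows: Sequence[Tuple[Any, ...]], names: Sequence[str]) -> Dict[str, List[Any]]:
    """Pivot row tuples into one list per column (a column-oriented layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(column: Sequence[Any]) -> List[Tuple[Any, int]]:
    """Run-length encode a column: each run of equal adjacent values becomes (value, count)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(runs: Sequence[Tuple[Any, int]]) -> List[Any]:
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


rows = [("CA", 10), ("CA", 12), ("CA", 7), ("NY", 3), ("NY", 9)]
cols = to_columns(rows, ["state", "amount"])
print(rle_encode(cols["state"]))  # [('CA', 3), ('NY', 2)]
```

The `state` column shrinks from five values to two runs only because equal values are adjacent, which is why sort order matters so much for this kind of compression.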
  • 10. #3: One Size Does NOT Fit All • One size fits all (OSFA) is an old technology with hundreds of bags hanging off it • It breaks 100% of the time when under load • Load = size or speed or complexity • Load is increasing at a startling rate • Purpose-built systems will outperform OSFA by 10x to 100x • History has not been completely written yet…but let’s look at VoltDB as an example. “…specialized systems can each be a factor of 50 faster than the single ‘one size fits all’ system…A factor of 50 is nothing to sneeze at.” -My Top 10 Assertions About Data Warehouses, 2010
  • 11. Example: VoltDB • Get the interface right – SQL – ACID • Implementation: Leverage a few simple ideas – Main memory – Stored procedures – Deterministic scheduling • Specialization – OLTP focus allowed for above implementation choices
  • 12. Proving the Theory • Challenge: OLTP performance – TPC-C CPU cycles – On the Shore DBMS prototype – Elephants should be similar. Chart, share of CPU cycles: useful work 4%, recovery 24%, latching 24%, buffer pool 24%, locking 24%
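The chart implies a ceiling on what eliminating overhead can buy. A short Python sketch of that arithmetic, assuming the slide's rounded percentages and treating each overhead category as completely removable (an idealization, not a measured result):

```python
# CPU-cycle breakdown for TPC-C on the Shore prototype, as rounded on the slide (percent).
breakdown = {
    "useful work": 4,
    "recovery": 24,
    "latching": 24,
    "buffer pool": 24,
    "locking": 24,
}

assert sum(breakdown.values()) == 100  # the slide's categories cover all cycles


def speedup_if_removed(parts: dict, removed: set) -> float:
    """Ideal speedup if the cycles in `removed` vanished and nothing else changed."""
    total = sum(parts.values())
    remaining = total - sum(parts[name] for name in removed)
    return total / remaining


overheads = {"recovery", "latching", "buffer pool", "locking"}
print(speedup_if_removed(breakdown, overheads))      # 25.0
print(speedup_if_removed(breakdown, {"latching"}))  # about 1.32 (100 / 76)
```

Removing latching alone yields only about 1.3x, while removing all four overheads leaves the 4% of useful work, a theoretical 25x; that is the argument for attacking all four at once.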
  • 13. Single Threaded • Gets rid of the latching problem • What about Multicore? – Divide the memory on an N-core node so it looks like N single-core nodes – Which are single threaded…
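A minimal Python sketch of the multicore answer, assuming hash partitioning on a key; `Partition`, `Node` and the CRC32 routing rule are hypothetical illustrations, not VoltDB's API. Each partition owns a private slice of the data and is served by exactly one thread, so the data structures inside it need no latches:

```python
import queue
import threading
import zlib
from typing import Dict, List


class Partition:
    """One single-threaded site: a private dict served by exactly one worker thread."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.tasks: queue.Queue = queue.Queue()
        self.worker = threading.Thread(target=self._run)
        self.worker.start()

    def _run(self) -> None:
        # Only this thread touches self.data, so no latches are needed around it.
        while True:
            task = self.tasks.get()
            if task is None:  # shutdown sentinel
                return
            task(self.data)


class Node:
    """Make an N-core node look like N single-core, single-threaded partitions."""

    def __init__(self, cores: int) -> None:
        self.partitions: List[Partition] = [Partition() for _ in range(cores)]

    def route(self, key: str) -> Partition:
        # Hypothetical routing rule: a stable hash of the key picks the owning partition.
        return self.partitions[zlib.crc32(key.encode()) % len(self.partitions)]

    def increment(self, key: str, amount: int = 1) -> None:
        def task(data: Dict[str, int]) -> None:
            data[key] = data.get(key, 0) + amount

        self.route(key).tasks.put(task)

    def shutdown(self) -> None:
        for p in self.partitions:
            p.tasks.put(None)  # queued after all earlier tasks, so they finish first
        for p in self.partitions:
            p.worker.join()


node = Node(cores=4)
for user in ["alice", "bob", "alice", "carol"]:
    node.increment(user)
node.shutdown()
totals = {k: v for p in node.partitions for k, v in p.data.items()}
print(totals)
```

Requests for the same key always land on the same partition, so they execute one at a time in arrival order; Python threads are used only to show the ownership structure, not to demonstrate parallel speedup.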
  • 14. Implementation Construct #1: Main Memory • Main memory format for data – Disk format gets you buffer pool overhead • What happens if data doesn’t fit? – Return to disk-buffer pool architecture (slow) – Anti-caching: keep the main memory format for data; when memory fills up, bundle together elderly tuples and write them out; run a transaction in “sleuth mode” to find the required records and move them to main memory (and pin them); then run the transaction normally
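The anti-caching steps can be sketched as a toy Python model. Assumptions, all hypothetical: an insertion-ordered `OrderedDict` stands in for tuple age, a plain dict stands in for the on-disk anti-cache, and transactions are functions that read records with `data[key]` so the sleuth pass can observe every access:

```python
from collections import OrderedDict
from typing import Callable, Dict, MutableMapping, Set


class _SleuthView(dict):
    """Scratch copy of memory used in sleuth mode; records which evicted keys are touched."""

    def __init__(self, memory: Dict[str, int], evicted: Dict[str, int]) -> None:
        super().__init__(memory)
        self.evicted = evicted
        self.needed: Set[str] = set()

    def __missing__(self, key: str) -> int:
        if key in self.evicted:
            self.needed.add(key)
            return 0  # placeholder: sleuth-mode results are thrown away
        raise KeyError(key)


class AntiCache:
    """Toy anti-caching store: data lives in memory; cold tuples go to an anti-cache."""

    def __init__(self, capacity: int, bundle_size: int) -> None:
        self.capacity = capacity
        self.bundle_size = bundle_size
        self.memory: "OrderedDict[str, int]" = OrderedDict()  # eldest first
        self.disk: Dict[str, int] = {}  # stand-in for the on-disk anti-cache
        self.pinned: Set[str] = set()

    def _evict_if_full(self) -> None:
        # When memory fills up, bundle the eldest unpinned tuples and write them out.
        while len(self.memory) > self.capacity:
            bundle = [k for k in self.memory if k not in self.pinned][: self.bundle_size]
            if not bundle:
                break
            for key in bundle:
                self.disk[key] = self.memory.pop(key)

    def put(self, key: str, value: int) -> None:
        self.memory[key] = value
        self.memory.move_to_end(key)
        self._evict_if_full()

    def run(self, txn: Callable[[MutableMapping[str, int]], None]) -> None:
        # 1. Sleuth mode: run against a scratch copy to find the evicted records needed.
        scratch = _SleuthView(dict(self.memory), self.disk)
        txn(scratch)
        needed = scratch.needed
        # 2. Move those records back to main memory and pin them.
        for key in needed:
            self.memory[key] = self.disk.pop(key)
        self.pinned |= needed
        try:
            # 3. Run the transaction normally; everything it touches is in memory.
            txn(self.memory)
        finally:
            self.pinned -= needed
            self._evict_if_full()


store = AntiCache(capacity=2, bundle_size=1)
for key, value in [("a", 1), ("b", 2), ("c", 3)]:
    store.put(key, value)  # inserting "c" evicts the eldest tuple, "a"


def add_c_to_a(data: MutableMapping[str, int]) -> None:
    data["a"] = data["a"] + data["c"]


store.run(add_c_to_a)
print(dict(store.memory), store.disk)  # {'c': 3, 'a': 4} {'b': 2}
```

Sleuth mode runs against a scratch copy and only records which evicted tuples the transaction needs; the placeholder values it sees are discarded, and the real run starts only after those tuples are back in memory and pinned.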
  • 15. Implementation Construct #2: Stored Procedures • Round trip to the DBMS is expensive – Do it once per transaction – Not once per command – Or even once per cursor move • Ad-hoc queries supported – Turn them into dynamic stored procedures
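A toy Python model of why stored procedures cut round trips, assuming a hypothetical in-process `FakeServer` that simply counts trips (it is not a VoltDB client API). The same two-statement transfer costs one trip per statement when sent command by command, and one trip per call as a procedure:

```python
from typing import Callable, Dict, List

Statement = Callable[[Dict[str, int]], None]


class FakeServer:
    """Hypothetical in-process DBMS stand-in that counts client/server round trips."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {"alice": 100, "bob": 0}
        self.round_trips = 0
        self.procedures: Dict[str, Callable[..., None]] = {}

    def execute(self, stmt: Statement) -> None:
        # Command-at-a-time interface: one round trip per statement.
        self.round_trips += 1
        stmt(self.data)

    def register(self, name: str, proc: Callable[..., None]) -> None:
        # Procedures are installed ahead of time, so this is not a per-transaction trip.
        self.procedures[name] = proc

    def call(self, name: str, *args: int) -> None:
        # Stored-procedure interface: one round trip runs the whole transaction.
        self.round_trips += 1
        self.procedures[name](self.data, *args)


def transfer_statements(amount: int) -> List[Statement]:
    def debit(data: Dict[str, int]) -> None:
        data["alice"] -= amount

    def credit(data: Dict[str, int]) -> None:
        data["bob"] += amount

    return [debit, credit]


def transfer_proc(data: Dict[str, int], amount: int) -> None:
    data["alice"] -= amount
    data["bob"] += amount


chatty = FakeServer()
for stmt in transfer_statements(10):
    chatty.execute(stmt)

batched = FakeServer()
batched.register("Transfer", transfer_proc)
batched.call("Transfer", 10)

print(chatty.round_trips, batched.round_trips)  # 2 1
```

The gap grows with every statement in the transaction; a cursor-at-a-time interface would add still more trips.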
  • 16. Implementation Construct #3: Deterministic Scheduling • Transactions are ordered and run to completion – No locking • Active-active replication (HA) – Run each transaction at all replicas in the same predetermined order • What about a cluster-wide power failure? – Asynchronous checkpointing – With a command log – Wildly faster than data logging
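Deterministic scheduling can be sketched in Python as a replicated state machine. Assumptions: transactions are deterministic functions of the current state and their arguments, a single sequencer (`submit`) fixes the global order, and `Replica`, `Cluster` and the in-memory command log are hypothetical names. Because every replica runs the same commands in the same order to completion, replicas stay identical without locks, and recovery restores a checkpoint and replays the logged commands instead of logged data changes:

```python
import copy
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Command = Tuple[str, tuple]  # (procedure name, arguments): what a command log records


def deposit(state: State, account: str, amount: int) -> None:
    state[account] = state.get(account, 0) + amount


def transfer(state: State, src: str, dst: str, amount: int) -> None:
    if state.get(src, 0) >= amount:  # deterministic: same input state, same outcome
        state[src] -= amount
        state[dst] = state.get(dst, 0) + amount


PROCEDURES: Dict[str, Callable[..., None]] = {"deposit": deposit, "transfer": transfer}


class Replica:
    def __init__(self) -> None:
        self.state: State = {}

    def apply(self, cmd: Command) -> None:
        name, args = cmd
        PROCEDURES[name](self.state, *args)  # run to completion; no locks needed


class Cluster:
    def __init__(self, replicas: int) -> None:
        self.replicas = [Replica() for _ in range(replicas)]
        self.command_log: List[Command] = []
        self.checkpoint: State = {}
        self.checkpoint_pos = 0

    def submit(self, name: str, *args) -> None:
        cmd = (name, args)
        self.command_log.append(cmd)  # log the command, not the changed data
        for r in self.replicas:  # active-active: every replica runs it, same order
            r.apply(cmd)

    def take_checkpoint(self) -> None:
        # Simplification: a synchronous copy instead of an asynchronous checkpoint.
        self.checkpoint = copy.deepcopy(self.replicas[0].state)
        self.checkpoint_pos = len(self.command_log)

    def recover(self) -> State:
        # After losing every replica: restore the checkpoint, replay later commands.
        r = Replica()
        r.state = copy.deepcopy(self.checkpoint)
        for cmd in self.command_log[self.checkpoint_pos:]:
            r.apply(cmd)
        return r.state


c = Cluster(replicas=2)
c.submit("deposit", "alice", 100)
c.take_checkpoint()
c.submit("transfer", "alice", "bob", 30)
c.submit("transfer", "bob", "carol", 50)  # no-op: bob holds only 30
print(c.replicas[0].state == c.replicas[1].state)  # True
print(c.recover())  # {'alice': 70, 'bob': 30}
```

The log entries are tiny (a procedure name and its arguments), which is the intuition behind command logging being cheaper than data logging.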
  • 17. Result of Design Principles: VoltDB Example • Good interface decisions – made developers more productive – SQL & ACID • Leveraging a few simple implementation ideas – made VoltDB wicked fast – Main memory – Stored procedures – Deterministic scheduling
  • 18. Proving the Theory • Answer: OLTP performance – 3 million transactions per second – 7x Cassandra – 15 million SQL statements per second – 100,000+ transactions per commodity server. “…we are heading toward a world with at least 5 (and probably more) specialized engines and the death of the ‘one size fits all’ legacy systems.” -The End of an Architectural Era (It’s Time for a Complete Rewrite), 2007
  • 20. Technology Meets the Market • We believe – “Big Data” is a rare, transformative market – Velocity is becoming the cornerstone – Specialized databases (working together) are the answer – Products must provide tangible customer value… Fast • Observations – Noisy, crowded and new: kinda like Christmas shopping at the mall – Everyone wants to understand where the pieces fit – Analysts build maps on technology, NOT use cases • What we need is…
  • 21. Data Value Chain (chart; age of data runs from milliseconds through hundredths of seconds, second(s) and minutes to hours) • Interactive: place trade, serve ad, enrich stream, examine packet, approve trans. • Real-time analytics: calculate risk, leaderboard, aggregate, count • Record lookup: retrieve click stream, show orders • Historical analytics: backtest algo, BI, daily reports • Exploratory analytics: algo discovery, log analysis, fraud pattern match
  • 22. Data Value Chain: the same chart as slide 21, with a data value axis adding two curves, value of individual data item and aggregate data value
  • 23. The Database Universe. Chart built on the data value chain: axes run from Slow to Fast, Simple to Complex (application complexity) and Small to Large, alongside the value of an individual data item and aggregate data value; workloads are grouped as Transactional, Analytic and Exploratory; Traditional RDBMS is placed on the map
  • 24. The Database Universe (continued): the same map with more systems placed on it: Traditional RDBMS, NewSQL, NoSQL, Data Warehouse, and Hadoop, etc., plus a Velocity label
  • 25. Closed-loop Big Data (diagram): event streams (logins, trades, authorizations, clicks, sensors, orders, impressions) linked with Interactive & Real-time Analytics, Historical Reports & Analytics, and Exploratory Analytics
  • 26. Closed-loop Big Data (the same diagram as slide 25, annotated with a Knowledge label) • Make the most informed decision every time there is an interaction • Real-time decisions are informed by operational analytics and past knowledge
  • 27. The Velocity Use Case • What’s it look like? – High-throughput, relentless data feeds – Fast decisions on high-value data – Real-time operational analytics provide immediate visibility • What’s the big deal? – Batch visibility becomes real-time visibility = immediate business impact – Decisions made at time of event = higher-impact decisions with immediate returns – Ability to ingest and manage massive amounts of data = business differentiation and disruption
  • 29. Introducing VoltDB 3.0 • Available now! – Both commercial and open source offerings – www.voltdb.com/downloads • Key improvements – Even faster – Easier to build high-velocity applications – Expanded reach across developers and applications – Extensible to integrate with existing data infrastructure
  • 30. Latency and Throughput, 50/50 Read/Write Workload (chart): VoltDB 3.0 vs. v2.8.4.1 on a key/value workload, 3-node K=1 cluster; latency (ms) plotted against throughput (TPS) for each version
  • 31. Read/Write Workload Latency/Throughput (chart): VoltDB 3.0 key/value workloads at 10% read/90% write, 50% read/50% write and 90% read/10% write, 3-node K=1 cluster; average latency (ms) plotted against throughput (TPS)
  • 32. Faster: Ad Hoc SQL Performance • Conversational SQL • Thousands to 10,000+ ad hoc SQL transactions/second • Single or multiple (batch) SQL statement transactions
  • 33. Easier Development: New SQL Support • SQL LIKE and NOT LIKE • UNION • Column functions • Counting function (leaderboard ranking queries) • Ability to define index using column functions
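A leaderboard ranking query comes down to counting: a player's rank is one plus the number of players with a strictly higher score. A minimal Python sketch of that idea, assuming tied scores share a rank; `Leaderboard` is hypothetical, and its sorted list plays the role an ordered index on the score column would play, so each rank is a binary search rather than a full scan. It does not show VoltDB's counting-function syntax:

```python
import bisect
from typing import Dict, List


class Leaderboard:
    def __init__(self) -> None:
        self.scores: Dict[str, int] = {}
        self.sorted_scores: List[int] = []  # ascending, like an ordered index on score

    def set_score(self, player: str, score: int) -> None:
        old = self.scores.get(player)
        if old is not None:
            # Remove the player's previous score from the ordered "index".
            del self.sorted_scores[bisect.bisect_left(self.sorted_scores, old)]
        self.scores[player] = score
        bisect.insort(self.sorted_scores, score)

    def rank(self, player: str) -> int:
        score = self.scores[player]
        # Count entries strictly greater than this score via binary search.
        higher = len(self.sorted_scores) - bisect.bisect_right(self.sorted_scores, score)
        return higher + 1


board = Leaderboard()
for player, score in [("ann", 50), ("ben", 80), ("cy", 50), ("dee", 95)]:
    board.set_score(player, score)
print(board.rank("dee"), board.rank("ben"), board.rank("ann"), board.rank("cy"))  # 1 2 3 3

board.set_score("cy", 99)
print(board.rank("cy"), board.rank("ann"))  # 1 4
```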
  • 34. Easier Development: JSON Support • JSON values stored in a VARCHAR column • FIELD() column function • Indexing on JSON elements: CREATE INDEX session_site_moderator ON user_session_table (field(json_data, 'site'), field(json_data, 'moderator'), username); • New JSON sample in kit
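Indexing on `field(...)` expressions means the index is ordered by values extracted from the JSON text rather than by a stored column. A rough Python emulation of the slide's `session_site_moderator` index, assuming that `field(json_data, name)` returns a top-level value as text (strings as-is, other JSON values in their JSON spelling, missing keys as None); `field`, `build_index` and `lookup` are hypothetical helpers, not VoltDB code:

```python
import bisect
import json
from typing import Dict, List, Optional, Tuple

IndexKey = Tuple[str, str, str]


def field(json_data: str, name: str) -> Optional[str]:
    """Hypothetical emulation of FIELD(): a top-level JSON value returned as text."""
    value = json.loads(json_data).get(name)
    if value is None:
        return None
    return value if isinstance(value, str) else json.dumps(value)


def build_index(rows: List[Dict[str, str]]) -> List[Tuple[IndexKey, int]]:
    """Sorted entries keyed like (field(site), field(moderator), username)."""
    entries = []
    for row_id, row in enumerate(rows):
        key = (
            field(row["json_data"], "site") or "",  # simplification: missing -> ""
            field(row["json_data"], "moderator") or "",
            row["username"],
        )
        entries.append((key, row_id))
    entries.sort()
    return entries


def lookup(index: List[Tuple[IndexKey, int]], site: str, moderator: str) -> List[str]:
    """Usernames whose (site, moderator) prefix matches, found by binary search."""
    start = bisect.bisect_left(index, ((site, moderator, ""), -1))
    users = []
    for (s, m, user), _row_id in index[start:]:
        if (s, m) != (site, moderator):
            break
        users.append(user)
    return users


user_session_table = [
    {"username": "zed", "json_data": '{"site": "voltdb.com", "moderator": true}'},
    {"username": "amy", "json_data": '{"site": "voltdb.com", "moderator": true}'},
    {"username": "bob", "json_data": '{"site": "voltdb.com", "moderator": false}'},
    {"username": "cat", "json_data": '{"site": "example.org", "moderator": true}'},
]
index = build_index(user_session_table)
print(lookup(index, "voltdb.com", "true"))  # ['amy', 'zed']
```

Because the index is sorted on (site, moderator, username), a query filtering on site and moderator can jump straight to the matching range and read usernames back already in order.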
  • 35. Easier Development: Online Operations • Ability to rejoin a failed node to the cluster with no impact on existing operations • Online schema update • No service window
  • 36. Easier Development: Streamlined Development • Elimination of project.xml • VoltDB-specific configuration now defined in DDL • Defaulting of deployment.xml • New Volt compiler CLI: voltdb compile
  • 37. Expanded Reach: Cloud-Friendly • Reduced impact of variable node performance and latency • Elimination of strict NTP configuration • Scales to a large number of nodes
  • 38. Integration: High-Performance Export • Parallelized export • New connectors: JDBC, Netezza, Vertica
  • 39. Integration: Client Library Updates • New PHP client • Node.js client v1.0 • Go client (http://golang.org) • Coming soon: updated Erlang client
  • 40. Other Notable New Features • Explain command • CSV loader utility • CSV snapshots • New administration CLI: voltadmin – voltadmin save – voltadmin restore – voltadmin pause – voltadmin resume – voltadmin shutdown
  • 41. More Samples Available for Download: http://voltdb.com/community/volt-labs.php
  • 42. Volt University • Portfolio of instructional content, classes, tools, and other resources to help them build applications quickly • Curriculum and supporting material range from beginner to advanced • Three types of instruction: – Volt University Online – Volt University Classroom – Volt Vanguard Certification
  • 43. Summary: VoltDB v3.0 Features • Even faster • Easier to build high-velocity applications • Expanded reach across developers and applications • Extensible to integrate with existing data infrastructure • Volt Labs • Volt University
  • 44. Download 3.0 at www.voltdb.com. Imagine the Possibilities
  • 45. More Information? • E-mail info@voltdb.com • Visit our forums: http://community.voltdb.com/forum • Read the VoltDB “Getting Started Guide”: http://community.voltdb.com/docs/GettingStarted/index • Follow @VoltDB on Twitter

Editor's Notes

  1. Done on the volt10's Dell R510 server: 2 x Intel(R) Xeon(R) (quad core) CPU X5670 @ 2.93GHz, 64GB RAM