SlideShare una empresa de Scribd logo
1 de 35
Iván de Prado Alonso – CEO of Datasalt
www.datasalt.es
@ivanprado
@datasalt




                       Splout SQL
       When Big Data Output is also Big
                    Data
Full SQL*                 Unlike NoSQL

For Big Data              Unlike RDBMS

Web latency &             Unlike Impala,
throughput                Apache Drill, etc.

* Within each partition
How does it work?




  Isolation between generation and serving
Generate tablespace CLIENTS_INFO with
Generation                  table CLIENTS partitioned by CID
                            table SALES    partitioned by CID
   Table CLIENTS                           Tablespace CLIENTS_INFO
  CID      Name                Partition U10 – U35
  U20      Doug                   Table CLIENTS             Table SALES
  U21      Ted                  CID      Name        SID     CID     Amount
  U40      John                 U20      Doug        S100    U20     102
                                U21      Ted         S101    U20     60

         Table SALES           Partition U36 – U60
  SID     CID      Amount
                                  Table CLIENTS             Table SALES
  S100    U20      102
                                CID      Name        SID     CID     Amount
  S101    U20      60
                                U40      John        S223    U40     99
  S223    U40      99
For key = ‘U20’, tablespace=‘CLIENTS_INFO’
                   SELECT Name, sum(Amount) FROM
Serving            CLIENTS c, SALES s WHERE
                   c.CID = s.CID AND CID = ‘U20’;


   Partition U10 – U35                Partition U36 – U60
          Table CLIENTS                      Table CLIENTS
      CID         Name                   CID          Name
      U20         Doug                   U40          John
      U21         Ted

           Table SALES                         Table SALES
    SID     CID     Amount             SID      CID      Amount
    S100    U20     102                S223     U40      99
    S101    U20     60
For key = ‘U40’, tablespace=‘CLIENTS_INFO’
                   SELECT Name, sum(Amount) FROM
Serving            CLIENTS c, SALES s WHERE
                   c.CID = s.CID AND CID = ‘U40’;


   Partition U10 – U35                Partition U36 – U60
          Table CLIENTS                      Table CLIENTS
      CID         Name                   CID          Name
      U20         Doug                   U40          John
      U21         Ted

           Table SALES                         Table SALES
    SID     CID     Amount             SID      CID      Amount
    S100    U20     102                S223     U40      99
    S101    U20     60
Why does it scale?
   Data is partitioned

   Partitions are distributed across nodes

   Adding more nodes increases capacity

   Queries restricted to a single partition

   Generation does not impact serving
Ok, so what is
 Splout SQL
 useful for?
Big Data
Analytics




   Manageable output
Big Data
                   Analytics




Sometimes Big Data output is also Big Data
Splout SQL allows
     to serve
 Big Data results
Let’s see an example …
Building a Google Analytics
Imagine that one crazy day you decide to build
some kind of Google Analytics…

       Zillions of events
       Millions of domains
       Individual panel per domain
Requirements
 Time-based charts (day/hour aggregations)




 Flexible dimension breakdown
    Per page, per browser
    Per country, per language
    …
With Splout SQL
Splout SQL provides
 SQL consolidated
 views for Hadoop
        data
Let’s see more
 details about
  Splout SQL
Splout SQL Architecture
Each partition is …
      Backed by SQLite

      Generated on Hadoop
        Including any indexes needed
        Data can be sorted before insertion to
        minimize disk seeks at query time
        Pre-sampling for balancing partition size
      Distributed on Splout SQL cluster
        With replication for failover
Atomicity
   A tablespace is a set of tables that
   share the same partitioning schema

   Tablespaces are versioned
        Only one version served at a time

   Several tablespaces can be deployed
   at once
        All-or-nothing semantics (atomicity)
        Rollback support
Characteristics
    Ensured ms latencies
     Even when queries hit disk

     Controlled by the developer selecting the
     proper:
        -   Cluster topology
        -   Partitioning
        -   Indexes
        -   Data collocation (insertion order)
Characteristics (II)
    100% SQL
      But restricted to a single partition
      Real-time aggregations
       Joins

     Scalability
      In data capacity
      In performance
Characteristics (III)
    Atomicity
       New data replaces old data all at once

     High availability
       Through the use of replication

    Open Source
Characteristics (IV)
    Easy to manage
      Changing the size of the cluster can be done
      without any downtime

    Read only
      Data is updated in batches
      Updates come from new tablespace
      deployments
Characteristics (V)
    Native connectors
      Hive
      Pig
      Cascading
API - Generation
    Command line
     Loading CSV files
      $ hadoop jar splout-*-hadoop.jar generate …


    Java API



    Connectors
API - Service
    Rest API



                JSON response
API - Console
Benchmark
   350 GB Wikipedia logs
   Aggregation queries impacting 15 rows in
   average
   2-machines cluster
    900 queries/second, 80 ms/query, 80 threads
Benchmark (II)
   4-machines cluster
     3150 queries/second, 40 ms/query, 160 threads




 More info:
    http://sploutsql.com/performance.html
Web latency

       SQL

       Consolidated Views

       For Hadoop
“A good candidate for the serving layer of a lambda architecture”
www.SploutCloud.com - Splout SQL as a service
Future work
   Growing the community
     Do you want to collaborate? 

   Automatic rebalancing on failover
     Almost done

   Some read/write capabilities
     Enabling Splout SQL to become the speed
     layer on lambda architectures
Iván de Prado Alonso – CEO of Datasalt
www.datasalt.es
@ivanprado
@datasalt




        Questions?

Más contenido relacionado

Similar a Splout SQL a richer open source database for Hadoop

Tabular Data Stream: The Binding Between Client and SAP ASE
Tabular Data Stream: The Binding Between Client and SAP ASETabular Data Stream: The Binding Between Client and SAP ASE
Tabular Data Stream: The Binding Between Client and SAP ASESAP Technology
 
Uncovering SQL Server query problems with execution plans - Tony Davis
Uncovering SQL Server query problems with execution plans - Tony DavisUncovering SQL Server query problems with execution plans - Tony Davis
Uncovering SQL Server query problems with execution plans - Tony DavisRed Gate Software
 
Big Data or Data Warehousing? How to Leverage Both in the Enterprise
Big Data or Data Warehousing? How to Leverage Both in the EnterpriseBig Data or Data Warehousing? How to Leverage Both in the Enterprise
Big Data or Data Warehousing? How to Leverage Both in the EnterpriseDean Hallman
 
PNWPHP -- What are Databases so &#%-ing Difficult
PNWPHP -- What are Databases so &#%-ing DifficultPNWPHP -- What are Databases so &#%-ing Difficult
PNWPHP -- What are Databases so &#%-ing DifficultDave Stokes
 
Cisco Switches vs. Huawei Switches
Cisco Switches vs. Huawei SwitchesCisco Switches vs. Huawei Switches
Cisco Switches vs. Huawei Switches美兰 曾
 
Directions EMEA Choosing the best possible Azure platform for NAV
Directions EMEA Choosing the best possible Azure platform for NAVDirections EMEA Choosing the best possible Azure platform for NAV
Directions EMEA Choosing the best possible Azure platform for NAVAleksandar Totovic
 
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications 
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications 
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications Keshav Murthy
 
AWS Summit Seoul 2015 - AWS 최신 서비스 살펴보기 - Aurora, Lambda, EFS, Machine Learn...
AWS Summit Seoul 2015 -  AWS 최신 서비스 살펴보기 - Aurora, Lambda, EFS, Machine Learn...AWS Summit Seoul 2015 -  AWS 최신 서비스 살펴보기 - Aurora, Lambda, EFS, Machine Learn...
AWS Summit Seoul 2015 - AWS 최신 서비스 살펴보기 - Aurora, Lambda, EFS, Machine Learn...Amazon Web Services Korea
 
USQ Landdemos Azure Data Lake
USQ Landdemos Azure Data LakeUSQ Landdemos Azure Data Lake
USQ Landdemos Azure Data LakeTrivadis
 
Relocation
RelocationRelocation
Relocationbkelley1
 
RDBMS to NoSQL: Practical Advice from Successful Migrations
RDBMS to NoSQL: Practical Advice from Successful MigrationsRDBMS to NoSQL: Practical Advice from Successful Migrations
RDBMS to NoSQL: Practical Advice from Successful MigrationsScyllaDB
 
RedisConf17 - Searching Billions of Documents with Redis
RedisConf17 - Searching Billions of Documents with RedisRedisConf17 - Searching Billions of Documents with Redis
RedisConf17 - Searching Billions of Documents with RedisRedis Labs
 
Adbms final group project report
Adbms final group project reportAdbms final group project report
Adbms final group project reportJijo Johny
 
Stop competing and start leading: A user experience case study.
Stop competing and start leading: A user experience case study.Stop competing and start leading: A user experience case study.
Stop competing and start leading: A user experience case study.Aspenware
 
Cassandra 3 new features 2016
Cassandra 3 new features 2016Cassandra 3 new features 2016
Cassandra 3 new features 2016Duyhai Doan
 

Similar a Splout SQL a richer open source database for Hadoop (20)

Tabular Data Stream: The Binding Between Client and SAP ASE
Tabular Data Stream: The Binding Between Client and SAP ASETabular Data Stream: The Binding Between Client and SAP ASE
Tabular Data Stream: The Binding Between Client and SAP ASE
 
Uncovering SQL Server query problems with execution plans - Tony Davis
Uncovering SQL Server query problems with execution plans - Tony DavisUncovering SQL Server query problems with execution plans - Tony Davis
Uncovering SQL Server query problems with execution plans - Tony Davis
 
Big Data or Data Warehousing? How to Leverage Both in the Enterprise
Big Data or Data Warehousing? How to Leverage Both in the EnterpriseBig Data or Data Warehousing? How to Leverage Both in the Enterprise
Big Data or Data Warehousing? How to Leverage Both in the Enterprise
 
PNWPHP -- What are Databases so &#%-ing Difficult
PNWPHP -- What are Databases so &#%-ing DifficultPNWPHP -- What are Databases so &#%-ing Difficult
PNWPHP -- What are Databases so &#%-ing Difficult
 
Optimiser votre infrastructure SQL Server avec Azure
Optimiser votre infrastructure SQL Server avec AzureOptimiser votre infrastructure SQL Server avec Azure
Optimiser votre infrastructure SQL Server avec Azure
 
Cisco Switches vs. Huawei Switches
Cisco Switches vs. Huawei SwitchesCisco Switches vs. Huawei Switches
Cisco Switches vs. Huawei Switches
 
Mysql rab2-student
Mysql rab2-studentMysql rab2-student
Mysql rab2-student
 
Mysql rab2-student
Mysql rab2-studentMysql rab2-student
Mysql rab2-student
 
Directions EMEA Choosing the best possible Azure platform for NAV
Directions EMEA Choosing the best possible Azure platform for NAVDirections EMEA Choosing the best possible Azure platform for NAV
Directions EMEA Choosing the best possible Azure platform for NAV
 
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications 
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications 
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications 
 
AWS Summit Seoul 2015 - AWS 최신 서비스 살펴보기 - Aurora, Lambda, EFS, Machine Learn...
AWS Summit Seoul 2015 -  AWS 최신 서비스 살펴보기 - Aurora, Lambda, EFS, Machine Learn...AWS Summit Seoul 2015 -  AWS 최신 서비스 살펴보기 - Aurora, Lambda, EFS, Machine Learn...
AWS Summit Seoul 2015 - AWS 최신 서비스 살펴보기 - Aurora, Lambda, EFS, Machine Learn...
 
USQ Landdemos Azure Data Lake
USQ Landdemos Azure Data LakeUSQ Landdemos Azure Data Lake
USQ Landdemos Azure Data Lake
 
Relocation
RelocationRelocation
Relocation
 
RDBMS to NoSQL: Practical Advice from Successful Migrations
RDBMS to NoSQL: Practical Advice from Successful MigrationsRDBMS to NoSQL: Practical Advice from Successful Migrations
RDBMS to NoSQL: Practical Advice from Successful Migrations
 
Modernizando plataforma de bi
Modernizando plataforma de biModernizando plataforma de bi
Modernizando plataforma de bi
 
RedisConf17 - Searching Billions of Documents with Redis
RedisConf17 - Searching Billions of Documents with RedisRedisConf17 - Searching Billions of Documents with Redis
RedisConf17 - Searching Billions of Documents with Redis
 
Adbms final group project report
Adbms final group project reportAdbms final group project report
Adbms final group project report
 
Les00 Intoduction
Les00 IntoductionLes00 Intoduction
Les00 Intoduction
 
Stop competing and start leading: A user experience case study.
Stop competing and start leading: A user experience case study.Stop competing and start leading: A user experience case study.
Stop competing and start leading: A user experience case study.
 
Cassandra 3 new features 2016
Cassandra 3 new features 2016Cassandra 3 new features 2016
Cassandra 3 new features 2016
 

Más de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 

Último (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 

Splout SQL a richer open source database for Hadoop

  • 1. Iván de Prado Alonso – CEO of Datasalt www.datasalt.es @ivanprado @datasalt Splout SQL When Big Data Output is also Big Data
  • 2.
  • 3. Full SQL* Unlike NoSQL For Big Data Unlike RDBMS Web latency & Unlike Impala, throughput Apache Drill, etc. * Within each partition
  • 4. How does it work? Isolation between generation and serving
  • 5. Generate tablespace CLIENTS_INFO with Generation table CLIENTS partitioned by CID table SALES partitioned by CID Table CLIENTS Tablespace CLIENTS_INFO CID Name Partition U10 – U35 U20 Doug Table CLIENTS Table SALES U21 Ted CID Name SID CID Amount U40 John U20 Doug S100 U20 102 U21 Ted S101 U20 60 Table SALES Partition U36 – U60 SID CID Amount Table CLIENTS Table SALES S100 U20 102 CID Name SID CID Amount S101 U20 60 U40 John S223 U40 99 S223 U40 99
  • 6. For key = ‘U20’, tablespace=‘CLIENTS_INFO’ SELECT Name, sum(Amount) FROM Serving CLIENTS c, SALES s WHERE c.CID = s.CID AND CID = ‘U20’; Partition U10 – U35 Partition U36 – U60 Table CLIENTS Table CLIENTS CID Name CID Name U20 Doug U40 John U21 Ted Table SALES Table SALES SID CID Amount SID CID Amount S100 U20 102 S223 U40 99 S101 U20 60
  • 7. For key = ‘U40’, tablespace=‘CLIENTS_INFO’ SELECT Name, sum(Amount) FROM Serving CLIENTS c, SALES s WHERE c.CID = s.CID AND CID = ‘U40’; Partition U10 – U35 Partition U36 – U60 Table CLIENTS Table CLIENTS CID Name CID Name U20 Doug U40 John U21 Ted Table SALES Table SALES SID CID Amount SID CID Amount S100 U20 102 S223 U40 99 S101 U20 60
  • 8. Why does it scale? Data is partitioned Partitions are distributed across nodes Adding more nodes increases capacity Queries restricted to a single partition Generation does not impact serving
  • 9. Ok, so what is Splout SQL useful for?
  • 10. Big Data Analytics Manageable output
  • 11. Big Data Analytics Sometimes Big Data output is also Big Data
  • 12. Splout SQL allows to serve Big Data results
  • 13. Let’s see an example …
  • 14. Building a Google Analytics Imagine that one crazy day you decide to build some kind of Google Analytics… Zillions of events Millions of domains Individual panel per domain
  • 15. Requirements Time-based charts (day/hour aggregations) Flexible dimension breakdown Per page, per browser Per country, per language …
  • 17. Splout SQL provides SQL consolidated views for Hadoop data
  • 18. Let’s see more details about Splout SQL
  • 20. Each partition is … Backed by SQLite Generated on Hadoop Including any indexes needed Data can be sorted before insertion to minimize disk seeks at query time Pre-sampling for balancing partition size Distributed on Splout SQL cluster With replication for failover
  • 21. Atomicity A tablespace is a set of tables that share the same partitioning schema Tablespaces are versioned Only one version served at a time Several tablespaces can be deployed at once All-or-nothing semantics (atomicity) Rollback support
  • 22. Characteristics Ensured ms latencies Even when queries hit disk Controlled by the developer selecting the proper: - Cluster topology - Partitioning - Indexes - Data collocation (insertion order)
  • 23. Characteristics (II) 100% SQL But restricted to a single partition Real-time aggregations Joins Scalability In data capacity In performance
  • 24. Characteristics (III) Atomicity New data replaces old data all at once High availability Through the use of replication Open Source
  • 25. Characteristics (IV) Easy to manage Changing the size of the cluster can be done without any downtime Read only Data is updated in batches Updates come from new tablespace deployments
  • 26. Characteristics (V) Native connectors Hive Pig Cascading
  • 27. API - Generation Command line Loading CSV files $ hadoop jar splout-*-hadoop.jar generate … Java API Connectors
  • 28. API - Service Rest API JSON response
  • 30. Benchmark 350 GB Wikipedia logs Aggregation queries impacting 15 rows in average 2-machines cluster 900 queries/second, 80 ms/query, 80 threads
  • 31. Benchmark (II) 4-machines cluster 3150 queries/second, 40 ms/query, 160 threads More info: http://sploutsql.com/performance.html
  • 32. Web latency SQL Consolidated Views For Hadoop “A good candidate for the serving layer of a lambda architecture”
  • 33. www.SploutCloud.com - Splout SQL as a service
  • 34. Future work Growing the community Do you want to collaborate?  Automatic rebalancing on failover Almost done Some read/write capabilities Enabling Splout SQL to become the speed layer on lambda architectures
  • 35. Iván de Prado Alonso – CEO of Datasalt www.datasalt.es @ivanprado @datasalt Questions?