6. For key = 'U20', tablespace = 'CLIENTS_INFO'
SELECT Name, sum(Amount)
FROM CLIENTS c, SALES s
WHERE c.CID = s.CID AND c.CID = 'U20';
Serving partition: U10 – U35 (the range that contains 'U20')
Partition U10 – U35
Table CLIENTS
CID   Name
U20   Doug
U21   Ted
Table SALES
SID    CID   Amount
S100   U20   102
S101   U20   60
Partition U36 – U60
Table CLIENTS
CID   Name
U40   John
Table SALES
SID    CID   Amount
S223   U40   99
7. For key = 'U40', tablespace = 'CLIENTS_INFO'
SELECT Name, sum(Amount)
FROM CLIENTS c, SALES s
WHERE c.CID = s.CID AND c.CID = 'U40';
Serving partition: U36 – U60 (the range that contains 'U40')
Partition U10 – U35
Table CLIENTS
CID   Name
U20   Doug
U21   Ted
Table SALES
SID    CID   Amount
S100   U20   102
S101   U20   60
Partition U36 – U60
Table CLIENTS
CID   Name
U40   John
Table SALES
SID    CID   Amount
S223   U40   99
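The routing on the two slides above can be sketched in a few lines of Python. This is only an illustration of the idea, not Splout SQL's actual API: the partition map, `build_partition`, `partition_for` and `query` are hypothetical names, and each partition is modelled as an in-memory SQLite database holding the rows from the slides.

```python
import sqlite3
from bisect import bisect_right

# Hypothetical partition map taken from the slides: U10 – U35 and U36 – U60.
PARTITION_STARTS = ["U10", "U36"]
PARTITION_ENDS = ["U35", "U60"]

def build_partition(clients, sales):
    """Create an in-memory SQLite database holding one partition's rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE CLIENTS (CID TEXT PRIMARY KEY, Name TEXT)")
    db.execute("CREATE TABLE SALES (SID TEXT PRIMARY KEY, CID TEXT, Amount INTEGER)")
    db.executemany("INSERT INTO CLIENTS VALUES (?, ?)", clients)
    db.executemany("INSERT INTO SALES VALUES (?, ?, ?)", sales)
    return db

PARTITIONS = [
    build_partition([("U20", "Doug"), ("U21", "Ted")],
                    [("S100", "U20", 102), ("S101", "U20", 60)]),
    build_partition([("U40", "John")],
                    [("S223", "U40", 99)]),
]

def partition_for(key):
    """Return the index of the partition whose key range contains `key`."""
    i = bisect_right(PARTITION_STARTS, key) - 1
    if i < 0 or key > PARTITION_ENDS[i]:
        raise KeyError(f"no partition serves key {key!r}")
    return i

QUERY = """
SELECT Name, sum(Amount)
FROM CLIENTS c, SALES s
WHERE c.CID = s.CID AND c.CID = ?
"""

def query(key):
    """Route by key, then run the SQL against that single partition only."""
    db = PARTITIONS[partition_for(key)]
    return db.execute(QUERY, (key,)).fetchone()

print(query("U20"))  # ('Doug', 162)
print(query("U40"))  # ('John', 99)
```

The join never needs rows from the other partition because CLIENTS and SALES are partitioned by the same key, so all rows for a given client live together.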
8. Why does it scale?
Data is partitioned
Partitions are distributed across nodes
Adding more nodes increases capacity
Queries restricted to a single partition
Generation does not impact serving
14. Building a Google Analytics
Imagine that one crazy day you decide to build some kind of Google Analytics…
Zillions of events
Millions of domains
Individual panel per domain
15. Requirements
Time-based charts (day/hour aggregations)
Flexible dimension breakdown
Per page, per browser
Per country, per language
…
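As a rough sketch of what these requirements mean in SQL, the example below aggregates events per day or per hour and breaks them down by one dimension. The `events` schema, the `chart` helper and the sample rows are all assumptions made for illustration; the deck does not show its actual data model. The `domain` column is assumed to play the role of the partitioning key, so one panel only ever touches one domain's rows.

```python
import sqlite3

# Hypothetical dimensions and time buckets; the names are illustrative only.
DIMENSIONS = {"page", "browser", "country", "language"}
BUCKETS = {"day": "%Y-%m-%d", "hour": "%Y-%m-%d %H:00"}

def chart(db, domain, bucket, dimension):
    """Return (period, dimension value, hits) rows for one domain's panel."""
    if bucket not in BUCKETS or dimension not in DIMENSIONS:
        raise ValueError("unsupported bucket or dimension")
    # Both interpolated values come from the whitelists above, never from user input.
    sql = f"""
        SELECT strftime('{BUCKETS[bucket]}', ts) AS period, {dimension}, sum(hits)
        FROM events
        WHERE domain = ?
        GROUP BY period, {dimension}
        ORDER BY period, {dimension}
    """
    return db.execute(sql, (domain,)).fetchall()

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE events (
    domain TEXT, ts TEXT, page TEXT, browser TEXT,
    country TEXT, language TEXT, hits INTEGER)""")
db.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("example.com", "2013-05-01 10:15:00", "/home", "Firefox", "ES", "es", 3),
    ("example.com", "2013-05-01 10:40:00", "/home", "Chrome", "US", "en", 2),
    ("example.com", "2013-05-01 11:05:00", "/about", "Firefox", "ES", "es", 1),
    ("example.com", "2013-05-02 09:00:00", "/home", "Firefox", "US", "en", 4),
    ("other.org", "2013-05-01 10:00:00", "/", "Chrome", "US", "en", 9),
])

print(chart(db, "example.com", "day", "browser"))
print(chart(db, "example.com", "hour", "page"))
```

Because the aggregation is computed at query time, adding another breakdown only requires another column, not another precomputed table.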
20. Each partition is …
Backed by SQLite
Generated on Hadoop
Including any indexes needed
Data can be sorted before insertion to minimize disk seeks at query time
Pre-sampling for balancing partition size
Distributed on Splout SQL cluster
With replication for failover
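The generation steps listed above can be illustrated with a single-process sketch. In Splout SQL this work runs on Hadoop; here plain Python stands in for it, and `split_points`, `build_partition` and `generate` are hypothetical helper names. The sketch samples the keys to pick balanced partition boundaries, inserts each partition's rows in key order, and creates the index as part of generation.

```python
import random
import sqlite3
from bisect import bisect_right

def split_points(keys, n_partitions, sample_size=1000, seed=0):
    """Pick n_partitions - 1 boundary keys from a random sample of `keys`.

    Each boundary is the first key of the next partition.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / n_partitions
    return [sample[int(step * i)] for i in range(1, n_partitions)]

def build_partition(rows, path=":memory:"):
    """Write one partition as a SQLite database, sorted by key and indexed."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE SALES (SID TEXT, CID TEXT, Amount INTEGER)")
    # Insert in CID order so rows for one client are stored next to each other.
    db.executemany("INSERT INTO SALES VALUES (?, ?, ?)",
                   sorted(rows, key=lambda r: r[1]))
    db.execute("CREATE INDEX sales_cid ON SALES (CID)")
    db.commit()
    return db

def generate(rows, n_partitions):
    """Split (SID, CID, Amount) rows by CID and build one database per partition."""
    bounds = split_points([r[1] for r in rows], n_partitions)
    buckets = [[] for _ in range(n_partitions)]
    for row in rows:
        buckets[bisect_right(bounds, row[1])].append(row)
    return bounds, [build_partition(b) for b in buckets]

# Toy input: 100 clients with one sale each, in shuffled order.
rows = [(f"S{i:03d}", f"K{i:03d}", i) for i in range(100)]
random.Random(1).shuffle(rows)
bounds, partitions = generate(rows, 4)

print(bounds)  # ['K025', 'K050', 'K075']
print([db.execute("SELECT count(*) FROM SALES").fetchone()[0] for db in partitions])
# [25, 25, 25, 25]
```

With skewed real data the sample would be smaller than the input, so the boundaries, and therefore the partition sizes, would only be approximately balanced.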
21. Atomicity
A tablespace is a set of tables that share the same partitioning schema
Tablespaces are versioned
Only one version served at a time
Several tablespaces can be deployed at once
All-or-nothing semantics (atomicity)
Rollback support
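One way to picture these semantics is a registry that loads every new version first and only then switches what is served. The sketch below is an assumption about how such a mechanism could look, not Splout SQL's implementation; `TablespaceRegistry` and the `load` callback are hypothetical.

```python
import threading

class TablespaceRegistry:
    """Tracks deployed versions per tablespace and which one is being served."""

    def __init__(self):
        self._lock = threading.Lock()
        self._history = {}  # tablespace name -> deployed versions, oldest first
        self._serving = {}  # tablespace name -> the single version being served

    def deploy(self, new_versions, load):
        """Deploy several tablespaces at once with all-or-nothing semantics.

        `new_versions` maps tablespace name -> version id. `load(name, version)`
        must raise if that version's partitions cannot be made available.
        """
        # Phase 1: load everything; a failure here leaves serving untouched.
        for name, version in new_versions.items():
            load(name, version)
        # Phase 2: switch every tablespace under one lock.
        with self._lock:
            for name, version in new_versions.items():
                self._history.setdefault(name, []).append(version)
                self._serving[name] = version

    def rollback(self, name):
        """Go back to the previously deployed version of one tablespace."""
        with self._lock:
            history = self._history[name]
            if len(history) < 2:
                raise ValueError(f"no earlier version of {name!r} to roll back to")
            history.pop()
            self._serving[name] = history[-1]

    def serving(self, name):
        """Return the version currently served for `name`."""
        with self._lock:
            return self._serving[name]
```

A deployment whose `load` step fails for any tablespace changes nothing, so readers keep seeing the old version of every tablespace in the batch.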
22. Characteristics
Guaranteed millisecond latencies
Even when queries hit disk
Controlled by the developer selecting the proper:
- Cluster topology
- Partitioning
- Indexes
- Data collocation (insertion order)
23. Characteristics (II)
100% SQL
But restricted to a single partition
Real-time aggregations
Joins
Scalability
In data capacity
In performance
24. Characteristics (III)
Atomicity
New data replaces old data all at once
High availability
Through the use of replication
Open Source
25. Characteristics (IV)
Easy to manage
Changing the size of the cluster can be done without any downtime
Read only
Data is updated in batches
Updates come from new tablespace deployments
34. Future work
Growing the community
Do you want to collaborate?
Automatic rebalancing on failover
Almost done
Some read/write capabilities
Enabling Splout SQL to become the speed layer on lambda architectures
35. Iván de Prado Alonso – CEO of Datasalt
www.datasalt.es
@ivanprado
@datasalt
Questions?