Writing Applications
for Scylla
Shlomi Livne, VP R&D
Presenter
Shlomi Livne, VP of R&D
Shlomi is VP of R&D at ScyllaDB. Prior to ScyllaDB he led the
research and development team at Convergin, which was
acquired by Oracle.
Part 1: Enhancement
for Application Development
The basics
Development Cycle NoSQL Databases
Think about the queries you are going to run
Create a Data Model
Use cassandra-stress (or other) to validate (*)
Develop
CQL Optimization
Scale test
Deploy
Use Disk Access
What will Disk Access track?
■ Disk Access looks at:
● Number of I/O operations
● Total number of bytes read
■ When sstables are read from disk there are two related components (everything else is in memory):
● Data - stores the actual data
● Index - provides the lookup into the data file “blocks” that contain the partition (for a large partition it also contains a promoted index)
Disk Access - Why
● The memory : disk ratio is widening:
○ EC2 i3 family memory : disk ratio is 1:30
○ EC2 i3en family memory : disk ratio is 1:78
○ More queries will be served from disk
● There are workloads that you will always prefer to run from disk (background analytics)
Sample App Billy(+)
An IoT application(+)
Total amount of data points: 526 billion temperature readings
■ 1,000,000 sensors, representing homes in an area
■ 365 days (1 year storage requirement)
■ 1 reading per minute
Analytics over the entire data?
How long would it take at normal speeds?
■ 200,000 points/second: 730 hours (30 days)
■ 1 million points/second: 146 hours (almost a week)
We need more if analytics are a part of the pipeline
■ That means we need Scylla
■ We need a good application
■ And we need hardware
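The sizing figures on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes one reading per minute (1,440 per sensor per day), which is the rate that yields the 526 billion total; the function name `scan_hours` is illustrative only.

```python
# Back-of-the-envelope sizing for the Billy data set (figures from the slide).
SENSORS = 1_000_000
DAYS = 365
READINGS_PER_DAY = 24 * 60  # assumption: one reading per minute

total_points = SENSORS * DAYS * READINGS_PER_DAY


def scan_hours(points: int, points_per_second: int) -> float:
    """Hours needed to read `points` at a sustained read rate."""
    return points / points_per_second / 3600


print(f"total points: {total_points:,}")
for rate in (200_000, 1_000_000):
    print(f"{rate:>9,} points/s -> {scan_hours(total_points, rate):.0f} hours")
```

At 200,000 points/second a full pass takes about a month, which is why the rest of the deck focuses on reducing the I/O cost of each pass.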
Why climb Mount Everest?
Because it’s there.
George Leigh Mallory
What kind of performance are we after?
Data Model
CREATE TABLE readings (
sensor_id int,
date date,
time time,
temperature float,
PRIMARY KEY ((sensor_id, date), time))
What kind of queries can we reasonably support?
■ SELECT * from readings where sensor_id = ? and date = ?;
■ SELECT * from readings where sensor_id = ? and date = ? and time > ?;
Analytics Application Option 1
■ Let the server do as much work as possible
SELECT sensor_id,
date,
min(temperature) as minTemperature,
max(temperature) as maxTemperature
FROM readings where sensor_id = ? and date = ?;
Application
(Example) Total amount of data to scan: 1.44 billion points/day
[Diagram: a Coordinator, three Workers (loader machines) and the ScyllaDB cluster; annotation: set time frame, compute average, min, max of all sensors]
Disk Access Analysis Option 1 (in theory)
● For simplification, let's assume:
○ Every partition:
■ is fully stored in a single sstable
■ is exactly placed in a single data block
○ Bloom filters do not produce false positives
● Analysis
Number of partitions: 365 * 10^6 = 365 Million
I/O for index: 365 Million
I/O for data: 365 Million
Analytics Application Option 2
■ Do range scans and use CQL GROUP BY (new in 3.2)
SELECT sensor_id,
date,
min(temperature) as minTemperature,
max(temperature) as maxTemperature
FROM readings where token(sensor_id, date) > X and
token(sensor_id, date) < Y GROUP BY sensor_id, date
Application
(Example) Total amount of data to scan: 1.44 billion points/day
[Diagram: a Coordinator, three Workers (loader machines) and the ScyllaDB cluster; annotation: set time frame, compute average, min, max of all sensors]
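One way a coordinator could hand out work for Option 2 is to cut the token ring into contiguous subranges and give each worker a range to scan. The sketch below is a simplification under stated assumptions: it splits the ring into equal pieces rather than using the cluster's real vnode ranges, it assumes the Murmur3 partitioner's signed 64-bit token space, and it uses `<=` on the upper bound (unlike the strict `<` on the slide) so that a token sitting exactly on a boundary is not skipped. `token_subranges` and `QUERY` are illustrative names.

```python
from typing import Iterator, Tuple

# Assumption: Murmur3 partitioner, tokens are signed 64-bit integers.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1

QUERY = (
    "SELECT sensor_id, date, min(temperature), max(temperature) "
    "FROM readings "
    "WHERE token(sensor_id, date) > {start} AND token(sensor_id, date) <= {end} "
    "GROUP BY sensor_id, date"
)


def token_subranges(n: int) -> Iterator[Tuple[int, int]]:
    """Yield n contiguous (start, end] token ranges covering the whole ring."""
    if n < 1:
        raise ValueError("n must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN
    start = MIN_TOKEN
    for i in range(1, n + 1):
        end = MIN_TOKEN + span * i // n  # integer math, no rounding drift
        yield start, end
        start = end


# Print the statements a coordinator would dispatch for a 4-way split.
for start, end in token_subranges(4):
    print(QUERY.format(start=start, end=end))
```

A real application would more likely ask the driver for the cluster's actual token ranges so that each request lands on a single replica set; the equal split here only illustrates the shape of the loop.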
Disk Access Analysis Option 2 (in theory)
● For simplification, let's assume:
○ Application breaks requests by vnode token ranges
○ Every partition:
■ is fully stored in a single sstable
■ is exactly placed in a single data block (and is the only one there)
■ vnode token ranges do not share data blocks
○ Bloom filters do not produce false positives
● Analysis
Number of scans: Number of vnode token ranges
I/O for index: Number of vnode token ranges * Number of shards
I/O for data: Number of data blocks
Disk Access Comparison
                 Option 1: Single Partition     Option 2: Range Scans
Number of ops    Number of partitions           Number of scans = Number of vnode token ranges
                 365 * 10^6 = 365 Million       83 * 256 = 21248
I/O for index    365 Million                    Number of vnode token ranges * Number of shards
                                                83 * 256 * 54 = 1147392
I/O for data     365 Million                    365 Million
Billy using Full Scan (theoretical) gain
1. The number of I/O ops for Index on the cluster drops from 365 Million to ~1.2 Million
● In reality SSTable Bloom filters are not perfect, so single partition reads will be attempted on sstables that don’t have the partition (an even bigger win for scans)
2. The number of CQL operations on the cluster drops from 365 Million to ~21K
● Returning a result per partition takes 365M / 5000 (page size) = 73K pages (in the optimal case), so we will need more than 21K requests
3. In reality partitions do share data blocks; they are not perfectly aligned
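The theoretical comparison can be checked with a short calculation. This is only a sketch of the arithmetic under the slide's simplifying assumptions; the constant names are illustrative, and the factors 83, 256 and 54 are used exactly as they appear on the slide.

```python
# Theoretical I/O comparison, using the figures from the slides.
PARTITIONS = 365 * 10**6   # one partition per sensor per day
TOKEN_RANGES = 83 * 256    # number of vnode token ranges
SHARDS = 54                # number of shards
PAGE_SIZE = 5000           # rows returned per page

single_partition = {
    "ops": PARTITIONS,
    "index_io": PARTITIONS,
    "data_io": PARTITIONS,
}
range_scan = {
    "ops": TOKEN_RANGES,
    "index_io": TOKEN_RANGES * SHARDS,
    "data_io": PARTITIONS,
}

for key in single_partition:
    ratio = single_partition[key] / range_scan[key]
    print(f"{key:9} single={single_partition[key]:>12,} "
          f"scan={range_scan[key]:>12,} ratio={ratio:,.0f}x")

# One result row per partition means the scans still page through results.
pages = PARTITIONS // PAGE_SIZE
print(f"result pages at {PAGE_SIZE} rows/page: {pages:,}")
```

The data I/O row stays at 1x by construction: under these assumptions every data block is read once either way, so the theoretical win is entirely in index I/O and request count.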
Putting Data Access into practice
● Queries
● Data model
● Some test data (at small scale)
● Docker
● Scylla-Nightly (Pre Scylla 3.2)
○ Tracing including disk access
‘... mc-132-big-Index.db: finished bulk DMA read of size 538 at offset 0, successfully read 4096 bytes [shard 0]’
● A simple script that parses system_traces.events after running a traced query
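The script itself is not shown in the deck. A minimal sketch of what such a parser could look like follows. It assumes the trace messages have already been fetched (for example from the `activity` column of `system_traces.events`), it only recognizes the one message format quoted on this slide, and it counts the "successfully read" figure as the bytes read. `READ_RE`, `summarize` and the sample list are hypothetical.

```python
import re
from collections import defaultdict

# Pattern modelled on the single trace line quoted on the slide;
# other message variants are not handled.
READ_RE = re.compile(
    r"-(?P<component>Index|Data)\.db: finished bulk DMA read of size (?P<size>\d+) "
    r"at offset (?P<offset>\d+), successfully read (?P<read>\d+) bytes"
)


def summarize(activities):
    """Return {component: {"io": count, "bytes": total}} for disk-read events."""
    totals = defaultdict(lambda: {"io": 0, "bytes": 0})
    for activity in activities:
        match = READ_RE.search(activity)
        if match is None:
            continue  # not a disk-read trace event
        entry = totals[match.group("component")]
        entry["io"] += 1
        entry["bytes"] += int(match.group("read"))
    return dict(totals)


# Hypothetical input: the first line is the example from the slide,
# the second stands in for any unrelated trace event.
sample = [
    "... mc-132-big-Index.db: finished bulk DMA read of size 538 at offset 0, "
    "successfully read 4096 bytes [shard 0]",
    "some unrelated trace event",
]
print(summarize(sample))
```

Running the same summary for a single partition query and for a range scan gives the Index/Data I/O and byte counts compared on the following slides.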
Billy on small scale
■ 1000 sensors, 100 dates, 1 sample per minute
■ 1 M partitions, 1440 M rows
■ # shards 4
Results
              Single Partition   Range Scan   Gain
Index I/O     ~1.3 M             3318         X 392
Index Bytes   ~2.8 GB            ~6.9 M       X 424
Data I/O      ~1 M               10738        X 93
Data Bytes    ~14.4 GB           ~1.3 GB      X 11
Billy using Full Scan gain is even bigger
1. Read-aheads for the full scans make better use of the disk
● Single Partition Avg Data Bytes: 14748600348 ÷ 1024089 = ~14.4K
● Range Scan Avg Data Bytes: 1355390291 ÷ 10738 = ~126K
2. AIO reads are sent to the disk aligned to Index/Data placement, yet disks read in whole blocks:
● Doing 2 reads for two halves of a disk block will result in reading the block twice and returning part of it each time.
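The average size of a data read follows directly from the byte and I/O counts quoted above. A small sketch of that division (the dictionary layout and function name are illustrative):

```python
# Measured totals for the data component, as quoted on the slide.
single_partition = {"data_bytes": 14_748_600_348, "data_io": 1_024_089}
range_scan = {"data_bytes": 1_355_390_291, "data_io": 10_738}


def avg_bytes_per_io(stats: dict) -> float:
    """Average number of bytes transferred per data-file read."""
    return stats["data_bytes"] / stats["data_io"]


single_avg = avg_bytes_per_io(single_partition)
scan_avg = avg_bytes_per_io(range_scan)
print(f"single partition: {single_avg:,.0f} bytes per read")
print(f"range scan:       {scan_avg:,.0f} bytes per read")
print(f"range-scan reads are about {scan_avg / single_avg:.1f}x larger")
```

Fewer, larger reads are what read-ahead buys: the range scan moves much more data per I/O operation.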
Should Range Scans always be used for analytics?
■ No
■ Not if Number of Partitions < Number of Token Ranges * Number of Shards
■ What if we are doing a partial scan - what should we do?
a. Example: What was the max & min temperature over the last 7/30/90 days?
Billy+: Partial Scan
SELECT sensor_id,
date,
min(temperature) as minTemperature,
max(temperature) as maxTemperature
FROM readings where token(sensor_id, date) > X and
token(sensor_id, date) < Y and date >= Z GROUP BY sensor_id,
date ALLOW FILTERING
Billy+: Partial Scan
● Going back to the simplifications, ~7% seems to be a good mark:
○ Partial scan of < 7% of the data: use single partition reads
○ Partial scan of > 7% of the data: use a full scan and filter
● General case: it depends on how big the partitions are
○ Larger partitions carry a higher penalty when read unnecessarily
              Single Partition   Range Scan   Range Scan / Single Partition
Total I/O     ~2.3 M             14056        0.6%
Total Bytes   ~17.7 GB           ~1.3 GB      7.7%
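The two rules of thumb from these slides (the partition-count check and the ~7% mark) can be folded into one small decision helper. This is a sketch under the deck's simplifying assumptions, not a general rule; the function name, parameters and the default threshold are illustrative, and the right threshold depends on partition size.

```python
def choose_read_strategy(
    partitions_needed: int,
    total_partitions: int,
    token_ranges: int,
    shards: int,
    threshold: float = 0.07,  # the "~7%" mark from the slide
) -> str:
    """Pick between single-partition reads and a filtered range scan."""
    # A scan costs roughly token_ranges * shards index reads, so a small
    # table is cheaper to read partition by partition.
    if total_partitions < token_ranges * shards:
        return "single-partition"
    # Reading only a small slice of the data: avoid scanning everything.
    if partitions_needed / total_partitions < threshold:
        return "single-partition"
    return "range-scan"


# Last 7 days vs. last 90 days of a one-year, one-million-sensor data set.
TOTAL = 365 * 10**6
print(choose_read_strategy(7 * 10**6, TOTAL, 83 * 256, 54))
print(choose_read_strategy(90 * 10**6, TOTAL, 83 * 256, 54))
```

With these inputs, 7 of 365 days is about 2% of the data and 90 days is about 25%, so the helper picks single partition reads for the first case and a range scan for the second.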
We deployed; we are on the beach, drinking a Mojito
Evaluating a data model
We need this done faster. For simplicity, let's add static min/max columns to each partition that cache the info. Does this help?
CREATE TABLE readings (
sensor_id int,
date date,
time time,
temperature float,
temp_min float static,
temp_max float static,
PRIMARY KEY ((sensor_id, date), time))
■ Do range scans and use CQL PER PARTITION LIMIT (new in 3.1)
SELECT sensor_id,
date,
temp_min,
temp_max
FROM readings where token(sensor_id, date) > X and
token(sensor_id, date) < Y PER PARTITION LIMIT 1
Results
              Range Scan   Range Scan pre-computed   Gain
Index I/O     3318         2874                      X 1.15
Index Bytes   ~6.9 M       ~5.9 M                    X 1.15
Data I/O      10738        3520                      X 3.05
Data Bytes    ~1.3 GB      ~430 M                    X 3.15
Part 2: Scylla 3.1 / 3.2 Additional
CQL Features
CQL BYPASS CACHE
■ Scylla uses Read-Through caching - if information read is not in the
cache it will be added
■ CQL BYPASS CACHE allows overriding that for a specific query - don’t
read via the cache / don’t populate the cache
CQL PER PARTITION LIMIT
Limits the number of rows that are returned for each partition
cqlsh:ks> select * from samples ;
pk | ck | val
----+----+-----
10 | 1 | 1
10 | 2 | 2
11 | 1 | 3
11 | 2 | 4
cqlsh:ks> select * from samples PER PARTITION LIMIT 1;
pk | ck | val
----+----+-----
10 | 1 | 1
11 | 1 | 3
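To make the semantics concrete, the effect of PER PARTITION LIMIT on the sample rows can be emulated client-side. This is only an illustration of what the clause returns, not how the server implements it; `per_partition_limit` is a made-up helper, and rows are assumed to arrive grouped by partition key, as they do in the cqlsh output above.

```python
from itertools import groupby, islice
from operator import itemgetter

# (pk, ck, val) rows from the sample table above.
rows = [(10, 1, 1), (10, 2, 2), (11, 1, 3), (11, 2, 4)]


def per_partition_limit(rows, limit):
    """Keep at most `limit` rows from each partition, preserving order."""
    result = []
    for _pk, partition_rows in groupby(rows, key=itemgetter(0)):
        result.extend(islice(partition_rows, limit))
    return result


for row in per_partition_limit(rows, 1):
    print(row)
```

Doing this on the server is the point of the feature: the rows that would be discarded are never sent to the client.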
CQL GROUP BY
■ The GROUP BY option condenses into a single row all selected rows that share the same values for a set of columns (limited to the partition key plus, optionally, clustering keys)
■ Aggregate functions produce a separate value for each group.
cqlsh:ks> select * from samples ;
pk | ck | val
----+----+-----
10 | 1 | 1
10 | 2 | 2
11 | 1 | 3
11 | 2 | 4
cqlsh:ks> select pk, min(val), max(val) from samples GROUP BY pk;
pk | system.min(val) | system.max(val)
----+-----------------+-----------------
10 | 1 | 2
11 | 3 | 4
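The grouping above can be mirrored in a few lines of client code, which also shows the work an application would otherwise do itself after fetching every row. A minimal sketch over the same sample rows (`group_min_max` is an illustrative name):

```python
# (pk, ck, val) rows from the sample table above.
rows = [(10, 1, 1), (10, 2, 2), (11, 1, 3), (11, 2, 4)]


def group_min_max(rows):
    """Return {pk: (min(val), max(val))}, one entry per partition key."""
    result = {}
    for pk, _ck, val in rows:
        low, high = result.get(pk, (val, val))
        result[pk] = (min(low, val), max(high, val))
    return result


for pk, (low, high) in group_min_max(rows).items():
    print(pk, low, high)
```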
CQL LIKE
■ Filtering using LIKE syntax
■ No need for indexing
cqlsh:ks> select * from samples ;
pk | ck | val
----+----+-----
10 | 1 | 1
10 | 2 | 2
11 | 1 | 3
11 | 2 | 4
cqlsh:ks> select * from samples where pk like '%0' ALLOW FILTERING;
pk | ck | val
----+----+-----
10 | 1 | 1
10 | 2 | 2
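For readers unfamiliar with LIKE patterns, the matching rule can be sketched by translating a pattern into a regular expression. This sketch handles only the two wildcards (`%` for any run of characters, `_` for exactly one character) and ignores escaping; values are treated as strings, and the helper names are illustrative.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE pattern with % and _ wildcards into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any sequence of characters, possibly empty
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


# Partition keys from the sample table, as strings.
for pk in ("10", "11"):
    print(pk, like(pk, "%0"))
```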
To Summarize
● Disk Access:
○ Is another measure that can be used to evaluate data models
○ It’s especially useful for analytics / background batch processing jobs, since those will access data from disk
● Scylla 3.1 includes
○ CQL:
■ BYPASS CACHE
■ PER PARTITION LIMIT
● Upcoming Scylla 3.2 will include:
○ Tracing with Disk Access
○ CQL:
■ GROUP BY
■ LIKE
■ Non-frozen UDTs (not covered)
● Optimized(*) full scans reduce the overall amount of disk access when compared to aggregated single partition scans
Thank you Stay in touch
Any questions?
Shlomi Livne
shlomi@scylladb.com
@shlomilivne

Más contenido relacionado

La actualidad más candente

A glimpse of cassandra 4.0 features netflix
A glimpse of cassandra 4.0 features   netflixA glimpse of cassandra 4.0 features   netflix
A glimpse of cassandra 4.0 features netflixVinay Kumar Chella
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla ClusterScyllaDB
 
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...Safer restarts, faster streaming, and better repair, just a glimpse of cassan...
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...Vinay Kumar Chella
 
Scylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
Scylla Summit 2019 Keynote - Dor Laor - Beyond CassandraScylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
Scylla Summit 2019 Keynote - Dor Laor - Beyond CassandraScyllaDB
 
FireEye & Scylla: Intel Threat Analysis Using a Graph Database
FireEye & Scylla: Intel Threat Analysis Using a Graph DatabaseFireEye & Scylla: Intel Threat Analysis Using a Graph Database
FireEye & Scylla: Intel Threat Analysis Using a Graph DatabaseScyllaDB
 
iFood on Delivering 100 Million Events a Month to Restaurants with Scylla
iFood on Delivering 100 Million Events a Month to Restaurants with ScyllaiFood on Delivering 100 Million Events a Month to Restaurants with Scylla
iFood on Delivering 100 Million Events a Month to Restaurants with ScyllaScyllaDB
 
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...DataStax
 
Seastar Summit 2019 Keynote
Seastar Summit 2019 KeynoteSeastar Summit 2019 Keynote
Seastar Summit 2019 KeynoteScyllaDB
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016Tzach Livyatan
 
How Workload Prioritization Reduces Your Datacenter Footprint
How Workload Prioritization Reduces Your Datacenter FootprintHow Workload Prioritization Reduces Your Datacenter Footprint
How Workload Prioritization Reduces Your Datacenter FootprintScyllaDB
 
Scylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScyllaDB
 
Seastar / ScyllaDB, or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB,  or how we implemented a 10-times faster CassandraSeastar / ScyllaDB,  or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB, or how we implemented a 10-times faster CassandraTzach Livyatan
 
Looking towards an official cassandra sidecar netflix
Looking towards an official cassandra sidecar   netflixLooking towards an official cassandra sidecar   netflix
Looking towards an official cassandra sidecar netflixVinay Kumar Chella
 
Scylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScyllaDB
 
Building and running cloud native cassandra
Building and running cloud native cassandraBuilding and running cloud native cassandra
Building and running cloud native cassandraVinay Kumar Chella
 
Comparing Apache Cassandra 4.0, 3.0, and ScyllaDB
Comparing Apache Cassandra 4.0, 3.0, and ScyllaDBComparing Apache Cassandra 4.0, 3.0, and ScyllaDB
Comparing Apache Cassandra 4.0, 3.0, and ScyllaDBScyllaDB
 
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...DataStax
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...ScyllaDB
 
S3 cassandra or outer space? dumping time series data using spark
S3 cassandra or outer space? dumping time series data using sparkS3 cassandra or outer space? dumping time series data using spark
S3 cassandra or outer space? dumping time series data using sparkDemi Ben-Ari
 
ScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedJ On The Beach
 

La actualidad más candente (20)

A glimpse of cassandra 4.0 features netflix
A glimpse of cassandra 4.0 features   netflixA glimpse of cassandra 4.0 features   netflix
A glimpse of cassandra 4.0 features netflix
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla Cluster
 
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...Safer restarts, faster streaming, and better repair, just a glimpse of cassan...
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...
 
Scylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
Scylla Summit 2019 Keynote - Dor Laor - Beyond CassandraScylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
Scylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
 
FireEye & Scylla: Intel Threat Analysis Using a Graph Database
FireEye & Scylla: Intel Threat Analysis Using a Graph DatabaseFireEye & Scylla: Intel Threat Analysis Using a Graph Database
FireEye & Scylla: Intel Threat Analysis Using a Graph Database
 
iFood on Delivering 100 Million Events a Month to Restaurants with Scylla
iFood on Delivering 100 Million Events a Month to Restaurants with ScyllaiFood on Delivering 100 Million Events a Month to Restaurants with Scylla
iFood on Delivering 100 Million Events a Month to Restaurants with Scylla
 
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
 
Seastar Summit 2019 Keynote
Seastar Summit 2019 KeynoteSeastar Summit 2019 Keynote
Seastar Summit 2019 Keynote
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016
 
How Workload Prioritization Reduces Your Datacenter Footprint
How Workload Prioritization Reduces Your Datacenter FootprintHow Workload Prioritization Reduces Your Datacenter Footprint
How Workload Prioritization Reduces Your Datacenter Footprint
 
Scylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the Database
 
Seastar / ScyllaDB, or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB,  or how we implemented a 10-times faster CassandraSeastar / ScyllaDB,  or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB, or how we implemented a 10-times faster Cassandra
 
Looking towards an official cassandra sidecar netflix
Looking towards an official cassandra sidecar   netflixLooking towards an official cassandra sidecar   netflix
Looking towards an official cassandra sidecar netflix
 
Scylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent Databases
 
Building and running cloud native cassandra
Building and running cloud native cassandraBuilding and running cloud native cassandra
Building and running cloud native cassandra
 
Comparing Apache Cassandra 4.0, 3.0, and ScyllaDB
Comparing Apache Cassandra 4.0, 3.0, and ScyllaDBComparing Apache Cassandra 4.0, 3.0, and ScyllaDB
Comparing Apache Cassandra 4.0, 3.0, and ScyllaDB
 
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
 
S3 cassandra or outer space? dumping time series data using spark
S3 cassandra or outer space? dumping time series data using sparkS3 cassandra or outer space? dumping time series data using spark
S3 cassandra or outer space? dumping time series data using spark
 
ScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous Speed
 

Similar a Writing Applications for Scylla

Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Codemotion
 
BWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 PresentationBWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 Presentationlilyco
 
Security sizing meetup
Security sizing meetupSecurity sizing meetup
Security sizing meetupDaliya Spasova
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...NETWAYS
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixC4Media
 
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...MongoDB
 
Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Nicolas Poggi
 
Apache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdbApache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdbZhangZhengming
 
Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"NUS-ISS
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters MongoDB
 
Our journey with druid - from initial research to full production scale
Our journey with druid - from initial research to full production scaleOur journey with druid - from initial research to full production scale
Our journey with druid - from initial research to full production scaleItai Yaffe
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_SummaryHiram Fleitas León
 
MongoDB for Time Series Data: Sharding
MongoDB for Time Series Data: ShardingMongoDB for Time Series Data: Sharding
MongoDB for Time Series Data: ShardingMongoDB
 
Lessons learned from designing a QA Automation for analytics databases (big d...
Lessons learned from designing a QA Automation for analytics databases (big d...Lessons learned from designing a QA Automation for analytics databases (big d...
Lessons learned from designing a QA Automation for analytics databases (big d...Omid Vahdaty
 
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDBAWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDBAmazon Web Services
 
How to Develop and Operate Cloud First Data Platforms
How to Develop and Operate Cloud First Data PlatformsHow to Develop and Operate Cloud First Data Platforms
How to Develop and Operate Cloud First Data PlatformsAlluxio, Inc.
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...Amazon Web Services
 
Sqlmaterial 120414024230-phpapp01
Sqlmaterial 120414024230-phpapp01Sqlmaterial 120414024230-phpapp01
Sqlmaterial 120414024230-phpapp01Lalit009kumar
 
Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...Omid Vahdaty
 
Adventures in RDS Load Testing
Adventures in RDS Load TestingAdventures in RDS Load Testing
Adventures in RDS Load TestingMike Harnish
 

Similar a Writing Applications for Scylla (20)

Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
 
BWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 PresentationBWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 Presentation
 
Security sizing meetup
Security sizing meetupSecurity sizing meetup
Security sizing meetup
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
 
Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)
 
Apache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdbApache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdb
 
Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters
 
Our journey with druid - from initial research to full production scale
Our journey with druid - from initial research to full production scaleOur journey with druid - from initial research to full production scale
Our journey with druid - from initial research to full production scale
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
 
MongoDB for Time Series Data: Sharding
MongoDB for Time Series Data: ShardingMongoDB for Time Series Data: Sharding
MongoDB for Time Series Data: Sharding
 
Lessons learned from designing a QA Automation for analytics databases (big d...
Lessons learned from designing a QA Automation for analytics databases (big d...Lessons learned from designing a QA Automation for analytics databases (big d...
Lessons learned from designing a QA Automation for analytics databases (big d...
 
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDBAWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
 
How to Develop and Operate Cloud First Data Platforms
How to Develop and Operate Cloud First Data PlatformsHow to Develop and Operate Cloud First Data Platforms
How to Develop and Operate Cloud First Data Platforms
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
 
Sqlmaterial 120414024230-phpapp01
Sqlmaterial 120414024230-phpapp01Sqlmaterial 120414024230-phpapp01
Sqlmaterial 120414024230-phpapp01
 
Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...
 
Adventures in RDS Load Testing
Adventures in RDS Load TestingAdventures in RDS Load Testing
Adventures in RDS Load Testing
 

Writing Applications for Scylla

  • 2. Presenter Shlomi Livne, VP of R&D Shlomi is VP of R&D at ScyllaDB. Prior to ScyllaDB he led the research and development team at Convergin, which was acquired by Oracle.
  • 3. Part 1: Enhancement for Application Development
  • 5. Development Cycle NoSQL Databases Think about the queries you are going to run
  • 6. Development Cycle NoSQL Databases Think about the queries you are going to run Create a Data Model
  • 7. Development Cycle NoSQL Databases Think about the queries you are going to run Create a Data Model Use cassandra-stress (or other) to validate (*) Develop
  • 8. Development Cycle NoSQL Databases Think about the queries you are going to run Create a Data Model Use cassandra-stress (or other) to validate (*) Develop
  • 9. Development Cycle NoSQL Databases Think about the queries you are going to run Create a Data Model Use cassandra-stress (or other) to validate (*) Develop
  • 10. Development Cycle NoSQL Databases Think about the queries you are going to run Create a Data Model Use cassandra-stress (or other) to validate (*) Develop CQL Optimization
  • 11. Development Cycle NoSQL Databases Think about the queries you are going to run Create a Data Model Use cassandra-stress (or other) to validate (*) Develop Scale test
  • 12. Development Cycle NoSQL Databases Think about the queries you are going to run Create a Data Model Use cassandra-stress (or other) to validate (*) Develop Scale test
  • 13. Development Cycle NoSQL Databases Think about the queries you are going to run Create a Data Model Use cassandra-stress (or other) to validate (*) Develop Scale test Deploy
  • 14. Development Cycle NoSQL Databases Think about the queries you are going to run Create a Data Model Use cassandra-stress (or other) to validate (*) Develop Scale test Deploy
  • 16. Development Cycle NoSQL Databases Think about the queries you are going to run Create a Data Model Use cassandra-stress (or other) to validate (*) Develop Scale test Deploy Disk Access
  • 17. What will Disk Access track ■ Disk Access looks at: ● Amount of I/O operations ● Overall amount of read bytes ■ When sstables are read from disk there are two related components (everything else is in memory): ● Data - stores the actual data ● Index - provides lookup into the data file “blocks” that contain the partition (if the partition is large - it contains promoted index)
  • 18. Disk Access - Why ● The amount of disk per unit of memory is increasing: ○ EC2 i3 family memory : disk ratio is 1:30 ○ EC2 i3en family memory : disk ratio is 1:78 ○ More queries will be served from disk
  • 19. Disk Access - Why ● The amount of disk per unit of memory is increasing: ○ EC2 i3 family memory : disk ratio is 1:30 ○ EC2 i3en family memory : disk ratio is 1:78 ○ More queries will be served from disk ● There are workloads that you will always prefer running from disk (background analytics)
  • 21. An IoT application(+) Total amount of data points: 526 billion temperature readings ■ 1,000,000 sensors, representing homes in an area ■ 365 days (1 year storage requirement) ■ 1 reading per minute
  • 22. Analytics over the entire data? How long would it take at normal speeds? We need more if analytics are a part of the pipeline That means we need Scylla We need a good application And we need hardware 200,000 points/second 730 hours (30 days) 1 million points/second 146 hours (almost a week)
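The totals and scan times on these two slides follow from simple arithmetic. A minimal sketch, assuming one reading per minute per sensor (the rate implied by the 526 billion total and by the 1.44 billion points/day figure quoted later); all names are illustrative:

```python
SENSORS = 1_000_000
DAYS = 365
READINGS_PER_DAY = 24 * 60  # assumption: one reading per minute per sensor

total_points = SENSORS * DAYS * READINGS_PER_DAY


def scan_hours(points: int, points_per_second: int) -> float:
    """Hours needed to scan `points` at a constant rate."""
    return points / points_per_second / 3600


print(f"total points: {total_points:,}")  # 525,600,000,000 (~526 billion)
for rate in (200_000, 1_000_000):
    print(f"{rate:>9,} points/s -> {scan_hours(total_points, rate):.0f} hours")
```

At 200,000 points/second this gives roughly 730 hours, and at 1 million points/second roughly 146 hours, matching the slide.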
  • 23. Why climb Mount Everest? Because it’s there. George Leigh Mallory What kind of performance are we after?
  • 24. Data Model CREATE TABLE readings ( sensor_id int, date date, time time, temperature float, PRIMARY KEY ((sensor_id, date), time)) What kind of queries can we reasonably support? ■ SELECT * from readings where sensor_id = ? and date = ?; ■ SELECT * from readings where sensor_id = ? and date = ? and time > ?;
  • 25. Analytics Application Option 1 ■ Let the server do as much work as possible SELECT sensor_id, date, min(temperature) as minTemperature, max(temperature) as maxTemperature FROM readings where sensor_id = ? and date = ?
  • 26. Application (Example) Total amount of data to scan: 1.44 billion points/day Coordinator Worker (loader machine) ScyllaDB cluster Worker (loader machine) Worker (loader machine) Set time frame, compute average, min, max of all sensors
  • 27. Disk Access Analysis Option 1 (in theory) ● For simplification let's assume ○ Every partition: ■ is fully stored in a single sstable ■ is exactly placed in a single data block ○ Bloom filters do not produce false positives ● Analysis: Number of partitions = 365 * 10^6 = 365 Million; I/O for index = 365 Million; I/O for data = 365 Million
  • 28. Analytics Application Option 2 ■ Do range scans and use CQL GROUP BY (new in 3.2) SELECT sensor_id, date, min(temperature) as minTemperature, max(temperature) as maxTemperature FROM readings where token(sensor_id, date) > X and token(sensor_id, date) < Y GROUP BY sensor_id, date
  • 29. Application (Example) Total amount of data to scan: 1.44 billion points/day Coordinator Worker (loader machine) ScyllaDB cluster Worker (loader machine) Worker (loader machine) Set time frame, compute average, min, max of all sensors
  • 30. Disk Access Analysis Option 2 (in theory) ● For simplification let's assume ○ Application breaks requests by vnode token ranges ○ Every partition: ■ is fully stored in a single sstable ■ is exactly placed in a single data block (and the only one there) ■ vnode token ranges do not share data blocks ○ Bloom filters do not produce false positives ● Analysis: Number of scans = Number of vnode token ranges; I/O for index = Number of vnode token ranges * Number of shards; I/O for data = Number of data blocks
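One way an application could break a full scan into token-range requests is sketched below. It is a simplification under stated assumptions: it splits the Murmur3 token space evenly, whereas a real application would take the actual vnode token ranges from the driver's cluster metadata. `split_token_ring` and `QUERY` are hypothetical names, and the query uses `<=` on the upper bound so adjacent ranges do not skip a token.

```python
MIN_TOKEN = -(2**63)   # lower bound of the Murmur3 partitioner token space
MAX_TOKEN = 2**63 - 1  # upper bound


def split_token_ring(num_ranges: int):
    """Split the token space into `num_ranges` contiguous (start, end] ranges."""
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // num_ranges for i in range(num_ranges)]
    bounds.append(MAX_TOKEN)
    return list(zip(bounds[:-1], bounds[1:]))


# The lower bound is exclusive; this sketch assumes no partition hashes to
# MIN_TOKEN itself.
QUERY = (
    "SELECT sensor_id, date, min(temperature) AS minTemperature, "
    "max(temperature) AS maxTemperature FROM readings "
    "WHERE token(sensor_id, date) > ? AND token(sensor_id, date) <= ? "
    "GROUP BY sensor_id, date"
)

ranges = split_token_ring(83 * 256)  # same count as the slides' vnode ranges
print(len(ranges), ranges[0], ranges[-1])
```

Each `(start, end)` pair would be bound to the two `?` markers of a prepared statement and handed to a worker.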
  • 31. Disk Access Comparison (Option 1: Single Partition vs. Option 2: Range Scans) ■ Number of ops: Option 1 = number of partitions, 365 * 10^6 = 365 Million; Option 2 = number of scans = number of vnode token ranges, 83 * 256 = 21248 ■ I/O for index: Option 1 = 365 Million; Option 2 = number of vnode token ranges * number of shards, 83 * 256 * 54 = 1147392 ■ I/O for data: Option 1 = 365 Million; Option 2 = 365 Million
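The comparison can be reproduced with a few lines of arithmetic. A minimal sketch under the slides' simplifying assumptions; reading 83 as a node count and 256 as vnodes per node is an interpretation of the "83 * 256" product, not something the slide states:

```python
SENSORS = 1_000_000
DAYS = 365
NODES = 83             # assumption: first factor of the slide's "83 * 256"
VNODES_PER_NODE = 256  # assumption: second factor
SHARDS = 54            # the slide's "number of shards" factor

partitions = SENSORS * DAYS
token_ranges = NODES * VNODES_PER_NODE

single_partition = {
    "ops": partitions,
    "index_io": partitions,
    "data_io": partitions,
}
range_scan = {
    "ops": token_ranges,
    "index_io": token_ranges * SHARDS,
    "data_io": partitions,  # every data block is still read once
}

for key in single_partition:
    gain = single_partition[key] / range_scan[key]
    print(f"{key:>8}: {single_partition[key]:>11,} vs {range_scan[key]:>11,} (x{gain:,.0f})")
```

In this model the data I/O is unchanged; the theoretical gain comes from the index I/O and the number of CQL operations.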
  • 32. Billy using Full Scan (theoretical) gain 1. The number of I/O ops for Index on the cluster drops from 365 Million to ~1.2 Million ● In reality SSTable Bloom filters are not perfect, so single-partition reads will be attempted on sstables that don’t have the partition (an even bigger win for scans) 2. The number of CQL operations on the cluster drops from 365 Million to ~22K ● Returning a result per partition takes 365M / 5000 (page size) = 73K pages (in the optimal case), so we will need more than 22K requests 3. In reality partitions do share data blocks; they are not perfectly aligned
  • 33. Putting Data Access into practice ● Queries ● Data model ● Some test data (at small scale) ● Docker ● Scylla-Nightly (Pre Scylla 3.2) ○ Tracing including disk access ‘ ... mc-132-big-Index.db: finished bulk DMA read of size 538 at offset 0, successfully read 4096 bytes [shard 0] ‘ ● A simple script that parses system_traces.events after running a traced query
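A parsing script of the kind mentioned on this slide might look like the sketch below. It is not the presenter's script: the regular expression is modeled only on the single trace line quoted above, other Scylla versions may word the message differently, and the second sample line is invented for illustration. Counting the "successfully read" value (rather than the requested size) as bytes read is also a choice made here.

```python
import re
from collections import defaultdict

# Modeled on the trace line quoted on the slide (assumed message format).
TRACE_RE = re.compile(
    r"(?P<file>\S+-(?P<component>Index|Data)\.db): finished bulk DMA read "
    r"of size (?P<size>\d+) at offset (?P<offset>\d+), "
    r"successfully read (?P<read>\d+) bytes"
)


def summarize_disk_access(activities):
    """Count I/O operations and bytes read per sstable component."""
    totals = defaultdict(lambda: {"io_ops": 0, "bytes": 0})
    for activity in activities:
        match = TRACE_RE.search(activity)
        if match:
            entry = totals[match["component"]]
            entry["io_ops"] += 1
            entry["bytes"] += int(match["read"])
    return dict(totals)


sample = [
    # Line quoted on the slide:
    "mc-132-big-Index.db: finished bulk DMA read of size 538 at offset 0, "
    "successfully read 4096 bytes [shard 0]",
    # Hypothetical lines, invented for this example:
    "mc-132-big-Data.db: finished bulk DMA read of size 14400 at offset 0, "
    "successfully read 16384 bytes [shard 0]",
    "Querying is done [shard 0]",
]
print(summarize_disk_access(sample))
```

In practice the `activities` list would be filled from the `activity` column of `system_traces.events` for the traced session.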
  • 34. Billy on small scale ■ 1000 sensors, 100 dates, 1 sample per minute ■ 1 M partitions, 1440 M rows ■ # shards 4
  • 35. Billy on small scale ■ 1000 sensors, 100 dates, 1 sample per minute ■ 1 M partitions, 1440 M rows ■ # shards 4 ■ Results (Single Partition): Index I/O ~1.3 M; Index Bytes ~2.8 GB; Data I/O ~1 M; Data Bytes ~14.4 GB
  • 36. Billy on small scale ■ 1000 sensors, 100 dates, 1 sample per minute ■ 1 M partitions, 1440 M rows ■ # shards 4 ■ Results (Single Partition / Range Scan / Gain): Index I/O ~1.3 M / 3318 / X 392; Index Bytes ~2.8 GB / ~6.9 M / X 424; Data I/O ~1 M / 10738 / X 93; Data Bytes ~14.4 GB / ~1.3 GB / X 11
  • 37. Billy using Full Scan gain is even bigger 1. Read-aheads for the full scans - better utilizing the disk ● Single Partition Avg Data Bytes: 14748600348÷1024089 = ~14.4K ● Range Scan Avg Data Bytes: 1355390291÷10738 = ~126K 2. AIO reads are sent to the disk aligned to Index/Data placement - yet disks do block-size reads: ● Doing 2 reads for two halves of a disk block will result in reading the block twice and returning part of it each time.
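The average read sizes quoted on this slide can be recomputed directly from the byte and I/O counts it gives. A small sketch; the dictionary layout and function name are illustrative:

```python
single_partition = {"data_bytes": 14_748_600_348, "data_io": 1_024_089}
range_scan = {"data_bytes": 1_355_390_291, "data_io": 10_738}


def avg_bytes_per_io(stats) -> float:
    """Average number of data bytes returned per disk read."""
    return stats["data_bytes"] / stats["data_io"]


for name, stats in (("single partition", single_partition), ("range scan", range_scan)):
    print(f"{name:>16}: {avg_bytes_per_io(stats) / 1000:.1f} KB per read")
```

The range scan reads almost nine times more data per I/O, which is where the read-ahead benefit shows up.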
  • 38. Should Range Scans always be used for analytics ? ■ No ■ If Number of Partitions < Number of Token Ranges * Number of Shards ■ What if we are doing a partial scan - what should we do ? a. Example: What was the max & min temperature over the last 7/30/90 days
  • 39. Billy+: Partial Scan SELECT sensor_id, date, min(temperature) as minTemperature, max(temperature) as maxTemperature FROM readings where token(sensor_id, date) > X and token(sensor_id, date) < Y and date >= Z GROUP BY sensor_id, date ALLOW FILTERING
  • 40. Billy+: Partial Scan ● Going back to the simplifications, ~7% seems to be a good mark: ○ Partial scan of < 7% of the data: use single partitions ○ Partial scan of > 7% of the data: use full scan and filter ● General case: it depends how big the partitions are ○ Larger partitions carry a higher penalty for reading them unnecessarily ● Results (Single Partition / Range Scan / Range Scan as share of Single Partition): Total I/O ~2.3 M / 14056 / 0.6%; Total Bytes ~17.7 GB / ~1.3 GB / 7.7%
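The two rules of thumb from the last slides (prefer single-partition reads for small tables, and for partial scans below roughly 7% of the data) can be combined into one decision helper. This is only a sketch of the heuristic: `choose_scan_strategy` is a hypothetical function, and the 7% default comes from the simplified measurement above, so the right threshold depends on partition size.

```python
def choose_scan_strategy(
    partitions_to_read: int,
    total_partitions: int,
    token_ranges: int,
    shards: int,
    threshold: float = 0.07,  # assumption: the ~7% mark from the slide
) -> str:
    """Return "single-partition" or "range-scan" for an analytics query."""
    # Fewer partitions than index reads a full scan would need:
    # range scans cannot win.
    if total_partitions < token_ranges * shards:
        return "single-partition"
    fraction = partitions_to_read / total_partitions
    return "single-partition" if fraction < threshold else "range-scan"


SENSORS, DAYS = 1_000_000, 365
for window in (7, 30, 90):
    strategy = choose_scan_strategy(SENSORS * window, SENSORS * DAYS, 83 * 256, 54)
    print(f"last {window:>2} days -> {strategy}")
```

With the talk's numbers, a 7-day window (about 2% of the data) favors single-partition reads, while 30-day and 90-day windows favor a filtered range scan.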
  • 41. We deployed; we are on the beach, drinking a Mojito
  • 42.
  • 43. Evaluating a data model We need this done faster - for simplicity let's add static min/max columns to each partition that will cache the info - does this help? CREATE TABLE readings ( sensor_id int, date date, time time, temperature float, temp_min float static, temp_max float static, PRIMARY KEY ((sensor_id, date), time))
  • 44. ■ Do range scans and use CQL PER PARTITION LIMIT (new in 3.1) SELECT sensor_id, date, temp_min, temp_max FROM readings where token(sensor_id, date) > X and token(sensor_id, date) < Y PER PARTITION LIMIT 1
  • 45. Results (Range Scan / Range Scan pre-computed / Gain): Index I/O 3318 / 2874 / X 1.15; Index Bytes ~6.9 M / ~5.9 M / X 1.15; Data I/O 10738 / 3520 / X 3.05; Data Bytes ~1.3 GB / ~430 M / X 3.15
  • 46. Part 2: Scylla 3.1 / 3.2 Additional CQL Features
  • 47. CQL BYPASS CACHE ■ Scylla uses Read-Through caching - if information read is not in the cache it will be added ■ CQL BYPASS CACHE allows overriding that for a specific query - don’t read via the cache / don’t populate the cache
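The difference between a read-through read and a read that bypasses the cache can be shown with a toy model. This is a conceptual sketch in plain Python, not Scylla's implementation; the class and its `bypass_cache` flag are invented for illustration.

```python
class ReadThroughCache:
    """Toy read-through cache in front of a dict that stands in for disk."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = {}
        self.disk_reads = 0

    def read(self, key, bypass_cache=False):
        # Normal path: serve from the cache when possible.
        if not bypass_cache and key in self.cache:
            return self.cache[key]
        self.disk_reads += 1
        value = self.disk[key]
        # Read-through: populate the cache unless the query bypasses it.
        if not bypass_cache:
            self.cache[key] = value
        return value


store = ReadThroughCache({"sensor-1": 21.5})
store.read("sensor-1", bypass_cache=True)  # analytics read: cache untouched
print(store.cache)                         # {}
store.read("sensor-1")                     # online read: populates the cache
print(store.cache, store.disk_reads)       # {'sensor-1': 21.5} 2
```

The bypassing read neither consults nor fills the cache, so a large analytics scan leaves the hot working set of the online workload in place.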
  • 48. CQL PER PARTITION LIMIT Limits the number of rows that are returned for each partition
      cqlsh:ks> select * from samples;
       pk | ck | val
      ----+----+-----
       10 |  1 |   1
       10 |  2 |   2
       11 |  1 |   3
       11 |  2 |   4
      cqlsh:ks> select * from samples PER PARTITION LIMIT 1;
       pk | ck | val
      ----+----+-----
       10 |  1 |   1
       11 |  1 |   3
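The semantics of PER PARTITION LIMIT can be emulated on the slide's sample rows. A sketch in plain Python, assuming rows arrive ordered by partition key and then clustering key, as a CQL scan returns them; `per_partition_limit` is an illustrative name:

```python
from itertools import groupby, islice

# (pk, ck, val) rows from the slide's `samples` table
rows = [(10, 1, 1), (10, 2, 2), (11, 1, 3), (11, 2, 4)]


def per_partition_limit(rows, limit):
    """Keep at most `limit` rows from each partition."""
    result = []
    for _, partition_rows in groupby(rows, key=lambda row: row[0]):
        result.extend(islice(partition_rows, limit))
    return result


print(per_partition_limit(rows, 1))  # [(10, 1, 1), (11, 1, 3)]
```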
  • 49. CQL GROUP BY ■ The GROUP BY option condenses into a single row all selected rows that share the same values for a set of columns (limited to the partition key + optionally clustering keys) ■ Aggregate functions will produce a separate value for each group.
      cqlsh:ks> select * from samples;
       pk | ck | val
      ----+----+-----
       10 |  1 |   1
       10 |  2 |   2
       11 |  1 |   3
       11 |  2 |   4
      cqlsh:ks> select pk, min(val), max(val) from samples GROUP BY pk;
       pk | system.min(val) | system.max(val)
      ----+-----------------+-----------------
       10 |               1 |               2
       11 |               3 |               4
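The same sample rows illustrate what GROUP BY with min/max aggregates computes. A plain-Python emulation, again assuming rows are ordered by partition key; `group_min_max` is an illustrative name:

```python
from itertools import groupby

# (pk, ck, val) rows from the slide's `samples` table
rows = [(10, 1, 1), (10, 2, 2), (11, 1, 3), (11, 2, 4)]


def group_min_max(rows):
    """Return one (pk, min(val), max(val)) tuple per partition key."""
    result = []
    for pk, group in groupby(rows, key=lambda row: row[0]):
        values = [val for _, _, val in group]
        result.append((pk, min(values), max(values)))
    return result


print(group_min_max(rows))  # [(10, 1, 2), (11, 3, 4)]
```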
  • 50. CQL LIKE ■ Filtering using LIKE syntax ■ No need for indexing
      cqlsh:ks> select * from samples;
       pk | ck | val
      ----+----+-----
       10 |  1 |   1
       10 |  2 |   2
       11 |  1 |   3
       11 |  2 |   4
      cqlsh:ks> select * from samples where pk like '%0' ALLOW FILTERING;
       pk | ck | val
      ----+----+-----
       10 |  1 |   1
       10 |  2 |   2
  • 52. ● Disk Access: ○ Is another measure that can be used to evaluate data models ○ It's especially useful for analytics / background batch processing jobs - since those will access data from disk ● Scylla 3.1 includes ○ CQL: ■ BYPASS CACHE ■ PER PARTITION LIMIT ● Upcoming Scylla 3.2 will include: ○ Tracing with Disk Access ○ CQL: ■ GROUP BY ■ LIKE ■ Non-frozen UDTs (not covered) ● Optimized(*) full scans reduce the overall amount of disk access - when compared to aggregated single partition scans
  • 53. Thank you Stay in touch Any questions? Shlomi Livne shlomi@scylladb.com @shlomilivne

Editor's notes

  1. NoSQL - you need to start with the queries
  2. The Data Model is built to answer those queries
  3. Testing the Data Model and the queries - some start with c-s or another simulation tool. This is more complex than it sounds - simulating the data distribution and the request distribution on the data set is not simple
  4. Next step is to develop
  5. And once you start, you find that some queries need to be updated / the data model needs to be changed
  6. Last year we showed how CQL optimization in Monitoring can find development bugs earlier
  7. Next - you move to scale testing - trying to emulate the real production data
  8. In this scale test - you find that you may get large partitions - and that changes ...
  9. You deploy
  10. And you find yourself with hot partitions / large partitions that you may not have detected in scale testing. So this requires changes
  11. Disk Access analysis can be done around Data Model verification and can detect some issues that would otherwise be detected further down the line
  12. Billy is the internal code name for the system that Glauber + … presented at the keynote session, doing more than 1B ops per second
  13. Or phrasing it differently - Glauber just showed you we can do it. My session is about showing you how we can do it EVEN BETTER
  14. We do expose metrics for disk access, yet attributing them 100% to a single query is not possible (as such we looked for a different way - so as not to mislead you)
  15. Index Bytes = 2926800234
  16. BYPASS CACHE goes hand in hand with Workload Prioritization. Workload Prioritization assures that the analytic workload co-exists side by side with the online workload. BYPASS CACHE allows enforcing this even further, assuring that analytics are served only from disk and do not “pollute” the cache