SlideShare una empresa de Scribd logo
1 de 42
Descargar para leer sin conexión
Brought to you by
Rockset: Realtime Indexing for
Fast Queries on Massive
Semi-Structured Data
Dhruba Borthakur
CTO at
About Me
■ CTO and Co-Founder of Rockset
■ RocksDB @ Facebook
■ Hadoop File System @ Yahoo
■ HBase & Hive
■ Andrew File System (AFS)
2
Overview of the Talk
I. Changing Landscape of Data Analytics
II. Overview of Rockset’s Aggregator Leaf Tailer (ALT) Architecture
III. Smart Schemas for SQL
IV. Converged Indexing powers SQL
V. Cloud Scaling Architecture
3
Real-time Indexing on massive datasets
for building real-time apps on live data
without ETL or pipelines.
What is Rockset?
4
How is Rockset used?
5
 
 
 
The Aggregator Leaf Tailer (ALT) Architecture
6
https://rockset.com/blog/aggregator-leaf-tailer-an-architecture-for-live-analytics-on-event-streams/
Key Benefits of ALT
1. Makes queries fast: Indexing all data at time of ingest
2. Runs complex queries on the fly: Distributed aggregator tier that scales
compute and memory resources independently
3. It’s cost-effective: Tailer, Leaf, and Aggregator can be independently scaled
up and down
4. Optimizes read & writes in isolation: CQRS ensures that database writes do
not impact queries
7
Key Design Principles of Rockset
1. Converged Indexing
2. Smart Schemas
3. Cloud Native Architecture
● separate compute from storage
● separate write compute from query compute
8
Converged Indexing
9
What and Why converged indexing?
What? Rockset stores every column of every document in a row-based index,
column-store index, AND an inverted index.
Why?
■ No need to configure and maintain indexes
■ No slow queries because of missing indexes
10
INDEX ALL FIELDS
Under the Hood: Converged Indexing
■ Columnar and inverted indexes in the same system
■ Built on top of key-value store abstraction
■ Each document maps to many key-value pairs
11
<doc 0>
{
“name”: “Igor”
}
<doc 1>
{
“name”: “Dhruba”
}
Key Value
R.0.name Igor Row Store
R.1.name Dhruba
C.name.0 Igor Column Store
C.name.1 Dhruba
S.name.Dhruba.1 Inverted index
S.name.Igor.0
Inverted Indexing for point lookups
■ For each value, store documents containing that value (posting list)
■ Quickly retrieve a list of document IDs that match a predicate
“name”
“interests”
Dhruba 1
Igor 0
databases 0.0; 1.1
cars 1.0
snowboarding 0.1
“last_active”
2019/3/15 0
2019/3/22 1
<doc 0>
{
“name”: “Igor”,
“interests”: [“databases”, “snowboarding”],
“last_active”: 2019/3/15
}
<doc 1>
{
“name”: “Dhruba”,
“interests”: [“cars”, “databases”],
“last_active”: 2019/3/22
}
Columnar Store for aggregations
■ Store each column separately
■ Great compression
■ Only fetch columns the query needs
<doc 0>
{
“name”: “Igor”,
“interests”: [“databases”, “snowboarding”],
“last_active”: 2019/3/15
}
<doc 1>
{
“name”: “Dhruba”,
“interests”: [“cars”, “databases”],
“last_active”: 2019/3/22
}
“name”
“interests”
0 Igor
1 Dhruba
0.0 databases
0.1 snowboarding
1.0 cars
1.1 databases
“last_active”
0 2019/3/15
1 2019/3/22
SELECT keyword, count(*)
FROM search_logs
GROUP BY keyword
ORDER BY count(*) DESC
■ Low latency for both highly selective queries and large scans
■ Optimizer picks between columnar store or inverted index
The Power of the Query Optimizer
14
Inverted index
(for highly selective queries)
SELECT *
FROM search_logs
WHERE keyword = ‘hpts’
AND locale = ‘en’
Columnar store
(for large scans)
Challenges with Converged Indexing
■ Maintaining multiple indexes adversely impacts write throughput
■ Challenge 1: one new record = multiple servers updates
● Requires consensus coordination between servers
■ Challenge 2: one new field = multiple random writes
● Requires increased disk I/O
15
Challenge 1: one new record = multiple servers updates
■ In a traditional database with term sharding and n indexes, one write
incurs updates to n different indexes on n servers
■ Requires a distributed transaction (paxos, raft) between n servers
<doc 1>
{
“name”: “Dhruba”,
“interests”: [“cars”, “databases”],
“last_active”: 2019/3/22
}
“interests”
Dhruba 1
Igor 0
databases 0.0; 1.1
cars 1.0
snowboarding 0.1
“last_active”
2019/3/15 0
2019/3/22 1
“name”
Addressing challenge 1: doc sharding
Distributed
Log
Rockset SQL API
Aggregator Aggregator
Leaf
RocksDB
Leaf
RocksDB
Leaf
RocksDB
new
docs
● Updates are durably-buffered
to a distributed log
Addressing challenge 1: doc sharding
■ Updates are durably buffered
to a distributed log
■ Leafs tail only documents in
the shards they are
responsible for
■ Doc sharding means all new
keys will only affect a single
shard/leaf
Distributed
Log
Rockset SQL API
Aggregator Aggregator
Leaf
RocksDB
Leaf
RocksDB
Leaf
RocksDB
new
keys
What is document sharding?
19
Let’s say you were running a search on restaurant reviews in the area….
Traditional distributed databases Rockset
Optimized for query throughput Optimized for query latency
Big data analytics requires you to run complex queries at interactive speed
First, optimize for latency and then optimize for throughput
Restaurant Data Restaurant & Review Data
Review Data
■ Traditional systems use B-tree storage
structure
■ Keys are sorted across tables
■ A single record update with multiple
secondary index would incur writes to
multiple different locations
Storage
Memory Manager
Challenge 2: one new doc = multiple random writes
Memory Buffer
Table 1 Table 2
Table 3 Table 4
new
docs
■ RocksDB uses log-structured merge-tree
(LSM)
■ Multiple record updates accumulate in
memory and written into a single SST file
■ Keys are sorted between SST files via
compaction in a background process
■ Multiple index updates from multiple docs
result in one write to storage
Storage
Memory Manager
Memory Buffer
SST 1 SST 2
SST 3 SST 4
new
docs
Addressing challenge 2: RocksDB LSM
background
compaction
Smart Schema SQL
22
What and Why smart schemas?
What? Automatic generation of a schema based on the exact fields and
types present at the time of ingest. There is no need to ingest the data with
a predefined schema (ie: schemaless ingestion).
Why? Avoid data pipelines that cause data latency
■ Semi-structured data is complex and messy
■ Ingest any semi-structured data (similar to NoSQL)
■ Make it easy to read the data (using standard SQL)
23
Under the Hood: Smart Schemas
■ Type information stored
with values, not “columns”
■ Strongly types queries on
dynamically typed fields
■ Designed for nested
semi-structured data
24
Schema Binding at Query Time
25
Tailers ingest data without
predefined schemas (ie:
schemaless ingest)
Aggregators use the
schema to make queries
fast
schema binding at query
time
Challenges with Smart Schemas
■ Challenge 1: Additional CPU usage for processing queries
■ Challenge 2: Requires increased disk space to store types
26
We use field interning to reduce the space required to store schemas
27
Strict Schema
(Relational DB)
Schema Data Storage
City:
String
name:
String
age: Int
Rancho Santa Margarita Jonathan 35
Rancho Santa Margarita Katherine 25
Schemaless
(JSON DB)
“City”: “Rancho Santa Margarita” “name”: “Jonathan” “age”: 35
“City”: “Rancho Santa Margarita” “Name”: “Katherine” “age”: 25
Rockset
(with field interning)
~30% overhead
0: S “City”
1: S
“Jonathan”
2: S
“Katherine”
3: I 35 4: I 25 City: 0 name: 1 age: 3
City: 0 name: 2 age: 4
= Amount of storage
We use type hoisting to reduce CPU required to run queries
28
Strict schema 1 10 7 4 5
a b c d e
Schemaless
Rockset
(with type
hoisting)
Rows
I 1 I 10 I 7 I 4 I 5
S a S b S c S d S e
I 1 10 7 4 5
S a b c d e
Rockset query price/performance is on par with strict schema systems
Cloud Scaling Architecture
29
● Cost of 1 cpu for 100 minutes == Cost of 100 cpu for 1 minute!!
Key insight into economics of cloud
■ Cost of 1 cpu for 100 minutes == Cost of 100 cpu for 1 minute!!
● Without cloud: statically provision for peak demand
● With cloud: dynamically provision for current demand
Key insight into economics of cloud
■ Cost of 1 cpu for 100 minutes == Cost of 100 cpu for 1 minute!!
● Without cloud: statically provision for peak demand
● With cloud: dynamically provision for current demand
■ Goal: resources scale up and down as needed to achieve desired performance
Key insight into economics of cloud
If your query is slow, the challenge is in the software you are using.
What and Why cloud autoscaling?
What? Each tailer, leaf, or aggregator can be independently scaled up and
down as needed. Tailers scale when more data to ingest, leaves scale
when data size grows and aggregators scale when query volume
increases
Why?
■ No provisioning
■ Pay for usage
■ No need to provision for peak capacity
33
We are going to focus on scaling out the leaf
34
Scaling tailers and
aggregators are easy
because they are stateless
Scaling leaf nodes is tricky
because they are stateful
Quotation from Dr David Dewitt
David J. DeWitt
MIT
Willis Lang
Microsoft Jim Gray Systems Lab
35
© Permission to use contents only with attribution to the authors
(The End of ”Shared-Nothing”)
2017
Document Sharding Diagram in ALT
36
Rockset SQL API
Aggregator Aggregator
Leaf
RocksDB
Leaf
RocksDB
Leaf
RocksDB
Tailer We shard the
documents across
leaves
An entire document
including all its fields
and indices reside on
a single leaf server
Scale down leaves: Use durability of cloud storage
37
Rockset SQL API
Aggregator Aggregator
Object Storage (AWS S3, GCS, ...)
Leaf
RocksDB-Cloud
RocksDB
Leaf
RocksDB-Cloud
RocksDB
Leaf
RocksDB-Cloud
RocksDB
Tailer
SST files
SST files SST files
SST files are uploaded to
the cloud storage
Durability is achieved by
keeping data in cloud
storage
No data loss even if all
leaf servers crash and
burn!
Leaf
38
Leaf
RocksDB-Cloud
Object Storage (AWS S3, GCS, ...)
Rockset SQL API
Aggregator Aggregator
Leaf
RocksDB-Cloud
Leaf
RocksDB-Cloud
Tailer
RocksDB RocksDB RocksDB
SST files SST files
Scale up new leaves: use zero-copy clones of rocksdb-cloud
SST files
RocksDB
RocksDB-Cloud
SST files
No performance
impact on
existing leaf
servers
Separate write compute from query compute
39
Remote compaction in rocksdb-cloud
https://rockset.com/blog/remote-compactions-in-rocksdb-cloud/
Recap: ALT Architecture provides SQL on json
40
No need to manage indexes with converged indexing
No need to pre-define schemas with smart schemas
No need to provision servers with cloud scaling architecture
Summary: Analytics Applications on massive datasets
I. Index rather than partition-and-scan
II. Separate write compute from query compute
III. Optimized for low data latency, low query latency, and high QPS
41
Transaction
Processing
Analytics
Processing
Operational
Analytics
Processing
https://rockset.com/blog/operational-analytics-what-every-software-engineer-should-know/
Brought to you by
Dhruba Borthakur
dhruba@rockset.com
@dhruba_rocks
Community:
https://community.rockset.com/
Check us out: http://rockset.com

Más contenido relacionado

La actualidad más candente

Data Security at Scale through Spark and Parquet Encryption
Data Security at Scale through Spark and Parquet EncryptionData Security at Scale through Spark and Parquet Encryption
Data Security at Scale through Spark and Parquet Encryption
Databricks
 
Scaling Redis To 1M Ops/Sec: Jane Paek
Scaling Redis To 1M Ops/Sec: Jane PaekScaling Redis To 1M Ops/Sec: Jane Paek
Scaling Redis To 1M Ops/Sec: Jane Paek
Redis Labs
 

La actualidad más candente (20)

Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptx
 
[pgday.Seoul 2022] PostgreSQL with Google Cloud
[pgday.Seoul 2022] PostgreSQL with Google Cloud[pgday.Seoul 2022] PostgreSQL with Google Cloud
[pgday.Seoul 2022] PostgreSQL with Google Cloud
 
Building Applications with a Graph Database
Building Applications with a Graph DatabaseBuilding Applications with a Graph Database
Building Applications with a Graph Database
 
On Improving Broadcast Joins in Apache Spark SQL
On Improving Broadcast Joins in Apache Spark SQLOn Improving Broadcast Joins in Apache Spark SQL
On Improving Broadcast Joins in Apache Spark SQL
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
PySpark dataframe
PySpark dataframePySpark dataframe
PySpark dataframe
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Data Security at Scale through Spark and Parquet Encryption
Data Security at Scale through Spark and Parquet EncryptionData Security at Scale through Spark and Parquet Encryption
Data Security at Scale through Spark and Parquet Encryption
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
Power Real Estate Property Analytics with MongoDB + Spark
Power Real Estate Property Analytics with MongoDB + SparkPower Real Estate Property Analytics with MongoDB + Spark
Power Real Estate Property Analytics with MongoDB + Spark
 
Scaling Redis To 1M Ops/Sec: Jane Paek
Scaling Redis To 1M Ops/Sec: Jane PaekScaling Redis To 1M Ops/Sec: Jane Paek
Scaling Redis To 1M Ops/Sec: Jane Paek
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
 
What is AWS Glue
What is AWS GlueWhat is AWS Glue
What is AWS Glue
 
Deep Dive on Amazon Aurora
Deep Dive on Amazon AuroraDeep Dive on Amazon Aurora
Deep Dive on Amazon Aurora
 
Presto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performance
 
026 Neo4j Data Loading (ETL_ELT) Best Practices - NODES2022 AMERICAS Advanced...
026 Neo4j Data Loading (ETL_ELT) Best Practices - NODES2022 AMERICAS Advanced...026 Neo4j Data Loading (ETL_ELT) Best Practices - NODES2022 AMERICAS Advanced...
026 Neo4j Data Loading (ETL_ELT) Best Practices - NODES2022 AMERICAS Advanced...
 
AWS Aurora 100% 활용하기
AWS Aurora 100% 활용하기AWS Aurora 100% 활용하기
AWS Aurora 100% 활용하기
 
Amazon Aurora and AWS Database Migration Service
Amazon Aurora and AWS Database Migration ServiceAmazon Aurora and AWS Database Migration Service
Amazon Aurora and AWS Database Migration Service
 

Similar a Realtime Indexing for Fast Queries on Massive Semi-Structured Data

Apache Solr as a compressed, scalable, and high performance time series database
Apache Solr as a compressed, scalable, and high performance time series databaseApache Solr as a compressed, scalable, and high performance time series database
Apache Solr as a compressed, scalable, and high performance time series database
Florian Lautenschlager
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
confluent
 

Similar a Realtime Indexing for Fast Queries on Massive Semi-Structured Data (20)

Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
 
Simplifying Change Data Capture using Databricks Delta
Simplifying Change Data Capture using Databricks DeltaSimplifying Change Data Capture using Databricks Delta
Simplifying Change Data Capture using Databricks Delta
 
Apache Solr as a compressed, scalable, and high performance time series database
Apache Solr as a compressed, scalable, and high performance time series databaseApache Solr as a compressed, scalable, and high performance time series database
Apache Solr as a compressed, scalable, and high performance time series database
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
 
Black friday logs - Scaling Elasticsearch
Black friday logs - Scaling ElasticsearchBlack friday logs - Scaling Elasticsearch
Black friday logs - Scaling Elasticsearch
 
Managing your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed LuxembourgManaging your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed Luxembourg
 
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksJump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with Databricks
 
Suburface 2021 IBM Cloud Data Lake
Suburface 2021 IBM Cloud Data LakeSuburface 2021 IBM Cloud Data Lake
Suburface 2021 IBM Cloud Data Lake
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Elastic Stack Introduction
Elastic Stack IntroductionElastic Stack Introduction
Elastic Stack Introduction
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
Big Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI MobileBig Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI Mobile
 
Modernise your Data Warehouse - AWS Summit Sydney 2018
Modernise your Data Warehouse - AWS Summit Sydney 2018Modernise your Data Warehouse - AWS Summit Sydney 2018
Modernise your Data Warehouse - AWS Summit Sydney 2018
 
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
 
No SQL and MongoDB - Hyderabad Scalability Meetup
No SQL and MongoDB - Hyderabad Scalability MeetupNo SQL and MongoDB - Hyderabad Scalability Meetup
No SQL and MongoDB - Hyderabad Scalability Meetup
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with Spark
 

Más de ScyllaDB

Más de ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Realtime Indexing for Fast Queries on Massive Semi-Structured Data

  • 1. Brought to you by Rockset: Realtime Indexing for Fast Queries on Massive Semi-Structured Data Dhruba Borthakur CTO at
  • 2. About Me ■ CTO and Co-Founder of Rockset ■ RocksDB @ Facebook ■ Hadoop File System @ Yahoo ■ HBase & Hive ■ Andrew File System (AFS) 2
  • 3. Overview of the Talk I. Changing Landscape of Data Analytics II. Overview of Rockset’s Aggregator Leaf Tailer (ALT) Architecture III. Smart Schemas for SQL IV. Converged Indexing powers SQL V. Cloud Scaling Architecture 3
  • 4. Real-time Indexing on massive datasets for building real-time apps on live data without ETL or pipelines. What is Rockset? 4
  • 5. How is Rockset used? 5      
  • 6. The Aggregator Leaf Tailer (ALT) Architecture 6 https://rockset.com/blog/aggregator-leaf-tailer-an-architecture-for-live-analytics-on-event-streams/
  • 7. Key Benefits of ALT 1. Makes queries fast: Indexing all data at time of ingest 2. Runs complex queries on the fly: Distributed aggregator tier that scales compute and memory resources independently 3. It’s cost-effective: Tailer, Leaf, and Aggregator can be independently scaled up and down 4. Optimizes read & writes in isolation: CQRS ensures that database writes do not impact queries 7
  • 8. Key Design Principles of Rockset 1. Converged Indexing 2. Smart Schemas 3. Cloud Native Architecture ● separate compute from storage ● separate write compute from query compute 8
  • 10. What and Why converged indexing? What? Rockset stores every column of every document in a row-based index, column-store index, AND an inverted index. Why? ■ No need to configure and maintain indexes ■ No slow queries because of missing indexes 10 INDEX ALL FIELDS
  • 11. Under the Hood: Converged Indexing ■ Columnar and inverted indexes in the same system ■ Built on top of key-value store abstraction ■ Each document maps to many key-value pairs 11 <doc 0> { “name”: “Igor” } <doc 1> { “name”: “Dhruba” } Key Value R.0.name Igor Row Store R.1.name Dhruba C.name.0 Igor Column Store C.name.1 Dhruba S.name.Dhruba.1 Inverted index S.name.Igor.0
  • 12. Inverted Indexing for point lookups ■ For each value, store documents containing that value (posting list) ■ Quickly retrieve a list of document IDs that match a predicate “name” “interests” Dhruba 1 Igor 0 databases 0.0; 1.1 cars 1.0 snowboarding 0.1 “last_active” 2019/3/15 0 2019/3/22 1 <doc 0> { “name”: “Igor”, “interests”: [“databases”, “snowboarding”], “last_active”: 2019/3/15 } <doc 1> { “name”: “Dhruba”, “interests”: [“cars”, “databases”], “last_active”: 2019/3/22 }
  • 13. Columnar Store for aggregations ■ Store each column separately ■ Great compression ■ Only fetch columns the query needs <doc 0> { “name”: “Igor”, “interests”: [“databases”, “snowboarding”], “last_active”: 2019/3/15 } <doc 1> { “name”: “Dhruba”, “interests”: [“cars”, “databases”], “last_active”: 2019/3/22 } “name” “interests” 0 Igor 1 Dhruba 0.0 databases 0.1 snowboarding 1.0 cars 1.1 databases “last_active” 0 2019/3/15 1 2019/3/22
  • 14. SELECT keyword, count(*) FROM search_logs GROUP BY keyword ORDER BY count(*) DESC ■ Low latency for both highly selective queries and large scans ■ Optimizer picks between columnar store or inverted index The Power of the Query Optimizer 14 Inverted index (for highly selective queries) SELECT * FROM search_logs WHERE keyword = ‘hpts’ AND locale = ‘en’ Columnar store (for large scans)
  • 15. Challenges with Converged Indexing ■ Maintaining multiple indexes adversely impacts write throughput ■ Challenge 1: one new record = multiple servers updates ● Requires consensus coordination between servers ■ Challenge 2: one new field = multiple random writes ● Requires increased disk I/O 15
  • 16. Challenge 1: one new record = multiple servers updates ■ In a traditional database with term sharding and n indexes, one write incurs updates to n different indexes on n servers ■ Requires a distributed transaction (paxos, raft) between n servers <doc 1> { “name”: “Dhruba”, “interests”: [“cars”, “databases”], “last_active”: 2019/3/22 } “interests” Dhruba 1 Igor 0 databases 0.0; 1.1 cars 1.0 snowboarding 0.1 “last_active” 2019/3/15 0 2019/3/22 1 “name”
  • 17. Addressing challenge 1: doc sharding Distributed Log Rockset SQL API Aggregator Aggregator Leaf RocksDB Leaf RocksDB Leaf RocksDB new docs ● Updates are durably-buffered to a distributed log
  • 18. Addressing challenge 1: doc sharding ■ Updates are durably buffered to a distributed log ■ Leafs tail only documents in the shards they are responsible for ■ Doc sharding means all new keys will only affect a single shard/leaf Distributed Log Rockset SQL API Aggregator Aggregator Leaf RocksDB Leaf RocksDB Leaf RocksDB new keys
  • 19. What is document sharding? 19 Let’s say you were running a search on restaurant reviews in the area…. Traditional distributed databases Rockset Optimized for query throughput Optimized for query latency Big data analytics requires you to run complex queries at interactive speed First, optimize for latency and then optimize for throughput Restaurant Data Restaurant & Review Data Review Data
  • 20. ■ Traditional systems use B-tree storage structure ■ Keys are sorted across tables ■ A single record update with multiple secondary index would incur writes to multiple different locations Storage Memory Manager Challenge 2: one new doc = multiple random writes Memory Buffer Table 1 Table 2 Table 3 Table 4 new docs
  • 21. ■ RocksDB uses log-structured merge-tree (LSM) ■ Multiple record updates accumulate in memory and written into a single SST file ■ Keys are sorted between SST files via compaction in a background process ■ Multiple index updates from multiple docs result in one write to storage Storage Memory Manager Memory Buffer SST 1 SST 2 SST 3 SST 4 new docs Addressing challenge 2: RocksDB LSM background compaction
  • 23. What and Why smart schemas? What? Automatic generation of a schema based on the exact fields and types present at the time of ingest. There is no need to ingest the data with a predefined schema (ie: schemaless ingestion). Why? Avoid data pipelines that cause data latency ■ Semi-structured data is complex and messy ■ Ingest any semi-structured data (similar to NoSQL) ■ Make it easy to read the data (using standard SQL) 23
  • 24. Under the Hood: Smart Schemas ■ Type information stored with values, not “columns” ■ Strongly types queries on dynamically typed fields ■ Designed for nested semi-structured data 24
  • 25. Schema Binding at Query Time 25 Tailers ingest data without predefined schemas (ie: schemaless ingest) Aggregators use the schema to make queries fast schema binding at query time
  • 26. Challenges with Smart Schemas ■ Challenge 1: Additional CPU usage for processing queries ■ Challenge 2: Requires increased disk space to store types 26
  • 27. We use field interning to reduce the space required to store schemas 27 Strict Schema (Relational DB) Schema Data Storage City: String name: String age: Int Rancho Santa Margarita Jonathan 35 Rancho Santa Margarita Katherine 25 Schemaless (JSON DB) “City”: “Rancho Santa Margarita” “name”: “Jonathan” “age”: 35 “City”: “Rancho Santa Margarita” “Name”: “Katherine” “age”: 25 Rockset (with field interning) ~30% overhead 0: S “City” 1: S “Jonathan” 2: S “Katherine” 3: I 35 4: I 25 City: 0 name: 1 age: 3 City: 0 name: 2 age: 4 = Amount of storage
  • 28. We use type hoisting to reduce CPU required to run queries 28 Strict schema 1 10 7 4 5 a b c d e Schemaless Rockset (with type hoisting) Rows I 1 I 10 I 7 I 4 I 5 S a S b S c S d S e I 1 10 7 4 5 S a b c d e Rockset query price/performance is on par with strict schema systems
  • 30. ● Cost of 1 cpu for 100 minutes == Cost of 100 cpu for 1 minute!! Key insight into economics of cloud
  • 31. ■ Cost of 1 cpu for 100 minutes == Cost of 100 cpu for 1 minute!! ● Without cloud: statically provision for peak demand ● With cloud: dynamically provision for current demand Key insight into economics of cloud
  • 32. ■ Cost of 1 cpu for 100 minutes == Cost of 100 cpu for 1 minute!! ● Without cloud: statically provision for peak demand ● With cloud: dynamically provision for current demand ■ Goal: resources scale up and down as needed to achieve desired performance Key insight into economics of cloud If your query is slow, the challenge is in the software you are using.
  • 33. What and Why cloud autoscaling? What? Each tailer, leaf, or aggregator can be independently scaled up and down as needed. Tailers scale when more data to ingest, leaves scale when data size grows and aggregators scale when query volume increases Why? ■ No provisioning ■ Pay for usage ■ No need to provision for peak capacity 33
  • 34. We are going to focus on scaling out the leaf 34 Scaling tailers and aggregators are easy because they are stateless Scaling leaf nodes is tricky because they are stateful
  • 35. Quotation from Dr David Dewitt David J. DeWitt MIT Willis Lang Microsoft Jim Gray Systems Lab 35 © Permission to use contents only with attribution to the authors (The End of ”Shared-Nothing”) 2017
  • 36. Document Sharding Diagram in ALT 36 Rockset SQL API Aggregator Aggregator Leaf RocksDB Leaf RocksDB Leaf RocksDB Tailer We shard the documents across leaves An entire document including all its fields and indices reside on a single leaf server
  • 37. Scale down leaves: Use durability of cloud storage 37 Rockset SQL API Aggregator Aggregator Object Storage (AWS S3, GCS, ...) Leaf RocksDB-Cloud RocksDB Leaf RocksDB-Cloud RocksDB Leaf RocksDB-Cloud RocksDB Tailer SST files SST files SST files SST files are uploaded to the cloud storage Durability is achieved by keeping data in cloud storage No data loss even if all leaf servers crash and burn!
  • 38. Leaf 38 Leaf RocksDB-Cloud Object Storage (AWS S3, GCS, ...) Rockset SQL API Aggregator Aggregator Leaf RocksDB-Cloud Leaf RocksDB-Cloud Tailer RocksDB RocksDB RocksDB SST files SST files Scale up new leaves: use zero-copy clones of rocksdb-cloud SST files RocksDB RocksDB-Cloud SST files No performance impact on existing leaf servers
  • 39. Separate write compute from query compute 39 Remote compaction in rocksdb-cloud https://rockset.com/blog/remote-compactions-in-rocksdb-cloud/
  • 40. Recap: ALT Architecture provides SQL on json 40 No need to manage indexes with converged indexing No need to pre-define schemas with smart schemas No need to provision servers with cloud scaling architecture
  • 41. Summary: Analytics Applications on massive datasets I. Index rather than partition-and-scan II. Separate write compute from query compute III. Optimized for low data latency, low query latency, and high QPS 41 Transaction Processing Analytics Processing Operational Analytics Processing https://rockset.com/blog/operational-analytics-what-every-software-engineer-should-know/
  • 42. Brought to you by Dhruba Borthakur dhruba@rockset.com @dhruba_rocks Community: https://community.rockset.com/ Check us out: http://rockset.com