Materialized Views
Carl Yeksigian
What are Materialized Views?
• Two copies of the data using different partitioning and placed
on different replicas
• Automated, server-side denormalization of data
• Native Cassandra read performance
• Write penalty, but acceptable performance
Basic Rules of Data Modeling, Refresher
• Best practice: Denormalization
• Start by understanding the queries you need
• Create a table for each query
Why is Denormalization Hard?
• Implemented by every application
• No guarantees on performance or consistency
• Updates to existing rows require cleanup, read-before-write
Denormalization Example: User Playlists
Queries
• All Songs for a given playlist
• Track Users who like the same Artist
• Find most recently played song
Denormalization in Practice
CREATE TABLE playlists
(user_name text,
playlist_name text,
song_id text,
artist_name text,
last_played timestamp);
-- All songs for a given playlist
SELECT song_id
FROM playlists
WHERE user_name='carl'
AND playlist_name='jams';
-- Track users who like the same artist
SELECT COUNT(song_id)
FROM playlists
WHERE artist_name='Weezer';
-- Find most recently played song
SELECT last_played, song_id
FROM playlists
WHERE user_name='carl'
AND playlist_name='jams'
ORDER BY last_played DESC;
Denormalization in Practice
CREATE TABLE playlists
(user_name text,
playlist_name text,
song_id text,
artist_name text,
last_played timestamp,
PRIMARY KEY (user_name, playlist_name, song_id));
CREATE TABLE artists_to_playlists
(artist_name text,
user_name text,
playlist_name text,
song_id text,
PRIMARY KEY (artist_name, user_name, playlist_name, song_id));
Denormalization in Practice
CREATE TABLE last_played
(user_name text,
playlist_name text,
last_played timestamp,
song_id text,
PRIMARY KEY (user_name, playlist_name, last_played, song_id))
WITH CLUSTERING ORDER BY (playlist_name ASC, last_played DESC);
Denormalization in Practice: Inserts
BEGIN BATCH
INSERT INTO playlists
(user_name, playlist_name, song_id, artist_name, last_played)
VALUES ('carl', 'jams', 'Undone', 'Weezer', '2015-09-24 09:00');
INSERT INTO artists_to_playlists
(artist_name, user_name, playlist_name, song_id)
VALUES ('Weezer', 'carl', 'jams', 'Undone');
INSERT INTO last_played
(user_name, playlist_name, last_played, song_id)
VALUES ('carl', 'jams', '2015-09-24 09:00', 'Undone');
APPLY BATCH;
Denormalization in Concept: Updates
UPDATE playlists
SET last_played=toTimestamp(now())
WHERE user_name='carl'
AND playlist_name='jams'
AND song_id='Undone';
DELETE FROM playlists
WHERE user_name='carl';
Manual Denormalization with updates
[Diagram: Client, Coordinator, Batchlog, Base Table, View Table]
• Query Existing Data
• Return Existing Data
• Write New Values
Manual Denormalization Limitations
• Updates need to keep view in sync, including tombstoning
previous values
• How to keep the view and base in sync on failure?
• Contentious updates can potentially cause extra values
• Your application doesn’t always know what is an update or an
insert (i.e. upsert)
Manual Denormalization: Contentious Updates
Client 1, Client 2, Cassandra
Starting state:
playlists:('carl', 'jams', 'Undone', 2015-09-24 9:00)
last_played:('carl', 'jams', 2015-09-24 9:00, 'Undone')
• Both clients query the existing last_played and both get 2015-09-24 9:00
• One client updates the last played time to 9:02, the other to 9:01
Resulting state:
playlists:('carl', 'jams', 'Undone', 2015-09-24 9:02)
last_played:('carl', 'jams', 2015-09-24 9:01, 'Undone')
last_played:('carl', 'jams', 2015-09-24 9:02, 'Undone')
• Each client removes only the 9:00 row it read, so the stale 9:01 row stays in last_played
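The race on this slide can be reproduced with a small in-memory model. This is a sketch in Python, not Cassandra client code: the base table and the hand-maintained view are plain Python containers, and `read_existing` and `write_update` are hypothetical helpers standing in for the application's read-before-write.

```python
# Toy model of the manual read-before-write race; no Cassandra involved.
# base holds one last_played value per song; view holds one row per
# (user, playlist, last_played, song), like the last_played table.
base = {("carl", "jams", "Undone"): "09:00"}
view = {("carl", "jams", "09:00", "Undone")}


def read_existing(key):
    """Step 1 of a manual update: read the value the view row is keyed on."""
    return base[key]


def write_update(key, old_value, new_value):
    """Step 2: remove the view row we believe is current, then add the new one."""
    user, playlist, song = key
    view.discard((user, playlist, old_value, song))
    view.add((user, playlist, new_value, song))
    # Keep the later value, matching the end state shown on the slide.
    base[key] = max(base[key], new_value)


key = ("carl", "jams", "Undone")

# Both clients read before either writes, so both see 09:00.
seen_by_client_1 = read_existing(key)
seen_by_client_2 = read_existing(key)

write_update(key, seen_by_client_1, "09:02")
write_update(key, seen_by_client_2, "09:01")

print(base[key])
print(sorted(view))
```

The base row ends at 09:02, but the view keeps both the 09:01 and the 09:02 rows, because neither client knew about the other's write when it chose which view row to remove.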
Materialized Views
• Provide automated server-side denormalization
• No need for read-before-write on the application side
• Simplify application code
• Provide safer guarantees
Materialized Views: Guarantees
• If a write is acknowledged, at least CL number of base and
view replicas will receive the write
• If a write is actually an update, the previous value will be
cleaned up in the view
• Even with contentious updates, view synchronized with base
for each update
• Takes care of deletions properly
• When a base table is repaired, the data will also be inserted
into the view
• TTL’d base data will remain TTL’d in view
Why Not Just Use Secondary Indexes?
• We can get most of the same functionality by using
secondary indexes
• Secondary index queries have to ask every node; they cannot
use the ring to find the replicas that hold the data
• On each node, the lookup is not a single access
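The difference in fan-out can be illustrated with a toy ring. This is a sketch under loud assumptions: six nodes, no replication, no consistency levels, and an MD5-based `owner` function that is only a stand-in for the partitioner. All names here are hypothetical.

```python
import hashlib

NODES = 6


def owner(partition_key):
    """Stable hash of a partition key to a node number (stand-in for the ring)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NODES


rows = [("carl", "Weezer"), ("alice", "Weezer"), ("bob", "Pixies"),
        ("dana", "Weezer"), ("erin", "Pixies"), ("frank", "Weezer")]

# Base table partitioned by user_name; each node indexes only its own rows.
local_index = [{} for _ in range(NODES)]
# View partitioned by artist_name; all rows for an artist share one partition.
view = [{} for _ in range(NODES)]
for user, artist in rows:
    local_index[owner(user)].setdefault(artist, []).append(user)
    view[owner(artist)].setdefault(artist, []).append(user)


def query_index(artist):
    """The artist says nothing about where matching users live: ask every node."""
    users = []
    for node_index in local_index:
        users.extend(node_index.get(artist, []))
    return sorted(users), len(local_index)


def query_view(artist):
    """The artist is the partition key, so the ring points at one node."""
    return sorted(view[owner(artist)].get(artist, [])), 1


print(query_index("Weezer"))  # (['alice', 'carl', 'dana', 'frank'], 6)
print(query_view("Weezer"))   # (['alice', 'carl', 'dana', 'frank'], 1)
```

Both queries return the same users; the second element of each result is the number of nodes contacted in this model.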
Secondary Indexes: Query Pattern
[Diagram: Client]
Materialized Views: Query Pattern
[Diagram: Client]
Materialized Views in Practice
CREATE TABLE playlists
(user_name text,
playlist_name text,
song_id text,
artist_name text,
last_played timestamp,
PRIMARY KEY (user_name, playlist_name, song_id));
Materialized Views in Practice
CREATE MATERIALIZED VIEW artist_to_user AS
SELECT song_id, user_name
FROM playlists
WHERE song_id IS NOT NULL
AND playlist_name IS NOT NULL
AND user_name IS NOT NULL
AND artist_name IS NOT NULL
PRIMARY KEY (artist_name, user_name, playlist_name, song_id);
Replica Placement
[Diagram: user_name:carl and artist_name:Weezer]
Materialized Views in Practice
• On creation, a new materialized view will be populated with
existing base data
• Each node tracks the completion of the build independently
SELECT *
FROM system.built_views
WHERE keyspace_name='ks'
AND view_name='view';
Materialized Views in Practice
CREATE MATERIALIZED VIEW last_played AS
SELECT last_played, song_id
FROM playlists
WHERE user_name IS NOT NULL
AND playlist_name IS NOT NULL
AND last_played IS NOT NULL
AND song_id IS NOT NULL
PRIMARY KEY (user_name, playlist_name, last_played, song_id)
WITH CLUSTERING ORDER BY (playlist_name ASC, last_played DESC);
Materialized Views: Performance
• https://github.com/tjake/mvbench
• Uses java-driver to simulate MV and manual denormalization
Materialized Views: Performance (ops/s)
[Chart: Manual Denormalization vs. Materialized Views]
Materialized Views: Performance (p95 latency)
[Chart: Manual Denormalization vs. Materialized Views]
Materialized Views: Future Features
• Adding WHERE clause support (#9664)
CREATE MATERIALIZED VIEW carls_last_played AS
SELECT last_played, song_id
FROM plays
WHERE user_name='carl'
PRIMARY KEY (last_played, song_id)
• Knowing when a view is completely finished building without
querying each node (#9967)
• Insert-only tables can skip read-before-write and lock acquisition
(#9779)
Materialized Views: Under the Hood
[Diagram: Client, Coordinator, BL, Base and View on Node A, Node B, Node C, Node D]
• Base write: write<p1, p2, c1, c2, v1>
• View updates: del<v0, p1, p2, c1, c2> and write<v1, p1, p2, c1, c2>
• If update is partial, we will reinsert data from the read when
generating a new row
• If no tombstone generated, only new columns are written
to view
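The two rules above can be sketched as a function that takes the row read locally on the base replica and the incoming write, and returns the view mutations. This is a minimal Python model under assumptions: rows are dicts, the view is keyed on a single base column, and `view_mutations` is a hypothetical name, not a Cassandra API.

```python
def view_mutations(existing_row, update, view_key_column):
    """Return the view mutations for one base-table write.

    existing_row: the base row read locally before the write (None if absent).
    update: the columns carried by the incoming write (may be partial).
    view_key_column: the base column the view is keyed on.
    """
    existing_row = existing_row or {}
    # Merge the update over what was read, so a partial update can still
    # produce a complete view row.
    merged = {**existing_row, **update}
    old_key = existing_row.get(view_key_column)
    new_key = merged.get(view_key_column)

    mutations = []
    key_changed = old_key is not None and old_key != new_key
    if key_changed:
        # The previous view row is now stale: tombstone it.
        mutations.append(("del", old_key))
    if new_key is not None:
        # New view row: reinsert everything. Same view row: only new columns.
        columns = merged if key_changed else update
        mutations.append(("write", new_key, dict(columns)))
    return mutations


existing = {"song_id": "Undone", "artist_name": "Weezer", "last_played": "09:00"}

# The view key changes: tombstone the old view row and reinsert the full row.
print(view_mutations(existing, {"last_played": "09:02"}, "last_played"))

# The view key is untouched: no tombstone, only the new column is written.
print(view_mutations(existing, {"artist_name": "Weezer (band)"}, "last_played"))
```

The first call corresponds to the del<v0, ...> plus write<v1, ...> pair; the second shows the case where no tombstone is generated.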
Materialized Views: The Edge
• Materialized Views have different failure properties than the
rest of the system
• Data from a single base replica can be on many view replicas
Materialized Views: The Edge
• When data is lost on all replicas of the base table, it cannot
be cleaned up in the view (#10346)
• No read repair between the base and the view table
• Repair of base table will clean up view
• Requires local read-before-write.
• If you will never update or delete, use manual MVs
Materialized Views: The Edge
• An update from a base table is asynchronously applied to the
view, so it is possible there will be a delay
• An MV on a low-cardinality table can cause hotspots in the
ring, overloading some nodes
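The hotspot effect can be seen by counting rows per node for a high-cardinality and a low-cardinality partition key. This is a toy sketch: the eight-node cluster, the MD5-based `owner` function, and the `active`/`inactive` column values are all assumptions made for illustration, and replication is ignored.

```python
import hashlib
from collections import Counter


def owner(partition_key, node_count):
    """Stable hash of a partition key to a node number (stand-in for the ring)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def rows_per_node(partition_keys, node_count):
    """Count how many rows land on each node for the given partition keys."""
    return Counter(owner(key, node_count) for key in partition_keys)


NODES = 8
# 10,000 rows: the base table is partitioned by a high-cardinality user name,
# the view by a column with only two distinct values.
by_user = [f"user{i}" for i in range(10_000)]
by_flag = ["active" if i % 2 else "inactive" for i in range(10_000)]

base_load = rows_per_node(by_user, NODES)
view_load = rows_per_node(by_flag, NODES)

print("base nodes used:", len(base_load), "max rows on one node:", max(base_load.values()))
print("view nodes used:", len(view_load), "max rows on one node:", max(view_load.values()))
```

With only two distinct partition keys, the view can occupy at most two nodes, so each of them carries at least half of the rows while the rest of the ring holds none.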
Thanks

Post Quantum Cryptography – The Impact on Identity
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profile
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 

Cassandra Materialized Views

  • 2. What are Materialized Views? • Two copies of the data using different partitioning and placed on different replicas • Automated, server-side denormalization of data • Native Cassandra read performance • Write penalty, but acceptable performance
  • 3. Basic Rules of Data Modeling, Refresher • Best practice: Denormalization • Start by understanding the queries you need • Create a table for each query
  • 4. Why is Denormalization Hard? • Implemented by every application • No guarantees on performance or consistency • Updates to existing rows require cleanup, read-before-write
  • 5. Denormalization Example: User Playlists Queries • All Songs for a given playlist • Track Users who like the same Artist • Find most recently played song
  • 6. Denormalization in Practice CREATE TABLE playlists (user_name text, playlist_name text, song_id text, artist_name text, last_played timestamp); SELECT song_id FROM playlists WHERE user_name='carl' AND playlist_name='jams'; SELECT COUNT(song_id) FROM playlists WHERE artist_name='Weezer'; SELECT last_played, song_id FROM playlists WHERE user_name='carl' AND playlist_name='jams' ORDER BY last_played DESC
  • 7. Denormalization in Practice CREATE TABLE playlists (user_name text, playlist_name text, song_id text, artist_name text, last_played timestamp, PRIMARY KEY (user_name, playlist_name, song_id)) CREATE TABLE artists_to_playlists (artist_name text, user_name text, playlist_name text, song_id text, PRIMARY KEY (artist_name, user_name, playlist_name, song_id))
  • 8. Denormalization in Practice CREATE TABLE last_played (user_name text, playlist_name text, last_played timestamp, song_id text, PRIMARY KEY (user_name, playlist_name, last_played, song_id)) WITH CLUSTERING ORDER BY (playlist_name ASC, last_played DESC)
  • 9. Denormalization in Practice: Inserts BEGIN BATCH INSERT INTO playlists (user_name, playlist_name, song_id, artist_name, last_played) VALUES ('carl', 'jams', 'Undone', 'Weezer', '2015-09-24 09:00'); INSERT INTO artists_to_playlists (artist_name, user_name, playlist_name, song_id) VALUES ('Weezer', 'carl', 'jams', 'Undone'); INSERT INTO last_played (user_name, playlist_name, last_played, song_id) VALUES ('carl', 'jams', '2015-09-24 09:00', 'Undone'); APPLY BATCH;
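The batch above writes one logical row to three tables. As a rough illustration only, the same fan-out can be modeled in Python with dictionaries keyed by each table's primary key. The function name `insert_play` and the in-memory tables are assumptions made for this sketch; they are not part of any Cassandra driver API.

```python
from datetime import datetime

# Each "table" is a dict keyed by its full primary key, mirroring the schemas
# on the earlier slides. This is a toy model, not a Cassandra client.
playlists = {}             # (user_name, playlist_name, song_id) -> other columns
artists_to_playlists = {}  # (artist_name, user_name, playlist_name, song_id) -> True
last_played = {}           # (user_name, playlist_name, last_played, song_id) -> True


def insert_play(user, playlist, song, artist, played_at):
    """Write one logical row to all three tables, as the batch does."""
    playlists[(user, playlist, song)] = {
        "artist_name": artist,
        "last_played": played_at,
    }
    artists_to_playlists[(artist, user, playlist, song)] = True
    last_played[(user, playlist, played_at, song)] = True


insert_play("carl", "jams", "Undone", "Weezer", datetime(2015, 9, 24, 9, 0))
```

Every new query table adds another write to this function, which is the bookkeeping the application has to own when it denormalizes by hand.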
  • 10. Denormalization in Concept: Updates UPDATE playlists SET last_played=toTimestamp(now()) WHERE user_name='carl' AND playlist_name='jams' AND song_id='Undone'; DELETE FROM playlists WHERE user_name='carl'
  • 11-17. Manual Denormalization with updates (diagram sequence; participants: Client, Coordinator, Batchlog, Base Table, View Table; steps labeled in order: Query Existing Data, Return Existing Data, Write New Values)
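The read-then-write sequence in these diagram slides can be sketched as a small in-memory model. This is a hedged illustration, not the deck's code: the tables are plain Python containers and `update_last_played` is a hypothetical helper. The point is that the old timestamp is part of the denormalized row's primary key, so the application must read it before it can remove the stale row.

```python
# Toy in-memory tables keyed by primary key (a hypothetical model, not a driver API).
playlists = {
    ("carl", "jams", "Undone"): {
        "artist_name": "Weezer",
        "last_played": "2015-09-24 09:00",
    }
}
last_played = {("carl", "jams", "2015-09-24 09:00", "Undone")}


def update_last_played(user, playlist, song, new_time):
    """Manually keep the last_played table in sync with playlists."""
    key = (user, playlist, song)

    # Step 1: query existing data. Without the old timestamp we cannot
    # address the old denormalized row.
    existing = playlists.get(key)

    # Step 2: remove the stale denormalized row (a tombstone in Cassandra terms).
    if existing is not None:
        last_played.discard((user, playlist, existing["last_played"], song))

    # Step 3: write new values to both tables.
    row = dict(existing) if existing is not None else {"artist_name": None}
    row["last_played"] = new_time
    playlists[key] = row
    last_played.add((user, playlist, new_time, song))


update_last_played("carl", "jams", "Undone", "2015-09-24 09:05")
```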
  • 19. Manual Denormalization Limitations • Updates need to keep view in sync, including tombstoning previous values • How to keep the view and base in sync on failure? • Contentious updates can potentially cause extra values • Your application doesn’t always know whether a write is an update or an insert (i.e. upsert)
  • 20. Manual Denormalization: Contentious Updates (diagram; participants: Client 1, Cassandra, Client 2) Initial state: playlists:('carl', 'jams', 'Undone', 2015-09-24 9:00) last_played:('carl', 'jams', 2015-09-24 9:00, 'Undone') • Both clients: Query existing last_played
  • 21. Manual Denormalization: Contentious Updates (diagram; participants: Client 1, Cassandra, Client 2) Both clients are returned last_played: 2015-09-24 9:00 • One client: Update last played time to 9:02 • The other client: Update last played time to 9:01 • Resulting state: playlists:('carl', 'jams', 'Undone', 2015-09-24 9:02) last_played:('carl', 'jams', 2015-09-24 9:01, 'Undone') last_played:('carl', 'jams', 2015-09-24 9:02, 'Undone')
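The race on these two slides can be replayed deterministically. The sketch below is a simplified model under stated assumptions: the tables are in-memory Python containers, and `read_existing` and `write_update` are hypothetical stand-ins for the client's read and write steps. Both clients read before either writes, so each one only cleans up the 9:00 row and neither removes the other's new row.

```python
# Initial state from the slide, modeled in memory (not a Cassandra client).
KEY = ("carl", "jams", "Undone")
playlists = {KEY: "2015-09-24 09:00"}
last_played = {("carl", "jams", "2015-09-24 09:00", "Undone")}


def read_existing(key):
    """Read-before-write step: fetch the timestamp currently in the base row."""
    return playlists[key]


def write_update(key, seen_time, new_time):
    """Delete the row the client saw, then write the new values."""
    user, playlist, song = key
    last_played.discard((user, playlist, seen_time, song))
    last_played.add((user, playlist, new_time, song))
    playlists[key] = new_time


# Both clients read the same existing value before either one writes.
seen_by_first = read_existing(KEY)
seen_by_second = read_existing(KEY)

# Each client cleans up only the value it saw (9:00).
write_update(KEY, seen_by_first, "2015-09-24 09:01")
write_update(KEY, seen_by_second, "2015-09-24 09:02")

# The 9:01 row is never removed, so the denormalized table has an extra value.
times_in_last_played = sorted(t for (_, _, t, _) in last_played)
```

After the run, the base row holds 9:02 while the denormalized table holds both the 9:01 and 9:02 rows, matching the end state on the slide.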
  • 23. Materialized Views • Provide automated server-side denormalization • No need for read-before-write on the application side • Simplify application code • Provide safer guarantees
  • 24. Materialized Views: Guarantees • If a write is acknowledged, at least CL number of base and view replicas will receive the write • If a write is actually an update, the previous value will be cleaned up in the view • Even with contentious updates, view synchronized with base for each update • Takes care of deletions properly • When a base table is repaired, the data will also be inserted into the view • TTL’d base data will remain TTL’d in view
  • 25. Why Not Just Use Secondary Indexes? • We can get most of the same functionality by using secondary indexes • Secondary index queries must contact each node because they cannot use the ring to locate the data • On each node, the lookup is not a single access
  • 26-28. Secondary Indexes: Query Pattern (diagram; participant: Client)
  • 29. Materialized Views: Query Pattern (diagram; participant: Client)
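The difference between the two query patterns can be approximated with a toy ring. This is a sketch under simplifying assumptions: one replica per partition, an MD5-modulo placement function standing in for Cassandra's partitioner, and hypothetical sample rows. It only counts how many nodes each approach has to contact.

```python
import hashlib

NUM_NODES = 6  # assumed cluster size for the illustration


def owner(partition_key):
    """Map a partition key to a node (a stand-in for the real partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES


# Hypothetical rows: (user_name, playlist_name, song_id, artist_name)
rows = [
    ("carl", "jams", "Undone", "Weezer"),
    ("alice", "rock", "Buddy Holly", "Weezer"),
    ("bob", "mix", "Debaser", "Pixies"),
    ("dana", "faves", "Say It Ain't So", "Weezer"),
]

# The base table is partitioned by user_name, the view by artist_name.
base_nodes = [[] for _ in range(NUM_NODES)]
view_nodes = [[] for _ in range(NUM_NODES)]
for row in rows:
    base_nodes[owner(row[0])].append(row)
    view_nodes[owner(row[3])].append(row)


def query_with_secondary_index(artist):
    """The index is local to each node, so every node must be asked."""
    contacted, results = 0, []
    for node_rows in base_nodes:
        contacted += 1
        results.extend(r for r in node_rows if r[3] == artist)
    return results, contacted


def query_with_view(artist):
    """The view is partitioned by artist, so the ring locates one owner."""
    node = owner(artist)
    return [r for r in view_nodes[node] if r[3] == artist], 1


index_rows, index_contacted = query_with_secondary_index("Weezer")
view_rows, view_contacted = query_with_view("Weezer")
```

Both functions return the same rows; the index path touches all `NUM_NODES` nodes while the view path touches one.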
  • 30. Materialized Views in Practice CREATE TABLE playlists (user_name text, playlist_name text, song_id text, artist_name text, last_played timestamp, PRIMARY KEY (user_name, playlist_name, song_id))
  • 31. Materialized Views in Practice CREATE MATERIALIZED VIEW artist_to_user AS SELECT song_id, user_name FROM playlists WHERE song_id IS NOT NULL AND playlist_name IS NOT NULL AND user_name IS NOT NULL AND artist_name IS NOT NULL PRIMARY KEY (artist_name, user_name, playlist_name, song_id)
  • 34. Materialized Views in Practice • On creation, a new materialized view will be populated with existing base data • Each node tracks the completion of the build independently SELECT * FROM system.built_views WHERE keyspace_name='ks' AND view_name='view'
  • 35. Materialized Views in Practice CREATE MATERIALIZED VIEW last_played AS SELECT last_played, song_id FROM playlists WHERE user_name IS NOT NULL AND playlist_name IS NOT NULL AND last_played IS NOT NULL AND song_id IS NOT NULL PRIMARY KEY (user_name, playlist_name, last_played, song_id) WITH CLUSTERING ORDER BY (playlist_name ASC, last_played DESC)
  • 36. Materialized Views: Performance • https://github.com/tjake/mvbench • Uses java-driver to simulate MV and manual denormalization
  • 37. Materialized Views: Performance (ops/s) (chart comparing Manual Denormalization and Materialized Views)
  • 38. Materialized Views: Performance (p95 latency) (chart comparing Manual Denormalization and Materialized Views)
  • 39. Materialized Views: Future Features • Adding WHERE clause support (#9664) CREATE MATERIALIZED VIEW carls_last_played AS SELECT last_played, song_id FROM plays WHERE user_name='carl' PRIMARY KEY (last_played, song_id) • Knowing when a view is completely finished building without querying each node (#9967) • Insert only tables can skip read-before-write, lock acquisition (#9779)
  • 40. (Diagram; participants: Client, Coordinator, BL, Base, View, Node A, Node B, Node C, Node D; messages: write<p1, p2, c1, c2, v1>, write<p1, p2, c1, c2, v1>, del<v0, p1, p2, c1, c2>, write<v1, p1, p2, c1, c2>)
  • 41. Materialized Views: Under the Hood • If update is partial, we will reinsert data from the read when generating a new row • If no tombstone generated, only new columns are written to view
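The two bullets above can be illustrated with a toy model of view maintenance. This is a sketch under stated assumptions, not Cassandra's implementation: `base` and `view` are in-memory dicts, the view is keyed by a single hypothetical column (`artist_name`), and `apply_update` is an invented helper. When the view key changes, the old view row is removed and the new row is rebuilt from the data that was read; otherwise only the changed columns are written.

```python
VIEW_KEY = "artist_name"  # assumed view partition column for this sketch

base = {}  # base primary key -> dict of columns
view = {}  # (view key value, base primary key) -> dict of remaining columns


def without_view_key(columns):
    """Drop the view key column, since it lives in the view row's key."""
    return {c: v for c, v in columns.items() if c != VIEW_KEY}


def apply_update(pk, changes):
    """Apply a possibly partial update to the base row and maintain the view."""
    existing = base.get(pk, {})        # local read-before-write
    merged = {**existing, **changes}
    base[pk] = merged

    old_key = existing.get(VIEW_KEY)
    new_key = merged.get(VIEW_KEY)

    if old_key == new_key:
        # No tombstone needed: only the new columns are written to the view.
        if new_key is not None:
            view[(new_key, pk)].update(without_view_key(changes))
        return

    if old_key is not None:
        del view[(old_key, pk)]        # tombstone the previous view row
    if new_key is not None:
        # The update was partial, so reinsert the data from the read as well.
        view[(new_key, pk)] = without_view_key(merged)


pk = ("carl", "jams", "Undone")
apply_update(pk, {"artist_name": "Weezer", "last_played": "2015-09-24 09:00"})
apply_update(pk, {"last_played": "2015-09-24 09:05"})  # view key unchanged
apply_update(pk, {"artist_name": "Pixies"})            # view key changed
```

The last call changes only `artist_name`, yet the new view row still carries `last_played`, because it was copied over from the row that was read.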
  • 42. Materialized Views: The Edge • Materialized Views have different failure properties than the rest of the system • Data from a single base replica can be on many view replicas
  • 43. Materialized Views: The Edge • When data is lost on all replicas of the base table, it cannot be cleaned up in the view (#10346) • No read repair between the base and the view table • Repair of base table will clean up view • Requires local read-before-write • If you will never ever update/delete, use manual MVs
  • 44. Materialized Views: The Edge • An update from a base table is asynchronously applied to the view, so it is possible there will be a delay • An MV partitioned on a low-cardinality column can cause hotspots in the ring, overloading some nodes
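The hotspot risk can be seen by counting rows per node for two choices of view partition key. The sketch below is illustrative only: the eight-node ring, the MD5-modulo placement (a stand-in for Cassandra's partitioner), and the generated users and artists are all assumptions.

```python
import hashlib
from collections import Counter

NUM_NODES = 8  # assumed cluster size for the illustration


def owner(partition_key):
    """Map a partition key to a node (a stand-in for the real partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES


# Hypothetical data: 1000 users, but only three distinct artists.
ARTISTS = ["Weezer", "Pixies", "Blur"]
rows = [(f"user{i}", ARTISTS[i % len(ARTISTS)]) for i in range(1000)]

# Rows per node when the view is partitioned by a high-cardinality column...
load_by_user = Counter(owner(user) for user, _ in rows)
# ...and when it is partitioned by a low-cardinality column.
load_by_artist = Counter(owner(artist) for _, artist in rows)

print("rows per node, keyed by user_name:  ", sorted(load_by_user.items()))
print("rows per node, keyed by artist_name:", sorted(load_by_artist.items()))
```

With three distinct artists, at most three nodes can hold any view data, so every view write lands on those few nodes no matter how large the cluster is.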