SlideShare una empresa de Scribd logo
1 de 41
Descargar para leer sin conexión
Updating Materialized Views
and Caches Using Kafka
-or-
Why You Should Publish
Data Changes to Kafka
Zach Cox
Prairie.Code() Oct 2016
https://github.com/zcox/twitter-microservices-example
About Me
● Building things with Apache Kafka since 2014
○ Currently at Uptake in Chicago: predictive analytics for industrial IoT
○ Previously at Banno in Des Moines: ad targeting for bank web sites
● Co-founded Pongr
○ Startup in Des Moines: photo marketing platform powered by messaging systems
● Software game since 1998
● Links
○ http://theza.ch
○ https://github.com/zcox
○ https://twitter.com/zcox
○ https://www.linkedin.com/in/zachcox
Remember These Things
1. Learn about Apache Kafka http://kafka.apache.org
2. Send events and data changes to Kafka
3. Denormalization is OK
4. Up-to-date materialized views and caches
Build a New Service
● Provides read access to data from many sources
○ Query multiple tables or databases
○ Complex joins, aggregations
● Response latency
○ 95% 5 msec
○ Max 10 msec
● Data update latency
○ 95% 1 sec
○ Max 10 sec
http://www.nngroup.com/articles/response-times-3-important-limits/
User Information Service
● One operation: get user information
● REST HTTP + JSON
● Input: userId
● GET /users/:userId
● Output:
○ userId
○ username
○ Name
○ Description
○ Location
○ Web page url
○ Joined date
○ Profile image url
○ Background image url
○ # tweets
○ # following
○ # followers
○ # likes
○ # lists
○ # moments
Existing RDBMS with Normalized Tables
● users
○ user_id
○ username
○ name
○ description
● tweets
○ tweet_id
○ text
○ user_id (FK users)
● follows
○ follow_id
○ follower_id (FK users)
○ followee_id (FK users)
● likes
○ like_id
○ user_id (FK users)
○ tweet_id (FK tweets)
Standard Solution: Query Existing Tables
● User fields
○ SELECT * FROM users WHERE user_id = ?
● # tweets
○ SELECT COUNT(*) FROM tweets WHERE user_id = ?
● # following
○ SELECT COUNT(*) FROM follows WHERE follower_id = ?
● # followers
○ SELECT COUNT(*) FROM follows WHERE followee_id = ?
● # likes
○ SELECT COUNT(*) FROM likes WHERE user_id = ?
Problems with Standard Solution
● Complex: multiple queries across multiple tables
● Potentially large aggregations at query time
○ Puts load on DB
○ Increases service response latency
○ Repeated on every query for same userId
● Shared data storage
○ Some other service writes to these tables (i.e. owns them)
○ When it changes schema, our service could break
Standard Solution: Add a Cache
● e.g. Redis
● Benefits
○ Faster key lookups than RDBMS queries
○ Store expensive computed values in cache and reuse them (i.e. materialized view)
● Usage
○ Read from cache first, if found then return cached data
○ Otherwise, read from DB, write to cache, return cached data
def getUser(id: String): User =
readUserFromCache(id) match {
case Some(user) => user
case None =>
val user = readUserFromDatabase(id)
writeUserToCache(user)
user
}
def getUser(id: String): User =
readUserFromCache(id) match {
case Some(user) => user
case None => //cache miss!
val user = readUserFromDatabase(id)
writeUserToCache(user)
user
}
def getUser(id: String): User =
readUserFromCache(id) match {
case Some(user) => user //stale?
case None => //cache miss!
val user = readUserFromDatabase(id)
writeUserToCache(user)
user
}
def getUser(id: String): User =
readUserFromCache(id) match { //network latency
case Some(user) => user //stale?
case None => //cache miss!
val user = readUserFromDatabase(id)
writeUserToCache(user)
user
}
Problems with Standard Approach to Caches
● Operational complexity: someone has to manage Redis
● Code complexity: now querying two data stores and writing to one
● Cache misses: still putting some load on DB
● Stale data: cache is not updated when data changes
● Network latency: cache is remote
Can We Solve These Problems?
● Yes: If cache is always updated
● Complexity: only read from cache
● Cache misses: cache always has all data
● Stale data: cache always has updated data
● Network latency: if cache is local to service (bonus)
Kafka: Topics, Producers, Consumers
● Horizontally scalable, durable, highly available, high throughput, low latency
Kafka: Messages
● Message is a (key, value) pair
● Key and value are byte arrays (BYO serialization)
● Key is typically an ID (e.g. userId)
● Value is some payload (e.g. page view event, user data updated)
Kafka: Producer API
val props = … //kafka host:port, other configs
val producer = new KafkaProducer[K, V](props)
producer.send(topic, key, value)
Kafka: Consumer API
val props = … //kafka host:port, other configs
val consumer = new KafkaConsumer[K, V](props)
consumer.subscribe(topics)
while (true) {
val messages = consumer.poll(timeout)
//process list of messages
}
Kafka: Types of Topics
● Record topic
○ Finite topic retention period (e.g. 7 days)
○ Good for user activity, logs, metrics
● Changelog topic
○ Log-compacted topic: retains newest message for each key
○ Good for entities/table data
Kafka: Tables and Changelogs are Dual
Database Replication
Credit: I Heart Logs http://shop.oreilly.com/product/0636920034339.do
DB to Kafka
● Change data capture
○ Kafka Connect http://kafka.apache.org/documentation#connect
○ Bottled Water https://github.com/confluentinc/bottledwater-pg
● Dual writes
○ Application writes to both DB and Kafka
○ Prefer CDC
Kafka Streams
● Higher-level API than producers and consumers
● Just a library (no Hadoop/Spark/Flink cluster to maintain)
val tweetCountsByUserId = builder.stream(tweetsTopic)
.selectKey((tweetId, tweet) => tweet.userId)
.countByKey("tweetCountsByUserId")
val userInformation = builder.table(usersTopic)
.leftJoin(tweetCountsByUserId,
(user, count) => new UserInformation(user, count))
userInformation.to(userInformationTopic)
RocksDB
● Key-value store
● In-process
○ Local (not remote)
○ Library (not a daemon/server)
● Mostly in-memory, spills to local disk
○ Usually an under-utilized resource on app servers
○ 100s of GBs? TBs?
○ AWS EBS 100GB SSD $10/mo
● http://rocksdb.org
HTTP Service Internals
Live Demo!
Kafka Streams Interactive Queries
http://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/
Confluent Schema Registry
● Serialize messages in Kafka topics using Avro
● Avro schemas registered with central server
● Anyone can safely consume data in topics
● http://docs.confluent.io/3.0.1/schema-registry/docs/index.html

Más contenido relacionado

La actualidad más candente

U C2007 My S Q L Performance Cookbook
U C2007  My S Q L  Performance  CookbookU C2007  My S Q L  Performance  Cookbook
U C2007 My S Q L Performance Cookbook
guestae36d0
 
Intro To Mongo Db
Intro To Mongo DbIntro To Mongo Db
Intro To Mongo Db
chriskite
 

La actualidad más candente (20)

MongoDB
MongoDBMongoDB
MongoDB
 
Mongo db
Mongo dbMongo db
Mongo db
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Lokijs
LokijsLokijs
Lokijs
 
MongoDB for Beginners
MongoDB for BeginnersMongoDB for Beginners
MongoDB for Beginners
 
The Basics of MongoDB
The Basics of MongoDBThe Basics of MongoDB
The Basics of MongoDB
 
Mastering the MongoDB Shell
Mastering the MongoDB ShellMastering the MongoDB Shell
Mastering the MongoDB Shell
 
Mongo DB 102
Mongo DB 102Mongo DB 102
Mongo DB 102
 
Mongo DB Presentation
Mongo DB PresentationMongo DB Presentation
Mongo DB Presentation
 
MongoDB
MongoDBMongoDB
MongoDB
 
U C2007 My S Q L Performance Cookbook
U C2007  My S Q L  Performance  CookbookU C2007  My S Q L  Performance  Cookbook
U C2007 My S Q L Performance Cookbook
 
MongoDB 101
MongoDB 101MongoDB 101
MongoDB 101
 
Gsummit apis-2013
Gsummit apis-2013Gsummit apis-2013
Gsummit apis-2013
 
Mongo db1
Mongo db1Mongo db1
Mongo db1
 
Introduction to mongo db
Introduction to mongo dbIntroduction to mongo db
Introduction to mongo db
 
Tms training
Tms trainingTms training
Tms training
 
Connecting to a REST API in iOS
Connecting to a REST API in iOSConnecting to a REST API in iOS
Connecting to a REST API in iOS
 
Intro To Mongo Db
Intro To Mongo DbIntro To Mongo Db
Intro To Mongo Db
 
RethinkDB - the open-source database for the realtime web
RethinkDB - the open-source database for the realtime webRethinkDB - the open-source database for the realtime web
RethinkDB - the open-source database for the realtime web
 
Node js crash course session 5
Node js crash course   session 5Node js crash course   session 5
Node js crash course session 5
 

Destacado

Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...
Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...
Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...
Kevin Mao
 

Destacado (11)

Polyalgebra
PolyalgebraPolyalgebra
Polyalgebra
 
SQL on everything, in memory
SQL on everything, in memorySQL on everything, in memory
SQL on everything, in memory
 
Apache Calcite overview
Apache Calcite overviewApache Calcite overview
Apache Calcite overview
 
Document Management With the Nuxeo Platform
Document Management With the Nuxeo PlatformDocument Management With the Nuxeo Platform
Document Management With the Nuxeo Platform
 
Improvements to Apache HBase and Its Applications in Alibaba Search
Improvements to Apache HBase and Its Applications in Alibaba Search Improvements to Apache HBase and Its Applications in Alibaba Search
Improvements to Apache HBase and Its Applications in Alibaba Search
 
I Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaI Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache Kafka
 
Streaming, Database & Distributed Systems Bridging the Divide
Streaming, Database & Distributed Systems Bridging the DivideStreaming, Database & Distributed Systems Bridging the Divide
Streaming, Database & Distributed Systems Bridging the Divide
 
Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...
Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...
Achieving Real-time Ingestion and Analysis of Security Events through Kafka a...
 
Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoop
 
Architecting a Next Generation Data Platform
Architecting a Next Generation Data PlatformArchitecting a Next Generation Data Platform
Architecting a Next Generation Data Platform
 
The Power of the Log
The Power of the LogThe Power of the Log
The Power of the Log
 

Similar a Updating materialized views and caches using kafka

Friends don't let friends do dual writes: Outbox pattern with OpenShift Strea...
Friends don't let friends do dual writes: Outbox pattern with OpenShift Strea...Friends don't let friends do dual writes: Outbox pattern with OpenShift Strea...
Friends don't let friends do dual writes: Outbox pattern with OpenShift Strea...
Red Hat Developers
 

Similar a Updating materialized views and caches using kafka (20)

Enabling Data Scientists to easily create and own Kafka Consumers
Enabling Data Scientists to easily create and own Kafka ConsumersEnabling Data Scientists to easily create and own Kafka Consumers
Enabling Data Scientists to easily create and own Kafka Consumers
 
Enabling Data Scientists to easily create and own Kafka Consumers | Stefan Kr...
Enabling Data Scientists to easily create and own Kafka Consumers | Stefan Kr...Enabling Data Scientists to easily create and own Kafka Consumers | Stefan Kr...
Enabling Data Scientists to easily create and own Kafka Consumers | Stefan Kr...
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Using MongoDB with Kafka - Use Cases and Best Practices
Using MongoDB with Kafka -  Use Cases and Best PracticesUsing MongoDB with Kafka -  Use Cases and Best Practices
Using MongoDB with Kafka - Use Cases and Best Practices
 
Friends don't let friends do dual writes: Outbox pattern with OpenShift Strea...
Friends don't let friends do dual writes: Outbox pattern with OpenShift Strea...Friends don't let friends do dual writes: Outbox pattern with OpenShift Strea...
Friends don't let friends do dual writes: Outbox pattern with OpenShift Strea...
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
Architectural patterns for high performance microservices in kubernetes
Architectural patterns for high performance microservices in kubernetesArchitectural patterns for high performance microservices in kubernetes
Architectural patterns for high performance microservices in kubernetes
 
Kubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the DatacenterKubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the Datacenter
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big Data
 
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure Databricks
 
Introduction to streaming and messaging flume,kafka,SQS,kinesis
Introduction to streaming and messaging  flume,kafka,SQS,kinesis Introduction to streaming and messaging  flume,kafka,SQS,kinesis
Introduction to streaming and messaging flume,kafka,SQS,kinesis
 
Go Big or Go Home: Approaching Kafka Replication at Scale
Go Big or Go Home: Approaching Kafka Replication at ScaleGo Big or Go Home: Approaching Kafka Replication at Scale
Go Big or Go Home: Approaching Kafka Replication at Scale
 
Cassandra Talk: Austin JUG
Cassandra Talk: Austin JUGCassandra Talk: Austin JUG
Cassandra Talk: Austin JUG
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
 
Kafka Streams - From the Ground Up to the Cloud
Kafka Streams - From the Ground Up to the CloudKafka Streams - From the Ground Up to the Cloud
Kafka Streams - From the Ground Up to the Cloud
 
A Tour of Apache Kafka
A Tour of Apache KafkaA Tour of Apache Kafka
A Tour of Apache Kafka
 

Último

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
masabamasaba
 

Último (20)

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 

Updating materialized views and caches using kafka

  • 1. Updating Materialized Views and Caches Using Kafka -or- Why You Should Publish Data Changes to Kafka Zach Cox Prairie.Code() Oct 2016 https://github.com/zcox/twitter-microservices-example
  • 2. About Me ● Building things with Apache Kafka since 2014 ○ Currently at Uptake in Chicago: predictive analytics for industrial IoT ○ Previously at Banno in Des Moines: ad targeting for bank web sites ● Co-founded Pongr ○ Startup in Des Moines: photo marketing platform powered by messaging systems ● Software game since 1998 ● Links ○ http://theza.ch ○ https://github.com/zcox ○ https://twitter.com/zcox ○ https://www.linkedin.com/in/zachcox
  • 3. Remember These Things 1. Learn about Apache Kafka http://kafka.apache.org 2. Send events and data changes to Kafka 3. Denormalization is OK 4. Up-to-date materialized views and caches
  • 4. Build a New Service ● Provides read access to data from many sources ○ Query multiple tables or databases ○ Complex joins, aggregations ● Response latency ○ 95% 5 msec ○ Max 10 msec ● Data update latency ○ 95% 1 sec ○ Max 10 sec
  • 5.
  • 6.
  • 7.
  • 9. User Information Service ● One operation: get user information ● REST HTTP + JSON ● Input: userId ● GET /users/:userId ● Output: ○ userId ○ username ○ Name ○ Description ○ Location ○ Web page url ○ Joined date ○ Profile image url ○ Background image url ○ # tweets ○ # following ○ # followers ○ # likes ○ # lists ○ # moments
  • 10. Existing RDBMS with Normalized Tables ● users ○ user_id ○ username ○ name ○ description ● tweets ○ tweet_id ○ text ○ user_id (FK users) ● follows ○ follow_id ○ follower_id (FK users) ○ followee_id (FK users) ● likes ○ like_id ○ user_id (FK users) ○ tweet_id (FK tweets)
  • 11. Standard Solution: Query Existing Tables ● User fields ○ SELECT * FROM users WHERE user_id = ? ● # tweets ○ SELECT COUNT(*) FROM tweets WHERE user_id = ? ● # following ○ SELECT COUNT(*) FROM follows WHERE follower_id = ? ● # followers ○ SELECT COUNT(*) FROM follows WHERE followee_id = ? ● # likes ○ SELECT COUNT(*) FROM likes WHERE user_id = ?
  • 12. Problems with Standard Solution ● Complex: multiple queries across multiple tables ● Potentially large aggregations at query time ○ Puts load on DB ○ Increases service response latency ○ Repeated on every query for same userId ● Shared data storage ○ Some other service writes to these tables (i.e. owns them) ○ When it changes schema, our service could break
  • 13. Standard Solution: Add a Cache ● e.g. Redis ● Benefits ○ Faster key lookups than RDBMS queries ○ Store expensive computed values in cache and reuse them (i.e. materialized view) ● Usage ○ Read from cache first, if found then return cached data ○ Otherwise, read from DB, write to cache, return cached data
  • 14. def getUser(id: String): User = readUserFromCache(id) match { case Some(user) => user case None => val user = readUserFromDatabase(id) writeUserToCache(user) user }
  • 15. def getUser(id: String): User = readUserFromCache(id) match { case Some(user) => user case None => //cache miss! val user = readUserFromDatabase(id) writeUserToCache(user) user }
  • 16. def getUser(id: String): User = readUserFromCache(id) match { case Some(user) => user //stale? case None => //cache miss! val user = readUserFromDatabase(id) writeUserToCache(user) user }
  • 17. def getUser(id: String): User = readUserFromCache(id) match { //network latency case Some(user) => user //stale? case None => //cache miss! val user = readUserFromDatabase(id) writeUserToCache(user) user }
  • 18. Problems with Standard Approach to Caches ● Operational complexity: someone has to manage Redis ● Code complexity: now querying two data stores and writing to one ● Cache misses: still putting some load on DB ● Stale data: cache is not updated when data changes ● Network latency: cache is remote
  • 19. Can We Solve These Problems? ● Yes: If cache is always updated ● Complexity: only read from cache ● Cache misses: cache always has all data ● Stale data: cache always has updated data ● Network latency: if cache is local to service (bonus)
  • 20.
  • 21.
  • 22. Kafka: Topics, Producers, Consumers ● Horizontally scalable, durable, highly available, high throughput, low latency
  • 23. Kafka: Messages ● Message is a (key, value) pair ● Key and value are byte arrays (BYO serialization) ● Key is typically an ID (e.g. userId) ● Value is some payload (e.g. page view event, user data updated)
  • 24. Kafka: Producer API val props = … //kafka host:port, other configs val producer = new KafkaProducer[K, V](props) producer.send(topic, key, value)
  • 25. Kafka: Consumer API val props = … //kafka host:port, other configs val consumer = new KafkaConsumer[K, V](props) consumer.subscribe(topics) while (true) { val messages = consumer.poll(timeout) //process list of messages }
  • 26. Kafka: Types of Topics ● Record topic ○ Finite topic retention period (e.g. 7 days) ○ Good for user activity, logs, metrics ● Changelog topic ○ Log-compacted topic: retains newest message for each key ○ Good for entities/table data
  • 27. Kafka: Tables and Changelogs are Dual
  • 28. Database Replication Credit: I Heart Logs http://shop.oreilly.com/product/0636920034339.do
  • 29. DB to Kafka ● Change data capture ○ Kafka Connect http://kafka.apache.org/documentation#connect ○ Bottled Water https://github.com/confluentinc/bottledwater-pg ● Dual writes ○ Application writes to both DB and Kafka ○ Prefer CDC
  • 30. Kafka Streams ● Higher-level API than producers and consumers ● Just a library (no Hadoop/Spark/Flink cluster to maintain) val tweetCountsByUserId = builder.stream(tweetsTopic) .selectKey((tweetId, tweet) => tweet.userId) .countByKey("tweetCountsByUserId") val userInformation = builder.table(usersTopic) .leftJoin(tweetCountsByUserId, (user, count) => new UserInformation(user, count)) userInformation.to(userInformationTopic)
  • 31.
  • 32.
  • 33. RocksDB ● Key-value store ● In-process ○ Local (not remote) ○ Library (not a daemon/server) ● Mostly in-memory, spills to local disk ○ Usually an under-utilized resource on app servers ○ 100s of GBs? TBs? ○ AWS EBS 100GB SSD $10/mo ● http://rocksdb.org
  • 36.
  • 37.
  • 38.
  • 39.
  • 40. Kafka Streams Interactive Queries http://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/
  • 41. Confluent Schema Registry ● Serialize messages in Kafka topics using Avro ● Avro schemas registered with central server ● Anyone can safely consume data in topics ● http://docs.confluent.io/3.0.1/schema-registry/docs/index.html