"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan

Fwdays
https://nbomber.com
https://github.com/stereodb
AGENDA
- intro
- why stateless is slow and less reliable
- tools for building stateful services
PART I
intro to sportsbook domain and how we came to stateful
Dynamo Kyiv vs Chelsea, 2 : 1
Events: red card, score changed, odds changed
Delivery: PUSH, PULL
- quite big payloads: 30 KB compressed data (1.5 MB uncompressed)
- update rate: 2K RPS (per tenant)
- user query rate: 3-4K RPS (per tenant)
- live data is very dynamic: there is not much sense in caching it
- data should be queryable: simple KV is not enough
- we need secondary indexes
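The last two requirements (queryable data, secondary indexes) can be illustrated with a minimal in-process sketch. Everything named here is hypothetical: the `Market` record, its fields, and the `MarketTable` class are illustrative only and do not come from the talk or from any specific library. The sketch is not thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record shape; the field names are illustrative only.
public record Market(int Id, string Sport, decimal Odds);

// A tiny in-process table: primary key lookup plus one secondary index by sport.
public class MarketTable
{
    private readonly Dictionary<int, Market> _byId = new();
    private readonly Dictionary<string, HashSet<int>> _bySport = new();

    public void Upsert(Market market)
    {
        // Drop the old index entry first so the index never points at stale data.
        if (_byId.TryGetValue(market.Id, out var old) &&
            _bySport.TryGetValue(old.Sport, out var oldIds))
        {
            oldIds.Remove(old.Id);
        }

        _byId[market.Id] = market;

        if (!_bySport.TryGetValue(market.Sport, out var ids))
        {
            ids = new HashSet<int>();
            _bySport[market.Sport] = ids;
        }
        ids.Add(market.Id);
    }

    // Query through the secondary index instead of scanning every record.
    public IReadOnlyList<Market> FindBySport(string sport) =>
        _bySport.TryGetValue(sport, out var ids)
            ? ids.Select(id => _byId[id]).ToList()
            : new List<Market>();
}
```

A plain key-value cache can answer only the lookup by id; the second dictionary is what makes the "all markets for a sport" query cheap.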
Benchmark: 20 KB payload, concurrent reads and writes
Redis, single node: 4 vCPU, 8 GB RAM
redis write: 4K RPS, p75 = 543, p95 = 688, p99 = 842
redis read: 7K RPS, p75 = 970, p95 = 1278, p99 = 1597
Stateless: several API nodes (API, API, API) in front of one DB
Stateless with a cache: API, Cache, DB, but Cache is not queryable
Stateful Service: API + DB, with state kept in each instance (state, state, state)
CDC (Debezium)
DB
How do you handle the case when your data is larger than RAM?
(for example, 10 GB of RAM and 30 GB of data)
Solution 1: use a memory DB that supports data larger than RAM
(10 GB in RAM, the remaining 20 GB outside RAM)
Solution 2: use partition by tenant (UA, PL, FR)
Solution 3: use range-based sharding
- shard A: users (1-500)
- shard B: users (501-1000)
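Solutions 2 and 3 both come down to a routing function. The sketch below assumes the ranges from the slide (users 1-500 and 501-1000) and the three tenants UA, PL, FR; the `ShardRouter` class and the node names (`node-ua` and so on) are hypothetical and only illustrate the idea.

```csharp
using System;
using System.Collections.Generic;

public static class ShardRouter
{
    // Range table from the slide: users 1-500 go to shard A, 501-1000 to shard B.
    private static readonly (int Lo, int Hi, string Shard)[] Ranges =
    {
        (1, 500, "shard A"),
        (501, 1000, "shard B"),
    };

    // Hypothetical tenant-to-node mapping for partition by tenant.
    private static readonly Dictionary<string, string> TenantNodes = new()
    {
        ["UA"] = "node-ua",
        ["PL"] = "node-pl",
        ["FR"] = "node-fr",
    };

    // Solution 3: range-based sharding by user id.
    public static string ShardForUser(int userId)
    {
        foreach (var (lo, hi, shard) in Ranges)
        {
            if (userId >= lo && userId <= hi) return shard;
        }
        throw new ArgumentOutOfRangeException(nameof(userId), userId, "No shard covers this user id.");
    }

    // Solution 2: partition by tenant.
    public static string NodeForTenant(string tenant) =>
        TenantNodes.TryGetValue(tenant, out var node)
            ? node
            : throw new KeyNotFoundException($"Unknown tenant '{tenant}'.");
}
```

With either scheme each node has to hold only its own slice of the data in RAM, at the cost of a routing step in front of the service.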
PART II
why stateless is slow
Stateful Service: API + DB
Stateless: API, Cache, DB, with network latency between them
Latency Numbers

Operation                              2010     2020
Compress 1KB with Zippy                2μs      2μs
Read 1 MB sequentially from RAM        30μs     3μs
Read 1 MB sequentially from SSD        494μs    49μs
Read 1 MB sequentially from disk       3ms      825μs
Round trip within same datacenter      500μs    500μs
Send packet CA -> Netherlands -> CA    150ms    150ms

Source: https://colin-scott.github.io/personal_website/research/interactive_latency.html
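To put the 2020 column in perspective, here is a small back-of-the-envelope sketch. It uses only two figures from the table (3μs to read 1 MB from RAM, 500μs for a datacenter round trip); the `LatencyBudget` class and the assumption that the network calls are made one after another are mine, not the speaker's.

```csharp
using System;

public static class LatencyBudget
{
    // 2020 figures from the latency table, in microseconds.
    public const double RamRead1MbUs = 3;
    public const double DatacenterRoundTripUs = 500;

    // Lower bound for a request that makes `roundTrips` sequential network calls.
    public static double NetworkFloorUs(int roundTrips) => roundTrips * DatacenterRoundTripUs;

    // How many 1 MB RAM reads fit into a single datacenter round trip.
    public static double RamReadsPerRoundTrip() => DatacenterRoundTripUs / RamRead1MbUs;
}
```

Under these assumptions a request that talks to a cache and then to a DB spends at least 2 x 500μs = 1000μs on the network alone, while a single round trip costs roughly as much as 167 sequential 1 MB reads from RAM.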
Stateless (API, Cache, DB), paid on both the calling and the serving side:
- CPU: serialize/deserialize
- CPU: ASYNC request handling
- CPU: managing sockets
- Overreads

Stateful Service (API + DB):
- CPU for serialize (we don’t need to deserialize)
- CPU for managing sockets (only client sockets)
- CPU for handling query (very cheap compared to serialization)
Object hit rate / Transactional hit rate
In order to fulfill our transactional flow, the API needs to fetch records A, B, and C.
If A and B are fast (for example, cache hits) and C is slow, records A and B will not impact our latency.
Overall Latency = Latency of record C
Most existing cache eviction algorithms focus on maximizing
object hit rate, or the fraction of single object requests served
from cache. However, this approach fails to capture the
inter-object dependencies within transactions.
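The difference between the two metrics can be shown with a small sketch. The `HitRates` class is hypothetical, and `OverallLatency` assumes the records of one transaction are fetched in parallel, so the slowest record decides the result; the talk does not state how the records are fetched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HitRates
{
    // Fraction of individual record lookups served from cache.
    public static double ObjectHitRate(IEnumerable<string[]> transactions, ISet<string> cached)
    {
        var keys = transactions.SelectMany(t => t).ToList();
        return keys.Count == 0 ? 0 : (double)keys.Count(k => cached.Contains(k)) / keys.Count;
    }

    // Fraction of transactions whose records were ALL served from cache.
    public static double TransactionalHitRate(IEnumerable<string[]> transactions, ISet<string> cached)
    {
        var list = transactions.ToList();
        return list.Count == 0 ? 0 : (double)list.Count(t => t.All(k => cached.Contains(k))) / list.Count;
    }

    // Assumption: records are fetched in parallel, so the slowest one decides the latency.
    public static double OverallLatency(IEnumerable<double> recordLatencies) => recordLatencies.Max();
}
```

For one transaction that needs A, B, C with only A and B cached, the object hit rate is 2/3 while the transactional hit rate is 0: the request still waits for C.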
async / await
Imagine that we run Redis on localhost. Even with such a setup we usually use async request handling.
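// Baseline: a plain synchronous call in a loop.
// Iterations is a field of the benchmark class (not shown on the slide).
public void SimpleMethod()
{
    var k = 0;
    for (int i = 0; i < Iterations; i++)
    {
        k = Add(i, i);
    }
}

// NoInlining (from System.Runtime.CompilerServices) stops the JIT from inlining Add,
// so the call itself stays in the measurement.
[MethodImpl(MethodImplOptions.NoInlining)]
private int Add(int a, int b) => a + b;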
// Awaits a task that is already completed, so each await continues synchronously.
public async Task SimpleMethodAsync()
{
    var k = 0;
    for (int i = 0; i < Iterations; i++)
    {
        k = await AddAsync(i, i);
    }
}

private Task<int> AddAsync(int a, int b)
{
    return Task.FromResult(a + b);
}
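// Task.Yield() forces AddAsync to complete asynchronously,
// so every await in the loop suspends and resumes.
public async Task SimpleMethodAsyncYield()
{
    var k = 0;
    for (int i = 0; i < Iterations; i++)
    {
        k = await AddAsync(i, i);
    }
}

private async Task<int> AddAsync(int a, int b)
{
    await Task.Yield();
    return a + b;
}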
// Same as the previous variant, plus Task.Run, which queues the addition to the thread pool.
public async Task SimpleMethodAsyncYield()
{
    var k = 0;
    for (int i = 0; i < Iterations; i++)
    {
        k = await AddAsync(i, i);
    }
}

private async Task<int> AddAsync(int a, int b)
{
    await Task.Yield();
    return await Task.Run(() => a + b);
}
PART III
why stateless is less reliable
Stateless: API, Cache, DB
Stateful Service: API + DB
With more moving parts, the stateless setup has a higher probability of failure
Stateless (API, Cache, DB): every remote call (to Cache and to DB) needs
- circuit breaker
- retry
- fallback
- timeout
- bulkhead isolation
Stateful Service (API + DB)
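To give a feel for how much code the patterns listed for the stateless setup imply for each remote call, here is a minimal hand-written sketch of three of them (timeout, retry, fallback). The `Resilience` class and its parameters are hypothetical; a production service would normally use a resilience library and add backoff, a circuit breaker, and bulkhead isolation. The timeout works only if `call` honors the cancellation token.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Resilience
{
    // Runs `call` with a per-attempt timeout, retries on failure, then falls back.
    public static async Task<T> CallWithPolicyAsync<T>(
        Func<CancellationToken, Task<T>> call,
        TimeSpan timeout,
        int maxAttempts,
        Func<T> fallback)
    {
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            // The token is cancelled once the per-attempt timeout elapses.
            using var cts = new CancellationTokenSource(timeout);
            try
            {
                return await call(cts.Token);
            }
            catch (Exception)
            {
                // Swallow the failure and try again (no backoff in this sketch).
            }
        }

        // All attempts failed: return the fallback value.
        return fallback();
    }
}
```

Every such wrapper sits on the request path of the stateless design; a read served from in-process state does not need one.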
Stateless (API, Cache, DB) vs Stateful Service (API + DB):
What about cache invalidation and data consistency?
Stateless (API, Cache, DB) vs Stateful Service (API + DB):
What about predictable scale-out?
Will your RPS increase if you add an additional API or Cache node?
- Metastable failures occur in open systems with an uncontrolled source of
load where a trigger causes the system to enter a bad state that persists
even when the trigger is removed.
- Paradoxically, the root cause of these failures is often features that
improve the efficiency or reliability of the system.
- The characteristic of a metastable failure is that the sustaining effect keeps
the system in the metastable failure state even after the trigger is
removed.
At least 4 out of 15 major outages in the
last decade at Amazon Web Services
were caused by metastable failures.
PART IV
tools for building stateful services
- distributed log with sync replication
- In-process memory DB
- SQL OLAP
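One way a distributed log and an in-process memory DB can fit together is that each node replays the ordered log into its local state. The sketch below shows only that consumer side; the `LogEntry` record and the `ReplicatedState` class are hypothetical, and replication of the log itself is not shown.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical log entry: a monotonically increasing offset plus a key/value update.
public record LogEntry(long Offset, string Key, string Value);

public class ReplicatedState
{
    private readonly Dictionary<string, string> _state = new();

    public long LastAppliedOffset { get; private set; } = -1;

    // Applies entries in log order; offsets that were already applied are skipped,
    // so replaying the same part of the log twice leaves the state unchanged.
    public void Apply(IEnumerable<LogEntry> entries)
    {
        foreach (var entry in entries)
        {
            if (entry.Offset <= LastAppliedOffset) continue;
            _state[entry.Key] = entry.Value;
            LastAppliedOffset = entry.Offset;
        }
    }

    public string GetOrDefault(string key, string defaultValue) =>
        _state.TryGetValue(key, out var value) ? value : defaultValue;
}
```

Because every node applies the same entries in the same order, a restarted node can rebuild its state by replaying the log from the beginning or from a saved offset.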
- quite big payloads: 30 KB compressed data (1.5 MB uncompressed)
- update rate: 2K RPS (per tenant)
- user query rate: 3-4K RPS (per tenant)
- live data is very dynamic: there is not much sense in caching it
- data should be queryable: simple KV is not enough
- we need secondary indexes
At peak, to handle a big load for 1 tenant, we have:
5-10 nodes, 0.5-2 CPU, 6 GB RAM
THANKS
always benchmark
https://twitter.com/antyadev