Brought to you by
Taming Go's Memory Usage –
and Avoiding a Rust Rewrite
Mark Gritter
Founding Engineer at Akita Software
■ Previously built VM-aware flash storage arrays at Tintri
■ Trying to build tools that help developers with performance
and correctness!
■ Hobbies: gardening, weaving, math
The Problem
The Akita Agent
libpcap (packet capture) → go-packet (TCP reassembly) → HTTP parsing & translation → obfuscation → upload
Oops
Agent Resident Set Size (RSS), GB, as reported by DataDog
The Goal: Predictable
& Stable Memory Usage
Options
1. Impose a cap on how much memory is used; just restart when we go over.
… can’t do that from within Go. And the system administrators we talked to suggested per-container limits had bad behavior.
2. Rewrite the whole thing in a language that gave us more control over memory management.
… how many months of effort? And no guarantee of success at the end. We didn’t have a nice packet-handling library like go-packet ready to drop in.
3. Find and fix everything we were doing wrong.
… unknown scope of effort.
Understanding Go’s GC –
and Our Use of it!
Go’s Garbage Collector
Primary focus: low performance overhead! (and it’s good at that!)
On the other hand: very few knobs to turn; tool ecosystem less mature than Java’s.
■ GOGC = what percentage of the live heap we may allocate before the next collection starts.
■ For example, if live data is 200MB, then GOGC=100 (the default) means we can allocate another 200MB before any memory is reclaimed.
■ Default setting means RSS >= 2 * live heap size, at least.
[Figure: the heap over one GC cycle, divided into last live memory, new allocations, and memory allocated during GC]
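The GOGC arithmetic above can be sketched in a few lines. This is a simplified model, not the runtime's actual pacer: the function name heapGoal is made up for illustration, and newer Go releases also count stacks and globals when setting the goal.

```go
package main

import "fmt"

// heapGoal returns the approximate heap size at which the next GC cycle
// starts, given the live heap left by the previous cycle and the GOGC value.
// Hypothetical helper: a simplified model, not the runtime's real pacer.
func heapGoal(liveBytes uint64, gogc int) uint64 {
	return liveBytes + liveBytes*uint64(gogc)/100
}

func main() {
	const mb = 1 << 20
	for _, gogc := range []int{50, 100, 200} {
		goal := heapGoal(200*mb, gogc)
		fmt.Printf("live=200MB GOGC=%d: next GC near %d MB\n", gogc, goal/mb)
	}
}
```

With 200MB live and GOGC=100 the model gives a 400MB goal, which is where the "RSS is at least twice the live heap" rule of thumb comes from.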
Go’s Built-in Profiling
Go’s built-in pprof support can measure:
■ Size of live heap, at time of measurement (inuse_space, inuse_objects)
■ Allocations made since program start (alloc_space, alloc_objects)
For each of those, a call stack (of limited depth) is available showing which function calls led to which memory allocations.
■ This is not always what you want to know: sometimes what you need is which object is keeping those allocations live!
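As a minimal sketch of how such a profile can be captured from inside a process, the standard runtime/pprof package can write the heap profile to any io.Writer. The helper name heapProfile is an assumption for illustration; the agent's real capture code is not shown in the slides.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// heapProfile returns the current heap profile in pprof's binary format.
// Hypothetical helper for illustration.
func heapProfile() ([]byte, error) {
	runtime.GC() // run a collection first so the statistics are up to date
	var buf bytes.Buffer
	if err := pprof.Lookup("heap").WriteTo(&buf, 0); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := heapProfile()
	if err != nil {
		fmt.Println("profile failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of heap profile\n", len(data))
}
```

Saved to a file, the same profile can be read with go tool pprof, choosing the view with -sample_index=inuse_space or -sample_index=alloc_space.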
Page cache?
Go-packet is generally good about not doing excess copies.
But, when a packet is missing we need a place to store data until it is (hopefully) retransmitted.
First Fixes
■ Limit total page cache size
■ Limit per-connection buffering to roughly a few RTTs’ worth of data
■ Upgrade to a newer version that releases page-cache entries back to the
heap. (This won’t help with spikes, but will ensure they aren’t permanent.)
■ More aggressively expire TCP connections and flush partial data to the
parsing layer.
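The first two fixes amount to two counters: one global, one per connection. The sketch below shows that idea with a made-up pageCache type; it is not go-packet's API or the agent's code, and the limits are arbitrary.

```go
package main

import "fmt"

// pageCache is a hypothetical accounting layer for out-of-order TCP data.
type pageCache struct {
	maxTotal   int            // cap on pages buffered across all connections
	maxPerConn int            // cap on pages buffered for a single connection
	total      int            // pages currently buffered
	perConn    map[string]int // connection key -> pages currently buffered
}

func newPageCache(maxTotal, maxPerConn int) *pageCache {
	return &pageCache{maxTotal: maxTotal, maxPerConn: maxPerConn, perConn: map[string]int{}}
}

// reserve reports whether one more page may be buffered for conn.
// On false, the caller should flush what it has and skip the gap.
func (c *pageCache) reserve(conn string) bool {
	if c.total >= c.maxTotal || c.perConn[conn] >= c.maxPerConn {
		return false
	}
	c.total++
	c.perConn[conn]++
	return true
}

// release gives back up to n pages held for conn.
func (c *pageCache) release(conn string, n int) {
	if held := c.perConn[conn]; n > held {
		n = held
	}
	c.total -= n
	c.perConn[conn] -= n
	if c.perConn[conn] == 0 {
		delete(c.perConn, conn)
	}
}

func main() {
	c := newPageCache(3, 2)
	fmt.Println(c.reserve("a"), c.reserve("a"), c.reserve("a")) // third is refused
	fmt.Println(c.reserve("b"), c.reserve("c"))                 // total cap refuses c
}
```

Refusing a page is what turns an unbounded spike into data loss on one connection, which is the trade the fixes above accept.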
1 GiB is Better than 6 GiB Any Day, but Not Good
Heap numbers looked good, but spikes persisted.
Time to Look at Total Allocations
This is Where it Gets Murky
The allocation profile shows lots of hot-spots that are allocating lots of memory.
But are they contributing to spikes in memory use?
Interpretation: 8.5ms stop-the-world (sweep termination), 79ms concurrent mark and scan, 0.049ms stop-the-world “mark termination”. The heap was 88 MB at the start of the GC cycle, 102 MB at the end of it, and contained 77 MB of live data.
Conclusion: we allocated 14 MB, nearly 20% of our live heap, in just the 79ms the GC was running!
2021-08-03T22:21:51.946Z,i-049b3fd0dde1cf672,cli,"gc 504 @713.457s 0%: 8.5+79+0.049 ms clock, 34+13/78/78+0.19 ms cpu, 88->102->77 MB, 95 MB goal, 4 P"
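The arithmetic behind that reading can be reproduced from the trace line itself. This is a rough sketch: heapSizes is a made-up helper that only handles lines shaped like the one above, and the trace rounds to whole megabytes, so the percentages are coarse.

```go
package main

import (
	"fmt"
	"strings"
)

// heapSizes extracts the "start->end->live MB" triple from a gctrace line.
// Hypothetical helper; it assumes the comma-separated layout shown above.
func heapSizes(line string) (start, end, live int, err error) {
	for _, field := range strings.Split(line, ", ") {
		if strings.Contains(field, "->") {
			_, err = fmt.Sscanf(field, "%d->%d->%d MB", &start, &end, &live)
			return start, end, live, err
		}
	}
	return 0, 0, 0, fmt.Errorf("no heap sizes in %q", line)
}

func main() {
	line := "gc 504 @713.457s 0%: 8.5+79+0.049 ms clock, 34+13/78/78+0.19 ms cpu, 88->102->77 MB, 95 MB goal, 4 P"
	start, end, live, err := heapSizes(line)
	if err != nil {
		fmt.Println(err)
		return
	}
	grown := end - start
	fmt.Printf("heap grew %d MB during GC: %.0f%% of the starting heap, %.0f%% of live data\n",
		grown, 100*float64(grown)/float64(start), 100*float64(grown)/float64(live))
}
```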
Progress Without Progress
Nodes drop off the allocation tree, but I can still see the spikes in DataDog.
■ Remove re-initialization of regular expressions.
■ Rewrite our visitor to use a pre-allocated stack rather than allocating objects
every time it recursed.
● Then fix the subsidiary problems that this allocation was hiding.
● Lazily create slices on-demand rather than have them pre-built in the visitor context.
This suggests the GC was actually handling these fine! But perhaps this work was
necessary to understand the real causes.
Example
flat flat% sum% cum cum%
7562.56MB 27.14% 27.14% 7562.56MB 27.14% stackVisitorContext.appendPath
flat flat% sum% cum cum%
1225.56MB 5.99% 23.87% 2439.59MB 11.93% stackVisitorContext.EnterStruct
892.03MB 4.36% 33.36% 892.03MB 4.36% stackVisitorContext.appendPath
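To illustrate the pre-allocated stack idea from the visitor rewrite, here is a minimal sketch. The node and visitor types are invented for the example and are much simpler than the agent's stackVisitorContext; the point is only that one path slice is pushed and popped instead of allocating a fresh context at every level of recursion.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical stand-in for the structure being visited.
type node struct {
	name     string
	children []*node
}

// visitor keeps one path slice for the whole traversal. Entering a node
// appends to it and leaving truncates it, so the backing array is reused.
type visitor struct {
	path   []string
	leaves []string
}

func (v *visitor) visit(n *node) {
	v.path = append(v.path, n.name) // push
	if len(n.children) == 0 {
		// Joining allocates; it is only here to make the result visible.
		v.leaves = append(v.leaves, strings.Join(v.path, "."))
	}
	for _, c := range n.children {
		v.visit(c)
	}
	v.path = v.path[:len(v.path)-1] // pop, keeping the capacity
}

func main() {
	tree := &node{name: "root", children: []*node{
		{name: "a", children: []*node{{name: "x"}, {name: "y"}}},
		{name: "b"},
	}}
	v := &visitor{path: make([]string, 0, 16)} // allocated once, up front
	v.visit(tree)
	fmt.Println(v.leaves)
}
```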
Simulating the Tool I Wished I Had
What I really want is: What were the allocations that led up to an increased RSS
size?
■ I don’t care if I allocate a lot of memory as long as the GC is good at reclaiming it.
■ I do care if I have to get more memory from the operating system, increasing my
footprint
Solution: Grab a heap profile periodically (I used every minute), wait for a spike, then look at the difference in alloc_space between the profiles taken just before and just after it.
[Figure: allocation profiles compared, normal interval vs. spike interval]
Biting the Bullet
One of our big contributors was objecthash-proto.
■ A library which hashed arbitrary protobufs (which we use for our IR).
■ It makes heavy use of reflection.
■ Reflection on a struct requires extensive memory allocation.
■ (Why? I don’t know, though I could make some guesses.)
Fix: write a code generator that preserves the same behavior, but emits hashing functions specific to our protobufs.
BenchmarkWitnessHash-8 15476 76078 ns/op 18349 B/op 947 allocs/op
BenchmarkWitnessOldHash-8 7077 173922 ns/op 48664 B/op 1561 allocs/op
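The difference between the two approaches can be sketched with a toy example. Everything here is hypothetical: Witness is a made-up three-field struct rather than Akita's protobuf, the hash is plain FNV-1a rather than the objecthash algorithm, and hashWitness is hand-written to stand in for what a generator could emit.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"reflect"
	"strconv"
)

// Witness is a hypothetical message type used only for this sketch.
type Witness struct {
	Method string
	Path   string
	Status int64
}

// reflectHash walks the fields at run time, as a generic library must.
// Boxing each field into an interface and formatting it typically allocates.
func reflectHash(v interface{}) uint64 {
	h := fnv.New64a()
	rv := reflect.ValueOf(v)
	for i := 0; i < rv.NumField(); i++ {
		fmt.Fprintf(h, "%v;", rv.Field(i).Interface())
	}
	return h.Sum64()
}

// hashWitness is the kind of type-specific function a generator could emit:
// same bytes fed to the same hash, but no reflection.
func hashWitness(w *Witness) uint64 {
	var scratch [20]byte // large enough for any int64 in decimal
	sep := []byte{';'}
	h := fnv.New64a()
	h.Write([]byte(w.Method))
	h.Write(sep)
	h.Write([]byte(w.Path))
	h.Write(sep)
	h.Write(strconv.AppendInt(scratch[:0], w.Status, 10))
	h.Write(sep)
	return h.Sum64()
}

func main() {
	w := Witness{Method: "GET", Path: "/v1/users", Status: 200}
	fmt.Println(reflectHash(w) == hashWitness(&w)) // both see the same bytes
}
```

Keeping the output identical is what lets the generated version replace the reflective one without invalidating previously stored hashes.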
Finally
This was the first fix that actually made a qualitative difference!
One More
Showing nodes accounting for 419.70MB, 87.98% of 477.03MB total
Dropped 129 nodes (cum <= 2.39MB)
Showing top 10 nodes out of 114
flat flat% sum% cum cum%
231.14MB 48.45% 48.45% 234.14MB 49.08% io.ReadAll
52.93MB 11.10% 59.55% 53.43MB 11.20% gopacket...ReadPacketData
51.45MB 10.79% 70.33% 123.88MB 25.97% gopacket...NextPacket
42.42MB 8.89% 79.23% 42.42MB 8.89% bytes.makeSlice
Half the allocations coming from one function?
This turned out to be a buffer between decompression and parsing the HTTP body.
Conclusion
The Akita Agent
libpcap (packet capture) → go-packet (TCP reassembly) → HTTP parsing & translation → obfuscation → upload
< 280 MiB at the 99.9th percentile, on our internal dogfooding
(But, unfortunately, we found some additional problem cases since then.)
Lessons
Reduce fixed overhead (every live byte in the heap costs two in RSS).
Profile allocation, not just fixed data.
Stream, don’t buffer.
Replace frequent, small allocations.
(This is the one that leads to the least idiomatic Go code).
Avoid generic libraries with unpredictable memory costs.
Find a way to simulate the tool you wish you had.
Mark Gritter
mgritter@akitasoftware.com
@markgritter

Taming Go's Memory Usage — and Avoiding a Rust Rewrite

  • 1. Taming Go's Memory Usage – and Avoiding a Rust Rewrite. Mark Gritter, Founding Engineer at Akita Software
  • 2. Mark Gritter Founding Engineer at Akita Software ■ Previously built VM-aware flash storage arrays at Tintri ■ Trying to build tools that help developers with performance and correctness! ■ Hobbies: gardening, weaving, math
  • 5. Oops. [Chart: Agent Resident Set Size (RSS), GB, as reported by DataDog]
  • 6. The Goal: Predictable & Stable Memory Usage
  • 9. Options 1. Impose a cap on how much memory is used; just restart when we go over. … Can’t do that from within Go. And the system administrators we talked to suggested per-container limits had bad behavior. 2. Rewrite the whole thing in a language that gave us more control over memory management. … How many months of effort? And no guarantee of success at the end. We didn’t have a nice packet-handling library like gopacket ready to drop in. 3. Find and fix everything we were doing wrong. … Unknown scope of effort.
  • 10. Understanding Go’s GC – and Our Use of it!
  • 11. Go’s Garbage Collector Primary focus: low performance overhead! (and it’s good at that!) On the other hand: very few knobs to turn; tool ecosystem less mature than Java. ■ GOGC = what percentage of live memory to allocate before starting the next collection. ■ For example, if live data is 200MB, then GOGC=100 (that is, 100%) means we can allocate 200MB before any memory is reclaimed. ■ The default setting (GOGC=100) means RSS >= 2 * live heap size, at least. [Diagram labels: Last live memory, New allocations, Allocated during GC]
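
The GOGC arithmetic on this slide can be sketched in a few lines. This is a simplified model, not the runtime's actual pacer: the function name `heapGoalMB` is made up for illustration, and newer Go releases also count stacks and globals when computing the goal. GOGC itself can be set through the `GOGC` environment variable or `runtime/debug.SetGCPercent`.

```go
package main

import "fmt"

// heapGoalMB approximates the heap size (in MB) at which the next GC cycle
// starts: the live heap plus GOGC percent of it. Hypothetical helper; the
// real pacer is more involved.
func heapGoalMB(liveMB float64, gogc int) float64 {
	return liveMB * (1 + float64(gogc)/100)
}

func main() {
	// With 200MB live, compare how far the heap may grow for a few settings.
	for _, gogc := range []int{50, 100, 200} {
		fmt.Printf("GOGC=%d: live=200MB, next GC near %.0fMB\n",
			gogc, heapGoalMB(200, gogc))
	}
}
```

Lowering GOGC trades more frequent collections for a smaller peak heap, which is the main knob the slide refers to.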
  • 12. Go’s Built-in Profiling Go’s built-in pprof support can measure: ■ Size of live heap, at time of measurement (inuse_space, inuse_objects) ■ Allocations made since program start (alloc_space, alloc_objects) For each of those, a call-stack (of limited depth) is available showing which function calls led to which memory allocations. ■ This is not always what you want to know: sometimes what you need is which object is keeping those allocations live!
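
The difference between the live heap (inuse_*) and cumulative allocations (alloc_*) can be seen without pprof, using `runtime.MemStats`. The sketch below is illustrative only: `churn` and `measure` are hypothetical helpers, and `HeapAlloc` and `TotalAlloc` are used as rough stand-ins for what the two profile types report.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink is package-level so the allocations below escape to the heap.
var sink []byte

// churn allocates n buffers of size bytes each and drops them immediately.
func churn(n, size int) {
	for i := 0; i < n; i++ {
		sink = make([]byte, size)
	}
	sink = nil
}

// measure reports how much was allocated in total while f ran, and how much
// the live heap changed once the garbage was collected.
func measure(f func()) (allocated, liveDelta int64) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	f()
	runtime.GC()
	runtime.ReadMemStats(&after)
	allocated = int64(after.TotalAlloc - before.TotalAlloc)
	liveDelta = int64(after.HeapAlloc) - int64(before.HeapAlloc)
	return allocated, liveDelta
}

func main() {
	allocated, live := measure(func() { churn(1000, 64<<10) })
	fmt.Printf("allocated about %d MiB, live heap changed by %d KiB\n",
		allocated>>20, live>>10)
}
```

A workload like this looks harmless in an inuse_space profile and enormous in an alloc_space profile, which is why both views are needed.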
  • 13. Page cache? In general, gopacket is good about not doing excess copies. But when a packet is missing, we need a place to store data until it is (hopefully) retransmitted.
  • 14. First Fixes ■ Limit total page cache size. ■ Limit per-connection buffering to roughly a few RTTs’ worth. ■ Upgrade to a newer version that releases page-cache entries back to the heap. (This won’t help with spikes, but will ensure they aren’t permanent.) ■ More aggressively expire TCP connections and flush partial data to the parsing layer.
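
The first two fixes amount to enforcing a global cap and a per-connection cap on buffered pages. The sketch below shows that bookkeeping in isolation. It is not gopacket's implementation: the `pageBudget` type and its methods are invented for illustration, and connections are identified by plain strings.

```go
package main

import "fmt"

// pageBudget tracks buffered pages against a total cap and a per-connection
// cap. Hypothetical type, not part of gopacket.
type pageBudget struct {
	maxTotal, maxPerConn int
	total                int
	perConn              map[string]int
}

func newPageBudget(maxTotal, maxPerConn int) *pageBudget {
	return &pageBudget{maxTotal: maxTotal, maxPerConn: maxPerConn, perConn: map[string]int{}}
}

// reserve records one more buffered page for conn. It reports false when
// either cap would be exceeded, in which case the caller should flush or
// drop data instead of buffering more.
func (b *pageBudget) reserve(conn string) bool {
	if b.total >= b.maxTotal || b.perConn[conn] >= b.maxPerConn {
		return false
	}
	b.total++
	b.perConn[conn]++
	return true
}

// release returns all of conn's pages, for example after a flush or expiry.
func (b *pageBudget) release(conn string) {
	b.total -= b.perConn[conn]
	delete(b.perConn, conn)
}

func main() {
	b := newPageBudget(3, 2)
	fmt.Println(b.reserve("a"), b.reserve("a"), b.reserve("a")) // third hits the per-connection cap
	fmt.Println(b.reserve("b"), b.reserve("c"))                 // second hits the total cap
	b.release("a")
	fmt.Println(b.reserve("c")) // room again after the flush
}
```

The caps bound the worst case regardless of how many connections are missing packets at once.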
  • 15. 1 GiB is Better than 6 GiB Any Day, but Not Good. Heap numbers looked good, but spikes persisted.
  • 16. Time to Look at Total Allocations
  • 17. This is Where it Gets Murky The allocation profile shows lots of hot-spots that are allocating lots of memory. But are they contributing to spikes in memory use? GC trace line: 2021-08-03T22:21:51.946Z,i-049b3fd0dde1cf672,cli,"gc 504 @713.457s 0%: 8.5+79+0.049 ms clock, 34+13/78/78+0.19 ms cpu, 88->102->77 MB, 95 MB goal, 4 P" Interpretation: 8.5ms stop-the-world, 79ms concurrent mark and scan, 0.049ms “mark termination”. The heap was 88 MB at the start of the GC cycle, 102 MB at the end of GC, and contained 77 MB of live data. Conclusion: we allocated 14 MB, almost 20% of our live heap, in just the 79ms the GC was running!
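
The arithmetic behind that conclusion can be automated over many trace lines. A minimal sketch, assuming the `start->end->live MB` field keeps the layout shown on the slide; `parseHeapSizes` is a hypothetical helper, not part of any Go tool.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// heapSizes matches the "start->end->live MB" field of a gctrace line.
var heapSizes = regexp.MustCompile(`(\d+)->(\d+)->(\d+) MB`)

// parseHeapSizes extracts the heap size at GC start, the heap size at GC
// end, and the live heap, all in MB.
func parseHeapSizes(line string) (start, end, live int, err error) {
	m := heapSizes.FindStringSubmatch(line)
	if m == nil {
		return 0, 0, 0, fmt.Errorf("no heap sizes in %q", line)
	}
	// The pattern only matches digits, so Atoi cannot fail on syntax here.
	start, _ = strconv.Atoi(m[1])
	end, _ = strconv.Atoi(m[2])
	live, _ = strconv.Atoi(m[3])
	return start, end, live, nil
}

func main() {
	line := `gc 504 @713.457s 0%: 8.5+79+0.049 ms clock, 34+13/78/78+0.19 ms cpu, 88->102->77 MB, 95 MB goal, 4 P`
	start, end, live, err := parseHeapSizes(line)
	if err != nil {
		panic(err)
	}
	grown := end - start // memory allocated while the collector was running
	fmt.Printf("allocated during GC: %d MB (%.0f%% of live heap)\n",
		grown, 100*float64(grown)/float64(live))
}
```

Cycles where the heap grows a lot between start and end are the ones worth correlating with RSS spikes.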
  • 18. Progress Without Progress Nodes drop off the allocation tree, but I can still see the spikes in DataDog. ■ Remove re-initialization of regular expressions. ■ Rewrite our visitor to use a pre-allocated stack rather than allocating objects every time it recursed. ● Then fix the subsidiary problems that this allocation was hiding. ● Lazily create slices on-demand rather than have them pre-built in the visitor context. This suggests the GC was actually handling these fine! But perhaps this work was necessary to understand the real causes.
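
The first item, regular expression re-initialization, is a common source of avoidable allocation. The sketch below contrasts the two patterns; the regular expression and function names are invented for illustration and are not taken from the Akita agent.

```go
package main

import (
	"fmt"
	"regexp"
)

// digitsRE is compiled once at package initialization and shared by every
// call. A compiled *regexp.Regexp is safe for concurrent use.
var digitsRE = regexp.MustCompile(`^[0-9]+$`)

// isNumericSlow recompiles the pattern on every call, allocating a fresh
// compiled program each time.
func isNumericSlow(s string) bool {
	return regexp.MustCompile(`^[0-9]+$`).MatchString(s)
}

// isNumeric reuses the package-level compiled pattern.
func isNumeric(s string) bool {
	return digitsRE.MatchString(s)
}

func main() {
	for _, s := range []string{"12345", "12a45"} {
		fmt.Println(s, isNumericSlow(s), isNumeric(s))
	}
}
```

Both functions return the same answers; only the allocation behavior per call differs.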
  • 19. Example
           flat   flat%    sum%         cum    cum%
      7562.56MB  27.14%  27.14%   7562.56MB  27.14%  stackVisitorContext.appendPath

           flat   flat%    sum%         cum    cum%
      1225.56MB   5.99%  23.87%   2439.59MB  11.93%  stackVisitorContext.EnterStruct
       892.03MB   4.36%  33.36%    892.03MB   4.36%  stackVisitorContext.appendPath
  • 20. Simulating the Tool I Wished I Had What I really want is: What were the allocations that led up to an increase in RSS? ■ I don’t care if I allocate a lot of memory as long as the GC is good at reclaiming it. ■ I do care if I have to get more memory from the operating system, increasing my footprint. Solution: Grab the heap profile periodically (I used every minute), wait for a spike, look at the difference in alloc_space between the two heap profiles. [Chart labels: normal, spike]
  • 21. Biting the Bullet One of our big contributors was objecthash-proto. ■ A library which hashed arbitrary protobufs (which we use for our IR). ■ It makes heavy use of reflection. ■ Reflection on a struct requires extensive memory allocation. ■ (Why? I don’t know, though I could make some guesses.) Solution: write a code generator that preserves the same behavior but emits hashing functions specific to our protobufs.
      BenchmarkWitnessHash-8        15476     76078 ns/op    18349 B/op     947 allocs/op
      BenchmarkWitnessOldHash-8      7077    173922 ns/op    48664 B/op    1561 allocs/op
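
The shape of that change, generic reflection replaced by type-specific code, can be illustrated on a toy struct. This sketch does not reproduce objecthash-proto's algorithm or the generated code: the `witness` type, the FNV hash, and the field encoding are all stand-ins chosen for brevity.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"io"
	"reflect"
	"strconv"
)

// witness is a hypothetical message type with a few fields.
type witness struct {
	Method string
	Path   string
	Status int
}

// hashReflect walks the fields of any struct with reflection. Each
// Field(i).Interface() call may box the value, which costs allocations.
func hashReflect(v interface{}) uint64 {
	h := fnv.New64a()
	rv := reflect.ValueOf(v)
	for i := 0; i < rv.NumField(); i++ {
		fmt.Fprintf(h, "%v;", rv.Field(i).Interface())
	}
	return h.Sum64()
}

// hashWitness is what a code generator might emit for this one type:
// the same bytes are hashed, but with no reflection involved.
func hashWitness(w witness) uint64 {
	h := fnv.New64a()
	io.WriteString(h, w.Method)
	io.WriteString(h, ";")
	io.WriteString(h, w.Path)
	io.WriteString(h, ";")
	io.WriteString(h, strconv.Itoa(w.Status))
	io.WriteString(h, ";")
	return h.Sum64()
}

func main() {
	w := witness{Method: "GET", Path: "/v1/users", Status: 200}
	fmt.Println(hashReflect(w) == hashWitness(w))
}
```

Keeping the output identical is what lets the specialized version replace the generic one without invalidating previously stored hashes.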
  • 22. Finally This was the first fix that actually made a qualitative difference!
  • 23. One More
      Showing nodes accounting for 419.70MB, 87.98% of 477.03MB total
      Dropped 129 nodes (cum <= 2.39MB)
      Showing top 10 nodes out of 114
            flat   flat%    sum%        cum    cum%
        231.14MB  48.45%  48.45%   234.14MB  49.08%  io.ReadAll
         52.93MB  11.10%  59.55%    53.43MB  11.20%  gopacket...ReadPacketData
         51.45MB  10.79%  70.33%   123.88MB  25.97%  gopacket...NextPacket
         42.42MB   8.89%  79.23%    42.42MB   8.89%  bytes.makeSlice
    Half the allocations coming from one function? This turned out to be a buffer between decompression and parsing the HTTP body.
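
Removing such a buffer usually means handing the decompressor's reader straight to the parser. The sketch below contrasts the two approaches on a gzip body, with line counting standing in for HTTP body parsing. The function names are hypothetical and this is not the agent's actual code.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// countLinesBuffered decompresses the whole body into memory, then parses it.
func countLinesBuffered(r io.Reader) (int, error) {
	zr, err := gzip.NewReader(r)
	if err != nil {
		return 0, err
	}
	defer zr.Close()
	body, err := io.ReadAll(zr) // allocation grows with the body size
	if err != nil {
		return 0, err
	}
	return bytes.Count(body, []byte("\n")), nil
}

// countLinesStreaming parses while decompressing. Memory use is bounded by
// the scanner's buffer (lines over 64 KiB would need a larger one).
func countLinesStreaming(r io.Reader) (int, error) {
	zr, err := gzip.NewReader(r)
	if err != nil {
		return 0, err
	}
	defer zr.Close()
	sc := bufio.NewScanner(zr)
	n := 0
	for sc.Scan() {
		n++
	}
	return n, sc.Err()
}

// gzipString compresses s so the example has a body to read.
func gzipString(s string) *bytes.Buffer {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write([]byte(s))
	zw.Close()
	return &buf
}

func main() {
	a, _ := countLinesBuffered(gzipString("a\nb\nc\n"))
	b, _ := countLinesStreaming(gzipString("a\nb\nc\n"))
	fmt.Println(a, b)
}
```

The streaming version only works if the downstream parser accepts an `io.Reader`, which is often the real constraint behind an `io.ReadAll` call.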
  • 26. < 280 MiB at the 99.9th percentile in our internal dogfooding. (But, unfortunately, we found some additional problem cases since then.)
  • 27. Lessons ■ Reduce fixed overhead (every live byte in the heap costs two in RSS). ■ Profile allocation, not just fixed data. ■ Stream, don’t buffer. ■ Replace frequent, small allocations. (This is the one that leads to the least idiomatic Go code.) ■ Avoid generic libraries with unpredictable memory costs. ■ Find a way to simulate the tool you wish you had.
  • 28. Brought to you by Mark Gritter mgritter@akitasoftware.com @markgritter