SlideShare una empresa de Scribd logo
1 de 10
GC-stall and Page Scan Attacks
by Linux
Cuong Tran
LinkedIn Performance Group
Agenda
• GC attacks by Linux
• Page scan attacks by Linux
• Recommendations
Examples of
GC attacks by Linux
• 2013-10-05T05:01:04.179+0000:…. : 216982K>9328K(256000K), 0.0666320 secs] 377835K-

>170188K(768000K), 0.0675850 secs] [Times:

user=0.17

sys=0.00, real=3.18 secs]
• 2013-09-19T06:14:03.632+0000: 44372.834: [GC [1 CMS-initial-mark:
703914K(921600K)] 718372K(1433600K), 126.1196340 secs] [Times:

user=0.00 sys=127.31, real=126.10 secs]
• GC stopped the world for minutes but:
– Did no real work (CPU time in user mode = 0)
– Burned cycles in Linux kernel
GC attacks by Linux
• IO starvation
– Symptom: GC log shows “low user time, low system
time, long GC pause”.
– Cause: GC threads stuck in kernel waiting for
IO, usually due to journal commits or FS flush of
changes by gzip of log rolling

• Memory starvation.
– Symptom: GC log shows “Low user time, high system
time, long GC pause”
– Cause: Memory pressure triggers swapping or
scanning for free memory

4
Solutions for GC-attacks
• IO Starvation
– Strategy: Even out workload to disk drives (flush every 5 s rather
than 30 s)
sysctl –w vm.dirty_writeback_centisecs = 500
sysctl –w vm.dirty_expire_centisecs = 500

– In progress: Direct IO with gzip or gzip as-you-go

• Memory Starvation
– Strategy: Pre-allocate memory to JVM heap and protect it
against swapping or scanning
– Turn on –XX:+AlwaysPreTouch option in JVM
– Sysctl –w vm.swappiness=0 to protect heap and
anonymous memory
– JVM start up has 2 second delay to allocate all memory (17GB)

5
Page scan attacks by Linux
Measured: 7,000,000 scans/sec
Stall: 2+ minutes
Goal: 0 scans/sec

6
Cause : Page Scan Attacks

Transparent Huge Page (THP)
• A Redhat enhancement for performance
–
–
–
–

2MB huge pages vs. 4KB regular pages
Less TLB miss and page table walk
Only work for anonymous memory (malloc)
Improve 10% performance for SPECjbb, app server workload

• But THP can degrade performance severely
– Collapsing, Compacting, Splitting, Migration
– Very high pgscand/s
– Very busy khugepaged
– Very high system time when process compacts memory or
khugepaged runs

• THP optimization can increase GC stall time by minutes
Cause : Page Scan Attacks

NUMA Optimization
• A Linux optimization for NUMA
– 2 CPU sockets, each having 12 cores and local memory.
– Memory accessible by all 24 cores but local memory is faster
– Linux tries to allocate local memory to application
threads, i.e., from local zone
– Best suited for applications that can fit in one local zone

• NUMA optimization can degrade performance severely
– Very high pgscand/s
– Linux zone-reclaim insists on finding memory on local
zone although memory is plentiful on the other zone
– Linux migrates memory including THP, creating a viscous cycle of
breaking up 2 MB pages, scanning for 4 KB free pages, and reassembling 4KB into 2 MB pages
Cause : Page Scan Attacks

Solutions
• Turn off THP optimization and thus

khugepaged
– echo never >
/sys/kernel/mm/redhat_transparent_hugepa
ge/enabled

– Will not affect file-IO or memory mapped files
– Redhat, Oracle, Hadoop recommends no THP

• Turn off zone-reclaim optimization
– sysctl –w vm.zone_reclaim_mode=0

– Twitter recommends NUMA interleaving
9
Recommendations
• Gate keepers: SRE and SysOps
• Safe to roll-out fixes for GC attacks now
– Linux: Flush changes more frequently and protect heap
• sysctl –w vm.dirty_writeback_centisecs = 500
• sysctl –w vm.dirty_expire_centisecs = 500

• sysctl –w vm.swappiness=0
– JVM: Give JVM heap all memory it needs when started
• –XX:+AlwaysPreTouch
• Heap size per AutoTune

• Gradual roll-out fixes of page scan attacks.
– Best for back-end servers
– Linux: Turn off THP and NUMA optimization
• echo never >
/sys/kernel/mm/redhat_transparent_hugepage/enabled
• sysctl –w vm.zone_reclaim_mode = 0

– Work with product groups to test on small group of servers before
applying changes to the rest

Más contenido relacionado

La actualidad más candente

Physical Memory Models.pdf
Physical Memory Models.pdfPhysical Memory Models.pdf
Physical Memory Models.pdfAdrian Huang
 
Pwning mobile apps without root or jailbreak
Pwning mobile apps without root or jailbreakPwning mobile apps without root or jailbreak
Pwning mobile apps without root or jailbreakAbraham Aranguren
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at NetflixBrendan Gregg
 
Slab Allocator in Linux Kernel
Slab Allocator in Linux KernelSlab Allocator in Linux Kernel
Slab Allocator in Linux KernelAdrian Huang
 
Building Better Backdoors with WMI - DerbyCon 2017
Building Better Backdoors with WMI - DerbyCon 2017Building Better Backdoors with WMI - DerbyCon 2017
Building Better Backdoors with WMI - DerbyCon 2017Alexander Polce Leary
 
New Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingNew Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingScyllaDB
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionGene Chang
 
Performance Tuning EC2 Instances
Performance Tuning EC2 InstancesPerformance Tuning EC2 Instances
Performance Tuning EC2 InstancesBrendan Gregg
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machineAlexei Starovoitov
 
Racing The Web - Hackfest 2016
Racing The Web - Hackfest 2016Racing The Web - Hackfest 2016
Racing The Web - Hackfest 2016Aaron Hnatiw
 
Stop the Guessing: Performance Methodologies for Production Systems
Stop the Guessing: Performance Methodologies for Production SystemsStop the Guessing: Performance Methodologies for Production Systems
Stop the Guessing: Performance Methodologies for Production SystemsBrendan Gregg
 
An Overview of Deserialization Vulnerabilities in the Java Virtual Machine (J...
An Overview of Deserialization Vulnerabilities in the Java Virtual Machine (J...An Overview of Deserialization Vulnerabilities in the Java Virtual Machine (J...
An Overview of Deserialization Vulnerabilities in the Java Virtual Machine (J...joaomatosf_
 
OWASP AppSecCali 2015 - Marshalling Pickles
OWASP AppSecCali 2015 - Marshalling PicklesOWASP AppSecCali 2015 - Marshalling Pickles
OWASP AppSecCali 2015 - Marshalling PicklesChristopher Frohoff
 
semaphore & mutex.pdf
semaphore & mutex.pdfsemaphore & mutex.pdf
semaphore & mutex.pdfAdrian Huang
 
Let's Learn to Talk to GC Logs in Java 9
Let's Learn to Talk to GC Logs in Java 9Let's Learn to Talk to GC Logs in Java 9
Let's Learn to Talk to GC Logs in Java 9Poonam Bajaj Parhar
 
Linux memory consumption
Linux memory consumptionLinux memory consumption
Linux memory consumptionhaish
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debuggingHao-Ran Liu
 

La actualidad más candente (20)

Physical Memory Models.pdf
Physical Memory Models.pdfPhysical Memory Models.pdf
Physical Memory Models.pdf
 
Pwning mobile apps without root or jailbreak
Pwning mobile apps without root or jailbreakPwning mobile apps without root or jailbreak
Pwning mobile apps without root or jailbreak
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 
Slab Allocator in Linux Kernel
Slab Allocator in Linux KernelSlab Allocator in Linux Kernel
Slab Allocator in Linux Kernel
 
Building Better Backdoors with WMI - DerbyCon 2017
Building Better Backdoors with WMI - DerbyCon 2017Building Better Backdoors with WMI - DerbyCon 2017
Building Better Backdoors with WMI - DerbyCon 2017
 
New Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingNew Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using Tracing
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introduction
 
Performance Tuning EC2 Instances
Performance Tuning EC2 InstancesPerformance Tuning EC2 Instances
Performance Tuning EC2 Instances
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
 
Introduction to Perf
Introduction to PerfIntroduction to Perf
Introduction to Perf
 
Racing The Web - Hackfest 2016
Racing The Web - Hackfest 2016Racing The Web - Hackfest 2016
Racing The Web - Hackfest 2016
 
Defending Your "Gold"
Defending Your "Gold"Defending Your "Gold"
Defending Your "Gold"
 
Stop the Guessing: Performance Methodologies for Production Systems
Stop the Guessing: Performance Methodologies for Production SystemsStop the Guessing: Performance Methodologies for Production Systems
Stop the Guessing: Performance Methodologies for Production Systems
 
An Overview of Deserialization Vulnerabilities in the Java Virtual Machine (J...
An Overview of Deserialization Vulnerabilities in the Java Virtual Machine (J...An Overview of Deserialization Vulnerabilities in the Java Virtual Machine (J...
An Overview of Deserialization Vulnerabilities in the Java Virtual Machine (J...
 
OWASP AppSecCali 2015 - Marshalling Pickles
OWASP AppSecCali 2015 - Marshalling PicklesOWASP AppSecCali 2015 - Marshalling Pickles
OWASP AppSecCali 2015 - Marshalling Pickles
 
semaphore & mutex.pdf
semaphore & mutex.pdfsemaphore & mutex.pdf
semaphore & mutex.pdf
 
Let's Learn to Talk to GC Logs in Java 9
Let's Learn to Talk to GC Logs in Java 9Let's Learn to Talk to GC Logs in Java 9
Let's Learn to Talk to GC Logs in Java 9
 
Linux memory consumption
Linux memory consumptionLinux memory consumption
Linux memory consumption
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debugging
 
AndroidとSELinux
AndroidとSELinuxAndroidとSELinux
AndroidとSELinux
 

Destacado

Voldemort on Solid State Drives
Voldemort on Solid State DrivesVoldemort on Solid State Drives
Voldemort on Solid State DrivesVinoth Chandar
 
Cassandra and Solid State Drives
Cassandra and Solid State DrivesCassandra and Solid State Drives
Cassandra and Solid State DrivesDataStax Academy
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Julien Le Dem
 
2015 08-20-criu support-in_docker_for_native_checkpoint_and_restore
2015 08-20-criu support-in_docker_for_native_checkpoint_and_restore2015 08-20-criu support-in_docker_for_native_checkpoint_and_restore
2015 08-20-criu support-in_docker_for_native_checkpoint_and_restoreSaied Kazemi
 
OS caused Large JVM pauses: Deep dive and solutions
OS caused Large JVM pauses: Deep dive and solutionsOS caused Large JVM pauses: Deep dive and solutions
OS caused Large JVM pauses: Deep dive and solutionsZhenyun Zhuang
 
Specification-Based Test Program Generation for ARM VMSAv8-64 MMUs
Specification-Based Test Program Generation for ARM VMSAv8-64 MMUsSpecification-Based Test Program Generation for ARM VMSAv8-64 MMUs
Specification-Based Test Program Generation for ARM VMSAv8-64 MMUsAlexander Kamkin
 
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s goingKernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s goingAnne Nicolas
 
Tackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core SystemsTackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core SystemsThe Linux Foundation
 
Reverse engineering for_beginners-en
Reverse engineering for_beginners-enReverse engineering for_beginners-en
Reverse engineering for_beginners-enAndri Yabu
 
BKK16-404A PCI Development Meeting
BKK16-404A PCI Development MeetingBKK16-404A PCI Development Meeting
BKK16-404A PCI Development MeetingLinaro
 
Virtualization overheads
Virtualization overheadsVirtualization overheads
Virtualization overheadsSandeep Joshi
 
Docker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in PragueDocker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in Praguetomasbart
 
Linux numa evolution
Linux numa evolutionLinux numa evolution
Linux numa evolutionLukas Pirl
 
BKK16-104 sched-freq
BKK16-104 sched-freqBKK16-104 sched-freq
BKK16-104 sched-freqLinaro
 
Cgroup resource mgmt_v1
Cgroup resource mgmt_v1Cgroup resource mgmt_v1
Cgroup resource mgmt_v1sprdd
 
Troubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolutionTroubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolutionJoel Koshy
 
Non-Uniform Memory Access ( NUMA)
Non-Uniform Memory Access ( NUMA)Non-Uniform Memory Access ( NUMA)
Non-Uniform Memory Access ( NUMA)Nakul Manchanda
 
Known basic of NFV Features
Known basic of NFV FeaturesKnown basic of NFV Features
Known basic of NFV FeaturesRaul Leite
 
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
SFO15-TR9: PSCI, ACPI (and UEFI to boot)SFO15-TR9: PSCI, ACPI (and UEFI to boot)
SFO15-TR9: PSCI, ACPI (and UEFI to boot)Linaro
 

Destacado (20)

Voldemort on Solid State Drives
Voldemort on Solid State DrivesVoldemort on Solid State Drives
Voldemort on Solid State Drives
 
Cassandra and Solid State Drives
Cassandra and Solid State DrivesCassandra and Solid State Drives
Cassandra and Solid State Drives
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013
 
2015 08-20-criu support-in_docker_for_native_checkpoint_and_restore
2015 08-20-criu support-in_docker_for_native_checkpoint_and_restore2015 08-20-criu support-in_docker_for_native_checkpoint_and_restore
2015 08-20-criu support-in_docker_for_native_checkpoint_and_restore
 
OS caused Large JVM pauses: Deep dive and solutions
OS caused Large JVM pauses: Deep dive and solutionsOS caused Large JVM pauses: Deep dive and solutions
OS caused Large JVM pauses: Deep dive and solutions
 
Specification-Based Test Program Generation for ARM VMSAv8-64 MMUs
Specification-Based Test Program Generation for ARM VMSAv8-64 MMUsSpecification-Based Test Program Generation for ARM VMSAv8-64 MMUs
Specification-Based Test Program Generation for ARM VMSAv8-64 MMUs
 
Dulloor xen-summit
Dulloor xen-summitDulloor xen-summit
Dulloor xen-summit
 
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s goingKernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
 
Tackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core SystemsTackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core Systems
 
Reverse engineering for_beginners-en
Reverse engineering for_beginners-enReverse engineering for_beginners-en
Reverse engineering for_beginners-en
 
BKK16-404A PCI Development Meeting
BKK16-404A PCI Development MeetingBKK16-404A PCI Development Meeting
BKK16-404A PCI Development Meeting
 
Virtualization overheads
Virtualization overheadsVirtualization overheads
Virtualization overheads
 
Docker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in PragueDocker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in Prague
 
Linux numa evolution
Linux numa evolutionLinux numa evolution
Linux numa evolution
 
BKK16-104 sched-freq
BKK16-104 sched-freqBKK16-104 sched-freq
BKK16-104 sched-freq
 
Cgroup resource mgmt_v1
Cgroup resource mgmt_v1Cgroup resource mgmt_v1
Cgroup resource mgmt_v1
 
Troubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolutionTroubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolution
 
Non-Uniform Memory Access ( NUMA)
Non-Uniform Memory Access ( NUMA)Non-Uniform Memory Access ( NUMA)
Non-Uniform Memory Access ( NUMA)
 
Known basic of NFV Features
Known basic of NFV FeaturesKnown basic of NFV Features
Known basic of NFV Features
 
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
SFO15-TR9: PSCI, ACPI (and UEFI to boot)SFO15-TR9: PSCI, ACPI (and UEFI to boot)
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
 

Similar a Gc and-pagescan-attacks-by-linux

Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Tier1 App
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014Amazon Web Services
 
Low latency & mechanical sympathy issues and solutions
Low latency & mechanical sympathy  issues and solutionsLow latency & mechanical sympathy  issues and solutions
Low latency & mechanical sympathy issues and solutionsJean-Philippe BEMPEL
 
Java Garbage Collectors – Moving to Java7 Garbage First (G1) Collector
Java Garbage Collectors – Moving to Java7 Garbage First (G1) CollectorJava Garbage Collectors – Moving to Java7 Garbage First (G1) Collector
Java Garbage Collectors – Moving to Java7 Garbage First (G1) CollectorGurpreet Sachdeva
 
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...Flink Forward
 
Pick diamonds from garbage
Pick diamonds from garbagePick diamonds from garbage
Pick diamonds from garbageTier1 App
 
How (not) to kill your MySQL infrastructure
How (not) to kill your MySQL infrastructureHow (not) to kill your MySQL infrastructure
How (not) to kill your MySQL infrastructureMiklos Szel
 
Infrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black BoxInfrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black BoxMiklos Szel
 
Become a Java GC Hero - ConFoo Conference
Become a Java GC Hero - ConFoo ConferenceBecome a Java GC Hero - ConFoo Conference
Become a Java GC Hero - ConFoo ConferenceTier1app
 
G1 Garbage Collector - Big Heaps and Low Pauses?
G1 Garbage Collector - Big Heaps and Low Pauses?G1 Garbage Collector - Big Heaps and Low Pauses?
G1 Garbage Collector - Big Heaps and Low Pauses?C2B2 Consulting
 
Resolving Firebird performance problems
Resolving Firebird performance problemsResolving Firebird performance problems
Resolving Firebird performance problemsAlexey Kovyazin
 
Designs, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsDesigns, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsDaehyeok Kim
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQLlefredbe
 
Become a Garbage Collection Hero
Become a Garbage Collection HeroBecome a Garbage Collection Hero
Become a Garbage Collection HeroTier1app
 
In-Memory Computing: How, Why? and common Patterns
In-Memory Computing: How, Why? and common PatternsIn-Memory Computing: How, Why? and common Patterns
In-Memory Computing: How, Why? and common PatternsSrinath Perera
 
Themis: An I/O-Efficient MapReduce (SoCC 2012)
Themis: An I/O-Efficient MapReduce (SoCC 2012)Themis: An I/O-Efficient MapReduce (SoCC 2012)
Themis: An I/O-Efficient MapReduce (SoCC 2012)Alex Rasmussen
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...ScyllaDB
 
Tuning Java GC to resolve performance issues
Tuning Java GC to resolve performance issuesTuning Java GC to resolve performance issues
Tuning Java GC to resolve performance issuesSergey Podolsky
 

Similar a Gc and-pagescan-attacks-by-linux (20)

Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Am I reading GC logs Correctly?
Am I reading GC logs Correctly?
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
 
Low latency & mechanical sympathy issues and solutions
Low latency & mechanical sympathy  issues and solutionsLow latency & mechanical sympathy  issues and solutions
Low latency & mechanical sympathy issues and solutions
 
Java Garbage Collectors – Moving to Java7 Garbage First (G1) Collector
Java Garbage Collectors – Moving to Java7 Garbage First (G1) CollectorJava Garbage Collectors – Moving to Java7 Garbage First (G1) Collector
Java Garbage Collectors – Moving to Java7 Garbage First (G1) Collector
 
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
 
Pick diamonds from garbage
Pick diamonds from garbagePick diamonds from garbage
Pick diamonds from garbage
 
How (not) to kill your MySQL infrastructure
How (not) to kill your MySQL infrastructureHow (not) to kill your MySQL infrastructure
How (not) to kill your MySQL infrastructure
 
Infrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black BoxInfrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black Box
 
Become a Java GC Hero - ConFoo Conference
Become a Java GC Hero - ConFoo ConferenceBecome a Java GC Hero - ConFoo Conference
Become a Java GC Hero - ConFoo Conference
 
G1 Garbage Collector - Big Heaps and Low Pauses?
G1 Garbage Collector - Big Heaps and Low Pauses?G1 Garbage Collector - Big Heaps and Low Pauses?
G1 Garbage Collector - Big Heaps and Low Pauses?
 
Resolving Firebird performance problems
Resolving Firebird performance problemsResolving Firebird performance problems
Resolving Firebird performance problems
 
Designs, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsDesigns, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed Systems
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQL
 
Linux Huge Pages
Linux Huge PagesLinux Huge Pages
Linux Huge Pages
 
Become a Garbage Collection Hero
Become a Garbage Collection HeroBecome a Garbage Collection Hero
Become a Garbage Collection Hero
 
In-Memory Computing: How, Why? and common Patterns
In-Memory Computing: How, Why? and common PatternsIn-Memory Computing: How, Why? and common Patterns
In-Memory Computing: How, Why? and common Patterns
 
Themis: An I/O-Efficient MapReduce (SoCC 2012)
Themis: An I/O-Efficient MapReduce (SoCC 2012)Themis: An I/O-Efficient MapReduce (SoCC 2012)
Themis: An I/O-Efficient MapReduce (SoCC 2012)
 
OpenDS_Jazoon2010
OpenDS_Jazoon2010OpenDS_Jazoon2010
OpenDS_Jazoon2010
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
 
Tuning Java GC to resolve performance issues
Tuning Java GC to resolve performance issuesTuning Java GC to resolve performance issues
Tuning Java GC to resolve performance issues
 

Último

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 

Último (20)

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 

Gc and-pagescan-attacks-by-linux

  • 1. GC-stall and Page Scan Attacks by Linux Cuong Tran LinkedIn Performance Group
  • 2. Agenda • GC attacks by Linux • Page scan attacks by Linux • Recommendations
  • 3. Examples of GC attacks by Linux • 2013-10-05T05:01:04.179+0000:…. : 216982K>9328K(256000K), 0.0666320 secs] 377835K- >170188K(768000K), 0.0675850 secs] [Times: user=0.17 sys=0.00, real=3.18 secs] • 2013-09-19T06:14:03.632+0000: 44372.834: [GC [1 CMS-initial-mark: 703914K(921600K)] 718372K(1433600K), 126.1196340 secs] [Times: user=0.00 sys=127.31, real=126.10 secs] • GC stopped the world for minutes but: – Did no real work (CPU time in user mode = 0) – Burned cycles in Linux kernel
  • 4. GC attacks by Linux • IO starvation – Symptom: GC log shows “low user time, low system time, long GC pause”. – Cause: GC threads stuck in kernel waiting for IO, usually due to journal commits or FS flush of changes by gzip of log rolling • Memory starvation. – Symptom: GC log shows “Low user time, high system time, long GC pause” – Cause: Memory pressure triggers swapping or scanning for free memory 4
  • 5. Solutions for GC-attacks • IO Starvation – Strategy: Even out workload to disk drives (flush every 5 s rather than 30 s) sysctl –w vm.dirty_writeback_centisecs = 500 sysctl –w vm.dirty_expire_centisecs = 500 – In progress: Direct IO with gzip or gzip as-you-go • Memory Starvation – Strategy: Pre-allocate memory to JVM heap and protect it against swapping or scanning – Turn on –XX:+AlwaysPreTouch option in JVM – Sysctl –w vm.swappiness=0 to protect heap and anonymous memory – JVM start up has 2 second delay to allocate all memory (17GB) 5
  • 6. Page scan attacks by Linux Measured: 7,000,000 scans/sec Stall: 2+ minutes Goal: 0 scans/sec 6
  • 7. Cause : Page Scan Attacks Transparent Huge Page (THP) • A Redhat enhancement for performance – – – – 2MB huge pages vs. 4KB regular pages Less TLB miss and page table walk Only work for anonymous memory (malloc) Improve 10% performance for SPECjbb, app server workload • But THP can degrade performance severely – Collapsing, Compacting, Splitting, Migration – Very high pgscand/s – Very busy khugepaged – Very high system time when process compacts memory or khugepaged runs • THP optimization can increase GC stall time by minutes
  • 8. Cause : Page Scan Attacks NUMA Optimization • A Linux optimization for NUMA – 2 CPU sockets, each having 12 cores and local memory. – Memory accessible by all 24 cores but local memory is faster – Linux tries to allocate local memory to application threads, i.e., from local zone – Best suited for applications that can fit in one local zone • NUMA optimization can degrade performance severely – Very high pgscand/s – Linux zone-reclaim insists on finding memory on local zone although memory is plentiful on the other zone – Linux migrates memory including THP, creating a viscous cycle of breaking up 2 MB pages, scanning for 4 KB free pages, and reassembling 4KB into 2 MB pages
  • 9. Cause : Page Scan Attacks Solutions • Turn off THP optimization and thus khugepaged – echo never > /sys/kernel/mm/redhat_transparent_hugepa ge/enabled – Will not affect file-IO or memory mapped files – Redhat, Oracle, Hadoop recommends no THP • Turn off zone-reclaim optimization – sysctl –w vm.zone_reclaim_mode=0 – Twitter recommends NUMA interleaving 9
  • 10. Recommendations • Gate keepers: SRE and SysOps • Safe to roll-out fixes for GC attacks now – Linux: Flush changes more frequently and protect heap • sysctl –w vm.dirty_writeback_centisecs = 500 • sysctl –w vm.dirty_expire_centisecs = 500 • sysctl –w vm.swappiness=0 – JVM: Give JVM heap all memory it needs when started • –XX:+AlwaysPreTouch • Heap size per AutoTune • Gradual roll-out fixes of page scan attacks. – Best for back-end servers – Linux: Turn off THP and NUMA optimization • echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled • sysctl –w vm.zone_reclaim_mode = 0 – Work with product groups to test on small group of servers before applying changes to the rest