SlideShare una empresa de Scribd logo
1 de 44
Descargar para leer sin conexión
Performance optimization in Linux
Tales from the trenches
Alex Chistyakov, Principal Engineer, Git in Sky
Linux Piter 2015
Who are we?
●
A small consulting company based in SPb.
●
Web operations
●
Automation
●
Performance tuning
●
Sponsors of local DevOps meetup
Who are you?
●
Linux fans?
●
Developers?
●
Web developers, maybe?
●
System architects?
●
Performance engineers?
Okay, why Linux?
●
Is there anything else?
●
According to W3Cook stats, Linux
serves 95.8% of public web sites
●
And it's on the desktop too!
●
(At least on my desktop)
Linux in the perfect world
●
Linux fans?
●
Developers?
●
Web developers, maybe?
●
System architects?
●
Performance engineers?
Linux in the real world
●
Linux fans?
●
Developers?
●
Web developers, maybe?
●
System architects?
●
Performance engineers?
504 on the main page!
●
A customer is stressed extremely
●
Reaction should be quick and effective
●
The obvious plan does not work
●
We should be prepared!
The obvious plan
●
1) Change something
●
2) Expect the situation to become better
●
3) Wait anxiously
●
4) ????
●
5) PROFIT!!!
●
This plan is quite popular in fact
for some reason (simplicity?)
The proper plan (top secret!)
●
1) Gather metrics (you have them already, don't
you?)
●
2) Analyze metrics
●
3) Elaborate a hypothesis
●
4) Plan and make a single change
●
5) Repeat until success
●
If you were proficient in physics at high
school, this plan should sound extremely familiar
How to collect/analyze metrics?
●
Brendan Gregg's observability tools diagram
How do we collect/analyze?
●
atop (30 sec intervals)
How do we collect/analyze?
●
atop (30 sec intervals)
●
Ex-graphite stack (Grafana, InfluxDB or OpenTSDB
or Cyanite/Cassandra, but NOT Whisper, collectd)
How do we collect/analyze?
●
atop (30 sec intervals)
●
Ex-graphite stack (Grafana, InfluxDB or OpenTSDB
or Cyanite/Cassandra, but NOT Whisper, collectd)
●
NewRelic
How do we collect/analyze?
●
atop (30 sec intervals)
●
Ex-graphite stack (Grafana, InfluxDB or OpenTSDB
or Cyanite/Cassandra, but NOT Whisper, collectd)
●
NewRelic
●
pidstat (not iotop)
How do we collect/analyze?
●
atop (30 sec intervals)
●
Ex-graphite stack (Grafana, InfluxDB or OpenTSDB
or Cyanite/Cassandra, but NOT Whisper, collectd)
●
NewRelic
●
pidstat (not iotop)
●
perf top and perf record
How do we collect/analyze?
●
atop (30 sec intervals)
●
Ex-graphite stack (Grafana, InfluxDB or OpenTSDB
or Cyanite/Cassandra, but NOT Whisper, collectd)
●
NewRelic
●
pidstat (not iotop)
●
perf top and perf record
●
sysdig
How do we collect/analyze?
●
atop (30 sec intervals)
●
Ex-graphite stack (Grafana, InfluxDB or OpenTSDB
or Cyanite/Cassandra, but NOT Whisper, collectd)
●
NewRelic
●
pidstat (not iotop)
●
perf top and perf record
●
sysdig
●
iostat -x 1
Most common case (a no-brainer)
●
CPU time is too high (a lucky customer got an SSD)
●
Disk saturation is above 60% (a not-so-lucky customer)
Most common case (a no-brainer)
●
CPU time is too high (a lucky customer got an SSD)
●
Disk saturation is above 60% (a not-so-lucky customer)
●
PHP and, of course, MySQL
Most common case (a no-brainer)
●
CPU time is too high (a lucky customer got an SSD)
●
Disk saturation is above 60% (a not-so-lucky customer)
●
PHP and, of course, MySQL
●
InnoDB buffers are too low
●
Synchronous commit is 'on'
Most common case (a no-brainer)
●
CPU time is too high (a lucky customer got an SSD)
●
Disk saturation is above 60% (a not-so-lucky customer)
●
PHP and, of course, MySQL
●
InnoDB buffers are too low
●
Synchronous commit is 'on'
●
Too many slow queries
●
Queries with 'filesort' in execution plan
How to solve
●
Install Anemometer, turn on slow queries log
●
Range queries based on their cumulative exec time
●
Read and understand execution plans
●
Blame developers
●
Cry in vain
Story time!
●
Case #1: measuring the measurer
●
It seems that everybody still uses Graphite/Whisper
Story time!
●
Case #1: measuring the measurer
●
It seems that everybody still uses Graphite/Whisper
●
Even the big guys like Mail.Ru
Story time!
●
Case #1: measuring the measurer
●
It seems that everybody still uses Graphite/Whisper
●
Even the big guys like Mail.Ru
●
Well, because SAS HDDs are cheap...
Story time!
●
Case #1: measuring the measurer
●
It seems that everybody still uses Graphite/Whisper
●
Even the big guys like Mail.Ru
●
Well, because SAS HDDs are cheap…
●
But...
A challenger appears!
●
InfluxDB vs. Whisper, July 2015
●
The same set of metrics (carbon-relay-ng in the middle)
●
And the winner is...
Okay, we hate magic
●
Whisper is just a set of RRD-like files on a plain old FS
●
20000 metrics lead you to 20000 files
●
Accessing 20000 files every 10 secs is a major pain
●
InfluxDB is a time series database based on an LSM-tree
●
InfluxDB is much more write-optimized than 20000 separate
files on your ext4/XFS/you-name-it
●
But, of course, SAS drives are quite cheap...
In case you are not scared
●
Case #2: the site got a SUDS (sudden unexpected death
syndrome)
In case you are not scared
●
Case #2: the site got a SUDS (sudden unexpected death
syndrome)
●
Symptoms: everything slows down to a crawl (sounds familiar)
In case you are not scared
●
Case #2: the site got a SUDS (sudden unexpected death
syndrome)
●
Symptoms: everything slows down to a crawl (sounds familiar)
●
NewRelic shows nothing unusual
In case you are not scared
●
Case #2: the site got a SUDS (sudden unexpected death
syndrome)
●
Symptoms: everything slows down to a crawl (sounds familiar)
●
NewRelic shows nothing unusual
●
Well, if parameters more suitable for a busy site that for a very
low traffic one can be called “nothing unusual”
●
And this site is not busy at all
Diagnostic card
●
PHP is OK
●
MySQL does not sort anything
●
Top queries in MySQL sorted by total exec time are all indexed
●
Every MySQL query runs very slow when there is even
moderate load
But how did we solve it?
●
Even a modern rig w/decent Xeons and SATA HDDs can be
turned into a slug
●
As simple as disabling AHCI in BIOS and staying on plain IDE
●
Well this one was not that hard but was quite unusual
●
Rented servers do not suffer from problems like this because
they are configured uniformly
●
I can't easily explain how I came upon this solution, pure
intuition seemed to be involved
Not scared enough yet?
●
Then, case #3: another site got SUDS
Not scared enough yet?
●
Then, case #3: another site got SUDS
Diagnostic card
●
NewRelic blames PHP code
●
Even the SSH console is slow
●
Nothing unusual or unexpected in daily CPU load graphs
●
CPU flamegraph shows nothing
What is a «CPU flamegraph»?
How did we solve it
●
Analyzed atop recorded stats for outage periods
●
atop is quite smart in fact and color suspected values in red or
blue
●
IRQ % is over 50%
How did we solve it
●
Analyzed atop recorded stats for outage periods
●
atop is quite smart in fact and color suspected values in red or
blue
●
IRQ % is over 50%
●
But what is “IRQ %” anyway?
●
Oh, who cares, let's install Munin and get per-interrupt graphs
A blast from the past
●
Analyzed atop recorded stats for outage periods
●
atop is quite smart in fact and color suspected values in red or
blue
●
IRQ % is over 50%
●
But what is “IRQ %” anyway?
●
Oh, who cares, let's install Munin and get per-interrupt graphs
How did we solve it
●
Well we have not solved it yet
●
The graph from previous slide is for past two days
●
But at least we have a plan!
●
https://help.ubuntu.com/community/ReschedulingInterrupts
Summary
●
Linux is cool
●
Performance engineering is hard
●
Don't panic!
Thank you!
●
Questions?
●
Oh, BTW you can hire us!
●
http://gitinsky.com
●
alex@gitinsky.com
●
Please do not forget to attend our meetups:
●
http://meetup.com/Docker-Spb, http://meetup.com/Ansible-Spb,
http://meetup.com/DevOps-40

Más contenido relacionado

La actualidad más candente

From Test to Live with Rex
From Test to Live with RexFrom Test to Live with Rex
From Test to Live with Rex
Jan Gehring
 
Rex - Lightning Talk yapc.eu 2013
Rex - Lightning Talk yapc.eu 2013Rex - Lightning Talk yapc.eu 2013
Rex - Lightning Talk yapc.eu 2013
Jan Gehring
 
Random tips that will save your project's life
Random tips that will save your project's lifeRandom tips that will save your project's life
Random tips that will save your project's life
Mariano Iglesias
 
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
Wei Shan Ang
 
OSNoise Tracer: Who Is Stealing My CPU Time?
OSNoise Tracer: Who Is Stealing My CPU Time?OSNoise Tracer: Who Is Stealing My CPU Time?
OSNoise Tracer: Who Is Stealing My CPU Time?
ScyllaDB
 
Creating parallel tests for NUnit with PNUnit - hands on lab
Creating parallel tests for NUnit with PNUnit - hands on labCreating parallel tests for NUnit with PNUnit - hands on lab
Creating parallel tests for NUnit with PNUnit - hands on lab
psluaces
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
ScyllaDB
 

La actualidad más candente (20)

From Test to Live with Rex
From Test to Live with RexFrom Test to Live with Rex
From Test to Live with Rex
 
Git+jenkins+rex presentation
Git+jenkins+rex presentationGit+jenkins+rex presentation
Git+jenkins+rex presentation
 
Rex - Lightning Talk yapc.eu 2013
Rex - Lightning Talk yapc.eu 2013Rex - Lightning Talk yapc.eu 2013
Rex - Lightning Talk yapc.eu 2013
 
Random tips that will save your project's life
Random tips that will save your project's lifeRandom tips that will save your project's life
Random tips that will save your project's life
 
Practical SystemTAP basics: Perl memory profiling
Practical SystemTAP basics: Perl memory profilingPractical SystemTAP basics: Perl memory profiling
Practical SystemTAP basics: Perl memory profiling
 
High performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodbHigh performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodb
 
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
 
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
 
OSNoise Tracer: Who Is Stealing My CPU Time?
OSNoise Tracer: Who Is Stealing My CPU Time?OSNoise Tracer: Who Is Stealing My CPU Time?
OSNoise Tracer: Who Is Stealing My CPU Time?
 
Node.js and Ruby
Node.js and RubyNode.js and Ruby
Node.js and Ruby
 
Node.js: Whats the Big Deal? Presented and JS Meetup Chicago
Node.js: Whats the Big Deal? Presented and JS Meetup ChicagoNode.js: Whats the Big Deal? Presented and JS Meetup Chicago
Node.js: Whats the Big Deal? Presented and JS Meetup Chicago
 
Creating parallel tests for NUnit with PNUnit - hands on lab
Creating parallel tests for NUnit with PNUnit - hands on labCreating parallel tests for NUnit with PNUnit - hands on lab
Creating parallel tests for NUnit with PNUnit - hands on lab
 
Deep dive-oz
Deep dive-ozDeep dive-oz
Deep dive-oz
 
Beyond Puppet
Beyond PuppetBeyond Puppet
Beyond Puppet
 
How go makes us faster (May 2015)
How go makes us faster (May 2015)How go makes us faster (May 2015)
How go makes us faster (May 2015)
 
Warsztaty ansible
Warsztaty ansibleWarsztaty ansible
Warsztaty ansible
 
Puppet Camp LA 2/19/2015
Puppet Camp LA 2/19/2015Puppet Camp LA 2/19/2015
Puppet Camp LA 2/19/2015
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
 
Building a REST API with Node.js and MongoDB
Building a REST API with Node.js and MongoDBBuilding a REST API with Node.js and MongoDB
Building a REST API with Node.js and MongoDB
 
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo..."Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
 

Destacado

NoSQL — неспроста ли это "ЖЖЖ"?
NoSQL — неспроста ли это "ЖЖЖ"?NoSQL — неспроста ли это "ЖЖЖ"?
NoSQL — неспроста ли это "ЖЖЖ"?
Daniel Podolsky
 
My talk on Docker, Youcon 2015
My talk on Docker, Youcon 2015My talk on Docker, Youcon 2015
My talk on Docker, Youcon 2015
Alex Chistyakov
 

Destacado (20)

My talk at Linux Piter 2016
My talk at Linux Piter 2016My talk at Linux Piter 2016
My talk at Linux Piter 2016
 
Tk conf daniel-podolsky-sqlvsnosql
Tk conf daniel-podolsky-sqlvsnosqlTk conf daniel-podolsky-sqlvsnosql
Tk conf daniel-podolsky-sqlvsnosql
 
NoSQL — неспроста ли это "ЖЖЖ"?
NoSQL — неспроста ли это "ЖЖЖ"?NoSQL — неспроста ли это "ЖЖЖ"?
NoSQL — неспроста ли это "ЖЖЖ"?
 
My talk on monitoring systems at RootConf 2016
My talk on monitoring systems at RootConf 2016My talk on monitoring systems at RootConf 2016
My talk on monitoring systems at RootConf 2016
 
Ansible in the enterprise
Ansible in the enterpriseAnsible in the enterprise
Ansible in the enterprise
 
My talk on Salt and Ansible from DevConf 2014
My talk on Salt and Ansible from DevConf 2014My talk on Salt and Ansible from DevConf 2014
My talk on Salt and Ansible from DevConf 2014
 
My talk on LeoFS, Highload++ 2014
My talk on LeoFS, Highload++ 2014My talk on LeoFS, Highload++ 2014
My talk on LeoFS, Highload++ 2014
 
"Мы два месяца долбались, а потом построили индекс" (c) Аксенов
"Мы два месяца долбались, а потом построили индекс" (c) Аксенов"Мы два месяца долбались, а потом построили индекс" (c) Аксенов
"Мы два месяца долбались, а потом построили индекс" (c) Аксенов
 
On Docker
On DockerOn Docker
On Docker
 
My talk on Hadoop stack operations engineering at OSPCon
My talk on Hadoop stack operations engineering at OSPConMy talk on Hadoop stack operations engineering at OSPCon
My talk on Hadoop stack operations engineering at OSPCon
 
My talk on Docker, Youcon 2015
My talk on Docker, Youcon 2015My talk on Docker, Youcon 2015
My talk on Docker, Youcon 2015
 
Benchmarking PostgreSQL in Linux and FreeBSD
Benchmarking PostgreSQL in Linux and FreeBSDBenchmarking PostgreSQL in Linux and FreeBSD
Benchmarking PostgreSQL in Linux and FreeBSD
 
My talk at DevParty 2017
My talk at DevParty 2017My talk at DevParty 2017
My talk at DevParty 2017
 
My talk on programming languages at SPbLUG Mar 2017
My talk on programming languages at SPbLUG Mar 2017My talk on programming languages at SPbLUG Mar 2017
My talk on programming languages at SPbLUG Mar 2017
 
My talk at CEE-SECR 2016
My talk at CEE-SECR 2016My talk at CEE-SECR 2016
My talk at CEE-SECR 2016
 
My talk at YouCon Saratov 2016
My talk at YouCon Saratov 2016My talk at YouCon Saratov 2016
My talk at YouCon Saratov 2016
 
My talk on HBase ops engineering at TBD Jun 2016
My talk on HBase ops engineering at TBD Jun 2016My talk on HBase ops engineering at TBD Jun 2016
My talk on HBase ops engineering at TBD Jun 2016
 
My talk on using LVM thin provisioning from SPbLUG/DevOps-40 meetup 25.06.14
My talk on using LVM thin provisioning from SPbLUG/DevOps-40 meetup 25.06.14My talk on using LVM thin provisioning from SPbLUG/DevOps-40 meetup 25.06.14
My talk on using LVM thin provisioning from SPbLUG/DevOps-40 meetup 25.06.14
 
Using Ansible
Using AnsibleUsing Ansible
Using Ansible
 
My talk on Graphite stack on 58it.ru
My talk on Graphite stack on 58it.ruMy talk on Graphite stack on 58it.ru
My talk on Graphite stack on 58it.ru
 

Similar a My talk at Linux Piter 2015

Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Docker, Inc.
 
UKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL TuningUKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL Tuning
FromDual GmbH
 
MySQL 5.6 Performance
MySQL 5.6 PerformanceMySQL 5.6 Performance
MySQL 5.6 Performance
MYXPLAIN
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High load
Krivoy Rog IT Community
 

Similar a My talk at Linux Piter 2015 (20)

Scalable, good, cheap
Scalable, good, cheapScalable, good, cheap
Scalable, good, cheap
 
All Your IOPS Are Belong To Us - A Pinteresting Case Study in MySQL Performan...
All Your IOPS Are Belong To Us - A Pinteresting Case Study in MySQL Performan...All Your IOPS Are Belong To Us - A Pinteresting Case Study in MySQL Performan...
All Your IOPS Are Belong To Us - A Pinteresting Case Study in MySQL Performan...
 
HBase operations
HBase operationsHBase operations
HBase operations
 
Real-world Experiences in Scala
Real-world Experiences in ScalaReal-world Experiences in Scala
Real-world Experiences in Scala
 
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmKernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
 
Cloud arch patterns
Cloud arch patternsCloud arch patterns
Cloud arch patterns
 
What I learned building a parallel processor from scratch
What I learned building a parallel processor from scratchWhat I learned building a parallel processor from scratch
What I learned building a parallel processor from scratch
 
Ceph Day London 2014 - Deploying ceph in the wild
Ceph Day London 2014 - Deploying ceph in the wildCeph Day London 2014 - Deploying ceph in the wild
Ceph Day London 2014 - Deploying ceph in the wild
 
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
 
Automating MySQL operations with Puppet
Automating MySQL operations with PuppetAutomating MySQL operations with Puppet
Automating MySQL operations with Puppet
 
strangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patternsstrangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patterns
 
UKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL TuningUKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL Tuning
 
MySQL 5.6 Performance
MySQL 5.6 PerformanceMySQL 5.6 Performance
MySQL 5.6 Performance
 
Killer Bugs From Outer Space
Killer Bugs From Outer SpaceKiller Bugs From Outer Space
Killer Bugs From Outer Space
 
Experiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah WatkinsExperiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah Watkins
 
Kernel Recipes 2014 - Performance Does Matter
Kernel Recipes 2014 - Performance Does MatterKernel Recipes 2014 - Performance Does Matter
Kernel Recipes 2014 - Performance Does Matter
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
The Dark Side Of Go -- Go runtime related problems in TiDB in production
The Dark Side Of Go -- Go runtime related problems in TiDB  in productionThe Dark Side Of Go -- Go runtime related problems in TiDB  in production
The Dark Side Of Go -- Go runtime related problems in TiDB in production
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High load
 

Más de Alex Chistyakov

Más de Alex Chistyakov (20)

My slides from DevOpsDays 2019
My slides from DevOpsDays 2019My slides from DevOpsDays 2019
My slides from DevOpsDays 2019
 
My slides from BMM №3 May 2019
My slides from BMM №3 May 2019My slides from BMM №3 May 2019
My slides from BMM №3 May 2019
 
My slides from DevOps-40 meetup Jun 2019
My slides from DevOps-40 meetup Jun 2019 My slides from DevOps-40 meetup Jun 2019
My slides from DevOps-40 meetup Jun 2019
 
My slides from SECR'2018
My slides from SECR'2018My slides from SECR'2018
My slides from SECR'2018
 
My slides from the first SPb SRE community meetup at DataArt
My slides from the first SPb SRE community meetup at DataArtMy slides from the first SPb SRE community meetup at DataArt
My slides from the first SPb SRE community meetup at DataArt
 
My slides from CC'2019
My slides from CC'2019My slides from CC'2019
My slides from CC'2019
 
My slides from BMM №4 Nov 2019
My slides from BMM №4 Nov 2019My slides from BMM №4 Nov 2019
My slides from BMM №4 Nov 2019
 
My slides from DevOps-40 meetup Oct 2019
My slides from DevOps-40 meetup Oct 2019My slides from DevOps-40 meetup Oct 2019
My slides from DevOps-40 meetup Oct 2019
 
My slides from DevOps-40 meetup Dec 2019
My slides from DevOps-40 meetup Dec 2019My slides from DevOps-40 meetup Dec 2019
My slides from DevOps-40 meetup Dec 2019
 
Configuration management and Kubernetes
Configuration management and KubernetesConfiguration management and Kubernetes
Configuration management and Kubernetes
 
Ansible and other stuff
Ansible and other stuffAnsible and other stuff
Ansible and other stuff
 
Python performance engineering in 2017
Python performance engineering in 2017Python performance engineering in 2017
Python performance engineering in 2017
 
My talk at SPb SQA sub-meetup of ITGM
My talk at SPb SQA sub-meetup of ITGMMy talk at SPb SQA sub-meetup of ITGM
My talk at SPb SQA sub-meetup of ITGM
 
My talk at SECR 2017
My talk at SECR 2017My talk at SECR 2017
My talk at SECR 2017
 
On scaling teams
On scaling teamsOn scaling teams
On scaling teams
 
MariaDB workshop
MariaDB workshopMariaDB workshop
MariaDB workshop
 
Docker for JS people
Docker for JS peopleDocker for JS people
Docker for JS people
 
My talk on DevOps engineer's adventures in the Windows world at UWDC 2017
My talk on DevOps engineer's adventures in the Windows world at UWDC 2017My talk on DevOps engineer's adventures in the Windows world at UWDC 2017
My talk on DevOps engineer's adventures in the Windows world at UWDC 2017
 
My talk on GitHub open data at ITGM #10
 My talk on GitHub open data at ITGM #10 My talk on GitHub open data at ITGM #10
My talk on GitHub open data at ITGM #10
 
My talk on DevOps :) at Stachka 2017
My talk on DevOps :) at Stachka 2017My talk on DevOps :) at Stachka 2017
My talk on DevOps :) at Stachka 2017
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 

My talk at Linux Piter 2015

  • 1. Performance optimization in Linux Tales from the trenches Alex Chistyakov, Principal Engineer, Git in Sky Linux Piter 2015
  • 2. Who are we? ● A small consulting company based in SPb. ● Web operations ● Automation ● Performance tuning ● Sponsors of local DevOps meetup
  • 3. Who are you? ● Linux fans? ● Developers? ● Web developers, maybe? ● System architects? ● Performance engineers?
  • 4. Okay, why Linux? ● Is there anything else? ● According to W3Cook stats, Linux serves 95.8% of public web sites ● And it's on the desktop too! ● (At least on my desktop)
  • 5. Linux in the perfect world ● Linux fans? ● Developers? ● Web developers, maybe? ● System architects? ● Performance engineers?
  • 6. Linux in the real world ● Linux fans? ● Developers? ● Web developers, maybe? ● System architects? ● Performance engineers?
  • 7. 504 on the main page! ● A customer is stressed extremely ● Reaction should be quick and effective ● The obvious plan does not work ● We should be prepared!
  • 8. The obvious plan ● 1) Change something ● 2) Expect the situation to become better ● 3) Wait anxiously ● 4) ???? ● 5) PROFIT!!! ● This plan is quite popular in fact for some reason (simplicity?)
  • 9. The proper plan (top secret!) ● 1) Gather metrics (you have them already, don't you?) ● 2) Analyze metrics ● 3) Elaborate a hypothesis ● 4) Plan and make a single change ● 5) Repeat until success ● If you were proficient in physics at high school, this plan should sound extremely familiar
  • 10. How to collect/analyze metrics? ● Brendan Gregg's observability tools diagram
  • 11. How do we collect/analyze? ● atop (30 sec intervals)
  • 12. How do we collect/analyze? ● atop (30 sec intervals) ● Ex-graphite stack (Grafana, InfluxDB or OpenTSDB or Cyanite/Cassandra, but NOT Whisper, collectd)
  • 13. How do we collect/analyze? ● atop (30 sec intervals) ● Ex-graphite stack (Grafana, InfluxDB or OpenTSDB or Cyanite/Cassandra, but NOT Whisper, collectd) ● NewRelic
  • 14. How do we collect/analyze? ● atop (30 sec intervals) ● Ex-graphite stack (Grafana, InfluxDB or OpenTSDB or Cyanite/Cassandra, but NOT Whisper, collectd) ● NewRelic ● pidstat (not iotop)
  • 15. How do we collect/analyze? ● atop (30 sec intervals) ● Ex-graphite stack (Grafana, InfluxDB or OpenTSDB or Cyanite/Cassandra, but NOT Whisper, collectd) ● NewRelic ● pidstat (not iotop) ● perf top and perf record
  • 16. How do we collect/analyze? ● atop (30 sec intervals) ● Ex-graphite stack (Grafana, InfluxDB or OpenTSDB or Cyanite/Cassandra, but NOT Whisper, collectd) ● NewRelic ● pidstat (not iotop) ● perf top and perf record ● sysdig
  • 17. How do we collect/analyze? ● atop (30 sec intervals) ● Ex-graphite stack (Grafana, InfluxDB or OpenTSDB or Cyanite/Cassandra, but NOT Whisper, collectd) ● NewRelic ● pidstat (not iotop) ● perf top and perf record ● sysdig ● iostat -x 1
  • 18. Most common case (a no-brainer) ● CPU time is too high (a lucky customer got an SSD) ● Disk saturation is above 60% (a not-so-lucky customer)
  • 19. Most common case (a no-brainer) ● CPU time is too high (a lucky customer got an SSD) ● Disk saturation is above 60% (a not-so-lucky customer) ● PHP and, of course, MySQL
  • 20. Most common case (a no-brainer) ● CPU time is too high (a lucky customer got an SSD) ● Disk saturation is above 60% (a not-so-lucky customer) ● PHP and, of course, MySQL ● InnoDB buffers are too low ● Synchronous commit is 'on'
  • 21. Most common case (a no-brainer) ● CPU time is too high (a lucky customer got an SSD) ● Disk saturation is above 60% (a not-so-lucky customer) ● PHP and, of course, MySQL ● InnoDB buffers are too low ● Synchronous commit is 'on' ● Too many slow queries ● Queries with 'filesort' in execution plan
  • 22. How to solve ● Install Anemometer, turn on slow queries log ● Range queries based on their cumulative exec time ● Read and understand execution plans ● Blame developers ● Cry in vain
  • 23. Story time! ● Case #1: measuring the measurer ● It seems that everybody still uses Graphite/Whisper
  • 24. Story time! ● Case #1: measuring the measurer ● It seems that everybody still uses Graphite/Whisper ● Even the big guys like Mail.Ru
  • 25. Story time! ● Case #1: measuring the measurer ● It seems that everybody still uses Graphite/Whisper ● Even the big guys like Mail.Ru ● Well, because SAS HDDs are cheap...
  • 26. Story time! ● Case #1: measuring the measurer ● It seems that everybody still uses Graphite/Whisper ● Even the big guys like Mail.Ru ● Well, because SAS HDDs are cheap… ● But...
  • 27. A challenger appears! ● InfluxDB vs. Whisper, July 2015 ● The same set of metrics (carbon-relay-ng in the middle) ● And the winner is...
  • 28. Okay, we hate magic ● Whisper is just a set of RRD-like files on a plain old FS ● 20000 metrics lead you to 20000 files ● Accessing 20000 files every 10 secs is a major pain ● InfluxDB is a time series database based on an LSM-tree ● InfluxDB is much more write-optimized than 20000 separate files on your ext4/XFS/you-name-it ● But, of course, SAS drives are quite cheap...
  • 29. In case you are not scared ● Case #2: the site got a SUDS (sudden unexpected death syndrome)
  • 30. In case you are not scared ● Case #2: the site got a SUDS (sudden unexpected death syndrome) ● Symptoms: everything slows down to a crawl (sounds familiar)
  • 31. In case you are not scared ● Case #2: the site got a SUDS (sudden unexpected death syndrome) ● Symptoms: everything slows down to a crawl (sounds familiar) ● NewRelic shows nothing unusual
  • 32. In case you are not scared ● Case #2: the site got a SUDS (sudden unexpected death syndrome) ● Symptoms: everything slows down to a crawl (sounds familiar) ● NewRelic shows nothing unusual ● Well, if parameters more suitable for a busy site that for a very low traffic one can be called “nothing unusual” ● And this site is not busy at all
  • 33. Diagnostic card ● PHP is OK ● MySQL does not sort anything ● Top queries in MySQL sorted by total exec time are all indexed ● Every MySQL query runs very slow when there is even moderate load
  • 34. But how did we solve it? ● Even a modern rig w/decent Xeons and SATA HDDs can be turned into a slug ● As simple as disabling AHCI in BIOS and staying on plain IDE ● Well this one was not that hard but was quite unusual ● Rented servers do not suffer from problems like this because they are configured uniformly ● I can't easily explain how I came upon this solution, pure intuition seemed to be involved
  • 35. Not scared enough yet? ● Then, case #3: another site got SUDS
  • 36. Not scared enough yet? ● Then, case #3: another site got SUDS
  • 37. Diagnostic card ● NewRelic blames PHP code ● Even the SSH console is slow ● Nothing unusual or unexpected in daily CPU load graphs ● CPU flamegraph shows nothing
  • 38. What is a «CPU flamegraph»?
  • 39. How did we solve it ● Analyzed atop recorded stats for outage periods ● atop is quite smart in fact and color suspected values in red or blue ● IRQ % is over 50%
  • 40. How did we solve it ● Analyzed atop recorded stats for outage periods ● atop is quite smart in fact and color suspected values in red or blue ● IRQ % is over 50% ● But what is “IRQ %” anyway? ● Oh, who cares, let's install Munin and get per-interrupt graphs
  • 41. A blast from the past ● Analyzed atop recorded stats for outage periods ● atop is quite smart in fact and color suspected values in red or blue ● IRQ % is over 50% ● But what is “IRQ %” anyway? ● Oh, who cares, let's install Munin and get per-interrupt graphs
  • 42. How did we solve it ● Well we have not solved it yet ● The graph from previous slide is for past two days ● But at least we have a plan! ● https://help.ubuntu.com/community/ReschedulingInterrupts
  • 43. Summary ● Linux is cool ● Performance engineering is hard ● Don't panic!
  • 44. Thank you! ● Questions? ● Oh, BTW you can hire us! ● http://gitinsky.com ● alex@gitinsky.com ● Please do not forget to attend our meetups: ● http://meetup.com/Docker-Spb, http://meetup.com/Ansible-Spb, http://meetup.com/DevOps-40