Time to rethink /proc
Kir Kolyshkin / Andrey Vagin
@kolyshkin / @vagin_andrey
Texas Linux Fest, 9 July 2016
Austin, TX
2
Agenda
● Intro
● History of /proc
● Limitations of current interface
● Proposed solutions
● Performance results
3
$ whoami
● Linux user since 1995
● Developing containers since 2002
– author of vzctl and vzpkg
● Leading OpenVZ: 2005 to 2015
● Twitter: @kolyshkin
4
● Founded in 1997
● Spun off from Parallels
● HQ in Seattle, WA
● R&D in Moscow, RU
2016
5
Products:
● Containers and hypervisors
● Distributed cluster storage
6
OpenVZ
7
CRIU: Checkpoint / Restore In Userspace
8
Ideas behind CRIU
● We can't merge kernel c/r upstream, so... let’s redo the whole thing in userspace
● Use existing interfaces where available
– /proc, ptrace, netlink, parasite code injection
● Amend the kernel where necessary
– only ~180 kernel patches
– kernel v3.11+ is sufficient (if CONFIG_CHECKPOINT_RESTORE is set)
9
History of /proc part I
● Initial solution: /dev/kmem
– May 1975, UNIX 6th edition (V6)
– http://man.cat-v.org/unix-6th/4/mem
● First “old style” /proc
– 1984, UNIX 8th edition (V8), by Tom Killian
– A process is a file! Images of running processes
– An alternative to ptrace(2)
– http://man.cat-v.org/unix_8th/4/proc
10
History of /proc part II
● Most well-known old-style /proc
– 1988...1991: UNIX SVR4 (port from V8 with enhancements by Roger Faulkner and Ron Gomes)
– read(), write(), and 37 ioctl()s
● First modern style /proc
– mid-1990s, Plan 9
– Each process is a directory with multiple informational and control files
– One can use ls and cat to work with it
11
Plan 9 /proc interface
12
Modern Linux interface: /proc/PID/*
$ ls /proc/self/
attr             cwd      loginuid    numa_maps      schedstat  task
autogroup        environ  map_files   oom_adj        sessionid  timers
auxv             exe      maps        oom_score      setgroups  uid_map
cgroup           fd       mem         oom_score_adj  smaps      wchan
clear_refs       fdinfo   mountinfo   pagemap        stack
cmdline          gid_map  mounts      personality    stat
comm             io       mountstats  projid_map     statm
coredump_filter  latency  net         root           status
cpuset           limits   ns          sched          syscall
13
Limitations of /proc/PID interface
● Requires at least three syscalls per process per file
– open(), read(), close()
● Variety of formats, mostly text based
● Not enough information (/proc/PID/fd/*)
● Some formats are non-extendable
– /proc/PID/maps where the last column is optional
● Sometimes slow due to extra attributes
– /proc/PID/smaps vs /proc/PID/maps
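To make the cost in the first bullet concrete, here is a minimal C sketch (not taken from the talk) that reads /proc/PID/stat for every process the way a traditional tool does, and counts the syscalls spent on those files. The helper name read_all_stat() is hypothetical. Because it reads until EOF it issues four calls per file; a reader that knows the file fits in one buffer can stop after one read(), which gives the three-call minimum from the slide.

```c
#include <ctype.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: reads /proc/PID/stat of every process the
 * traditional way and counts the syscalls spent on those files
 * (getdents() calls made by readdir() are not counted).
 * Returns the number of stat files read, or -1 if /proc cannot be
 * opened; the syscall count is stored in *nsyscalls. */
static int read_all_stat(long *nsyscalls)
{
    char path[300], buf[4096];
    struct dirent *de;
    int nread = 0;

    *nsyscalls = 0;
    DIR *proc = opendir("/proc");
    if (!proc)
        return -1;

    while ((de = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                    /* not a PID directory */
        snprintf(path, sizeof(path), "/proc/%s/stat", de->d_name);

        int fd = open(path, O_RDONLY);   /* open() */
        (*nsyscalls)++;
        if (fd < 0)
            continue;                    /* process exited meanwhile */

        ssize_t n;
        do {                             /* read() until it returns 0 (EOF) */
            n = read(fd, buf, sizeof(buf));
            (*nsyscalls)++;
        } while (n > 0);

        close(fd);                       /* close() */
        (*nsyscalls)++;
        nread++;
    }
    closedir(proc);
    return nread;
}
```

With N processes this is at least 3N syscalls for a single attribute file; a tool that needs several files per process multiplies that further.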
14
/proc/PID/smaps
7f1cb0afc000-7f1cb0afd000 rw-p 00021000 08:03 656516 /usr/lib64/ld-2.21.so
Size: 4 kB
Rss: 4 kB
Pss: 4 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 4 kB
Referenced: 4 kB
Anonymous: 4 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
VmFlags: rd wr mr mw me dw ac sd
$ time cat /proc/*/maps > /dev/null
real 0m0.061s
user 0m0.002s
sys 0m0.059s
$ time cat /proc/*/smaps > /dev/null
real 0m0.253s
user 0m0.004s
sys 0m0.247s
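Both files describe the same set of memory mappings (VMAs); the extra cost of smaps comes from the kernel computing per-mapping statistics such as Rss and Pss, which requires walking the page tables of every mapping. The hypothetical helpers below are a sketch, not part of any tool: count_maps() only counts the lines of /proc/PID/maps, while sum_rss_kb() adds up the Rss: fields of /proc/PID/smaps, information that maps does not carry at all.

```c
#include <stdio.h>
#include <string.h>

/* Counts complete lines (one per mapping) in /proc/PID/maps.
 * Returns -1 if the file cannot be opened. */
static int count_maps(const char *pid)
{
    char path[64], line[4096];
    int n = 0;

    snprintf(path, sizeof(path), "/proc/%s/maps", pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f))
        if (strchr(line, '\n'))      /* count only complete lines */
            n++;
    fclose(f);
    return n;
}

/* Sums the "Rss:" values (in kB) over all mappings in /proc/PID/smaps.
 * Returns -1 if the file cannot be opened. */
static long sum_rss_kb(const char *pid)
{
    char path[64], line[4096];
    long total = 0, kb;

    snprintf(path, sizeof(path), "/proc/%s/smaps", pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f))
        if (sscanf(line, "Rss: %ld kB", &kb) == 1)
            total += kb;
    fclose(f);
    return total;
}
```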
15
Similar problem: info about sockets
● /proc
– /proc/net/netlink
– /proc/net/unix
– /proc/net/tcp
– /proc/net/packet
● Problems: not enough info, complex format, all-or-nothing
● Solution (2012): use netlink, generalize tcp_diag as sock_diag
– an extendable binary format
– lets the caller specify which sockets and which groups of attributes to dump
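A minimal C sketch of the sock_diag approach, following the interface documented in sock_diag(7): a single request on a NETLINK_SOCK_DIAG socket selects the socket family (AF_UNIX), the socket states of interest, and, through udiag_show, which attribute groups to return. The function name count_unix_sockets() is hypothetical, and the code assumes the kernel provides unix_diag (CONFIG_UNIX_DIAG); if it does not, the function returns -1.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/sock_diag.h>
#include <linux/unix_diag.h>
#include <string.h>
#include <unistd.h>

/* Dumps all UNIX sockets via sock_diag and returns how many were
 * reported, or -1 on any error. */
static int count_unix_sockets(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_SOCK_DIAG);
    if (fd < 0)
        return -1;

    struct {
        struct nlmsghdr nlh;
        struct unix_diag_req req;
    } msg;
    memset(&msg, 0, sizeof(msg));
    msg.nlh.nlmsg_len = sizeof(msg);
    msg.nlh.nlmsg_type = SOCK_DIAG_BY_FAMILY;
    msg.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    msg.req.sdiag_family = AF_UNIX;
    msg.req.udiag_states = -1;              /* which sockets: all states */
    msg.req.udiag_show = UDIAG_SHOW_NAME;   /* which attributes: name only */

    struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
    if (sendto(fd, &msg, sizeof(msg), 0,
               (struct sockaddr *)&nladdr, sizeof(nladdr)) < 0) {
        close(fd);
        return -1;
    }

    long buf[8192 / sizeof(long)];          /* suitably aligned buffer */
    int count = 0;
    for (;;) {
        ssize_t len = recv(fd, buf, sizeof(buf), 0);
        if (len <= 0) {
            count = -1;
            break;
        }
        struct nlmsghdr *h = (struct nlmsghdr *)buf;
        for (; NLMSG_OK(h, len); h = NLMSG_NEXT(h, len)) {
            if (h->nlmsg_type == NLMSG_DONE)
                goto out;                   /* end of the dump */
            if (h->nlmsg_type == NLMSG_ERROR) {
                count = -1;
                goto out;
            }
            if (h->nlmsg_type == SOCK_DIAG_BY_FAMILY)
                count++;                    /* one message per socket */
        }
    }
out:
    close(fd);
    return count;
}
```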
16
Solution 1: task_diag based on netlink socket
1. Netlink message format: binary and extendable
2. Ways to specify a set of processes
3. Optimal grouping of attributes
17
Netlink message format
nlmsg_len
nlmsg_type nlmsg_flags
nlmsg_seq
nlmsg_pid
nla_len nla_type
payload
nla_len nla_type
payload
● Simple and elegant
● Binary and easily extendable
● Easy to add a new group
● Easy to add a new attribute
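The layout above can be built and parsed with the macros from <linux/netlink.h>. This is a sketch under simplifying assumptions: put_attr() and find_attr() are hypothetical helpers, and, as in the diagram, attributes follow the netlink header directly, whereas real protocols usually insert a family-specific header (for sock_diag, the request or response structure) in between. The parser skips attribute types it is not looking for, which is what lets new attributes be added without breaking existing readers.

```c
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

/* Appends one attribute (type-length-value) to the message in buf.
 * nlmsg_len must already be set (e.g. to NLMSG_HDRLEN).
 * Returns 0 on success, -1 if the attribute does not fit. */
static int put_attr(void *buf, size_t bufsize, uint16_t type,
                    const void *data, uint16_t len)
{
    struct nlmsghdr *nlh = buf;
    size_t off = NLMSG_ALIGN(nlh->nlmsg_len);
    size_t attr_len = NLA_HDRLEN + len;

    if (off + NLA_ALIGN(attr_len) > bufsize)
        return -1;
    struct nlattr *nla = (struct nlattr *)((char *)buf + off);
    nla->nla_type = type;
    nla->nla_len = attr_len;                 /* header + payload, unpadded */
    memcpy((char *)nla + NLA_HDRLEN, data, len);
    memset((char *)nla + attr_len, 0, NLA_ALIGN(attr_len) - attr_len);
    nlh->nlmsg_len = off + NLA_ALIGN(attr_len);
    return 0;
}

/* Returns the payload of the first attribute of the given type and
 * stores its length in *len, or returns NULL if there is none. */
static const void *find_attr(const struct nlmsghdr *nlh, uint16_t type,
                             uint16_t *len)
{
    const char *p = (const char *)nlh + NLMSG_HDRLEN;
    const char *end = (const char *)nlh + nlh->nlmsg_len;

    while (end - p >= NLA_HDRLEN) {
        const struct nlattr *nla = (const struct nlattr *)p;
        if (nla->nla_len < NLA_HDRLEN || nla->nla_len > end - p)
            break;                           /* malformed attribute */
        if (nla->nla_type == type) {
            *len = nla->nla_len - NLA_HDRLEN;
            return p + NLA_HDRLEN;
        }
        p += NLA_ALIGN(nla->nla_len);        /* skip other types */
    }
    return NULL;
}
```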
18
Specify sets of processes
● TASK_DIAG_DUMP_ALL
– Dump all processes
● TASK_DIAG_DUMP_ALL_THREAD
– Dump all threads
● TASK_DIAG_DUMP_CHILDREN
– Dump children of a specific task
● TASK_DIAG_DUMP_THREAD
– Dump threads of a specific task
● TASK_DIAG_DUMP_ONE
– Dump one task
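With plain /proc, the closest equivalent of TASK_DIAG_DUMP_THREAD is listing /proc/PID/task, as in the hypothetical list_threads() below; each attribute of each thread must then be fetched with further open()/read()/close() calls on files under /proc/PID/task/TID/. (On kernels built with CONFIG_PROC_CHILDREN, /proc/PID/task/TID/children plays a similar role for TASK_DIAG_DUMP_CHILDREN.)

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: stores up to max thread IDs of process pid in
 * tids[] and returns the number of threads seen, or -1 if
 * /proc/PID/task cannot be opened. */
static int list_threads(int pid, int *tids, int max)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/task", pid);

    DIR *d = opendir(path);
    if (!d)
        return -1;

    int n = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                    /* skip "." and ".." */
        if (n < max)
            tids[n] = atoi(de->d_name);
        n++;
    }
    closedir(d);
    return n;
}
```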
19
Groups of attributes
● TASK_DIAG_BASE
– PID, PGID, SID, TID, comm
● TASK_DIAG_CRED
– UID, GID, groups, capabilities
● TASK_DIAG_STAT
– per-task and per-process statistics (same as taskstats, not available in /proc)
● TASK_DIAG_VMA
– mapped memory regions and their access permissions (same as maps)
● TASK_DIAG_VMA_STAT
– memory consumption for each mapping (same as smaps)
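For comparison, most TASK_DIAG_BASE fields can be obtained from /proc/PID/stat, but only by parsing text. The sketch below (struct base_info and read_base() are hypothetical names) extracts PID, comm, PGID and SID. Note that comm has to be located via the last ')' because the command name may itself contain spaces or parentheses; a binary attribute format avoids this kind of parsing entirely.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the BASE fields parsed here. */
struct base_info {
    int pid, pgid, sid;
    char comm[64];
};

/* Fills *bi from /proc/PID/stat, whose line starts with
 * "pid (comm) state ppid pgrp session ...". Returns 0 on success. */
static int read_base(const char *pid, struct base_info *bi)
{
    char path[64], buf[1024];
    snprintf(path, sizeof(path), "/proc/%s/stat", pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* comm is wrapped in parentheses and may contain ')' itself,
     * so it ends at the LAST ')' on the line */
    char *lp = strchr(buf, '(');
    char *rp = strrchr(buf, ')');
    if (!lp || !rp || rp < lp)
        return -1;

    size_t clen = (size_t)(rp - lp - 1);
    if (clen >= sizeof(bi->comm))
        clen = sizeof(bi->comm) - 1;
    memcpy(bi->comm, lp + 1, clen);
    bi->comm[clen] = '\0';

    /* after comm: state (skipped), ppid (skipped), pgrp, session */
    if (sscanf(buf, "%d", &bi->pid) != 1 ||
        sscanf(rp + 1, " %*c %*d %d %d", &bi->pgid, &bi->sid) != 2)
        return -1;
    return 0;
}
```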
20
This is what makes it really fast
1. Netlink message format: binary and extendable
2. Ways to specify a set of processes
3. Optimal grouping of attributes
21
Problems with netlink
● Designed for networking
● Not obvious where to get pid and user namespaces
● Impossible to restrict netlink sockets
– Credentials are saved when a socket is created
– Process can drop privileges, but netlink doesn't care
– The same socket can be used to get process attributes and to set ip addresses
22
Change netlink socket to a transactional file
● /proc/task_diag as a transactional file
– write request → read response
● Otherwise same as netlink socket
● LKML discussion has not reached a conclusion yet
23
Performance: ps
Traditional ps (using /proc/PID/* files):
$ time ./ps/pscommand ax | wc -l
50089
real 0m1.596s
user 0m0.475s
sys 0m1.126s
New ps (using task_diag):
$ time ./ps/pscommand ax | wc -l
50089
real 0m0.148s
user 0m0.069s
sys 0m0.086s
24
Performance: using perf tool
> Using the fork test command:
> 10,000 processes; 10k proc with 5 threads = 50,000 tasks
> reading /proc: 11.3 sec
> task_diag: 2.2 sec
>
> @7,440 tasks, reading /proc is at 0.77 sec and task_diag at 0.096
>
> 128 instances of specjbb, 80,000+ tasks:
> reading /proc: 32.1 sec
> task_diag: 3.9 sec
>
> So overall much snappier startup times.
// David Ahern
25
Source code!
https://github.com/avagin/linux-task-diag/
Branch: devel
Examples: tools/testing/selftests/task_diag/
26
Thank you!
http://virtuozzo.com/
http://openvz.org/
http://criu.org/
@kolyshkin
kolyshkin AT gmail DOT com

Más contenido relacionado

La actualidad más candente

GlusterFS Containers
GlusterFS ContainersGlusterFS Containers
GlusterFS ContainersMohamed Ashiq
 
Tiering barcelona
Tiering barcelonaTiering barcelona
Tiering barcelonaGluster.org
 
Gluster fs architecture_&_roadmap_atin_punemeetup_2015
Gluster fs architecture_&_roadmap_atin_punemeetup_2015Gluster fs architecture_&_roadmap_atin_punemeetup_2015
Gluster fs architecture_&_roadmap_atin_punemeetup_2015Atin Mukherjee
 
Gluster intro-tdose
Gluster intro-tdoseGluster intro-tdose
Gluster intro-tdoseGluster.org
 
Live migration: pros, cons and gotchas -- Pavel Emelyanov
Live migration: pros, cons and gotchas -- Pavel EmelyanovLive migration: pros, cons and gotchas -- Pavel Emelyanov
Live migration: pros, cons and gotchas -- Pavel EmelyanovOpenVZ
 
Gluster technical overview
Gluster technical overviewGluster technical overview
Gluster technical overviewGluster.org
 
Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2Giuseppe Paterno'
 
20160401 guster-roadmap
20160401 guster-roadmap20160401 guster-roadmap
20160401 guster-roadmapGluster.org
 
Lt2013 glusterfs.talk
Lt2013 glusterfs.talkLt2013 glusterfs.talk
Lt2013 glusterfs.talkUdo Seidel
 
Gluster d thread_synchronization_using_urcu_lca2016
Gluster d thread_synchronization_using_urcu_lca2016Gluster d thread_synchronization_using_urcu_lca2016
Gluster d thread_synchronization_using_urcu_lca2016Gluster.org
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightGluster.org
 
Rhel cluster gfs_improveperformance
Rhel cluster gfs_improveperformanceRhel cluster gfs_improveperformance
Rhel cluster gfs_improveperformancesprdd
 
Smb gluster devmar2013
Smb gluster devmar2013Smb gluster devmar2013
Smb gluster devmar2013Gluster.org
 
Debugging with-wireshark-niels-de-vos
Debugging with-wireshark-niels-de-vosDebugging with-wireshark-niels-de-vos
Debugging with-wireshark-niels-de-vosGluster.org
 
Gluster intro-tdose
Gluster intro-tdoseGluster intro-tdose
Gluster intro-tdoseGluster.org
 
Distributed Postgres
Distributed PostgresDistributed Postgres
Distributed PostgresStas Kelvich
 

La actualidad más candente (20)

GlusterFS Containers
GlusterFS ContainersGlusterFS Containers
GlusterFS Containers
 
Tiering barcelona
Tiering barcelonaTiering barcelona
Tiering barcelona
 
Gluster fs architecture_&_roadmap_atin_punemeetup_2015
Gluster fs architecture_&_roadmap_atin_punemeetup_2015Gluster fs architecture_&_roadmap_atin_punemeetup_2015
Gluster fs architecture_&_roadmap_atin_punemeetup_2015
 
Gluster intro-tdose
Gluster intro-tdoseGluster intro-tdose
Gluster intro-tdose
 
Live migration: pros, cons and gotchas -- Pavel Emelyanov
Live migration: pros, cons and gotchas -- Pavel EmelyanovLive migration: pros, cons and gotchas -- Pavel Emelyanov
Live migration: pros, cons and gotchas -- Pavel Emelyanov
 
Gluster technical overview
Gluster technical overviewGluster technical overview
Gluster technical overview
 
Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2
 
20160401 guster-roadmap
20160401 guster-roadmap20160401 guster-roadmap
20160401 guster-roadmap
 
Dedupe nmamit
Dedupe nmamitDedupe nmamit
Dedupe nmamit
 
Lt2013 glusterfs.talk
Lt2013 glusterfs.talkLt2013 glusterfs.talk
Lt2013 glusterfs.talk
 
Gluster d thread_synchronization_using_urcu_lca2016
Gluster d thread_synchronization_using_urcu_lca2016Gluster d thread_synchronization_using_urcu_lca2016
Gluster d thread_synchronization_using_urcu_lca2016
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan Lambright
 
HIGH AVAILABLE CLUSTER IN WEB SERVER WITH HEARTBEAT + DRBD + OCFS2
HIGH AVAILABLE CLUSTER IN WEB SERVER WITH  HEARTBEAT + DRBD + OCFS2HIGH AVAILABLE CLUSTER IN WEB SERVER WITH  HEARTBEAT + DRBD + OCFS2
HIGH AVAILABLE CLUSTER IN WEB SERVER WITH HEARTBEAT + DRBD + OCFS2
 
Rhel cluster gfs_improveperformance
Rhel cluster gfs_improveperformanceRhel cluster gfs_improveperformance
Rhel cluster gfs_improveperformance
 
Smb gluster devmar2013
Smb gluster devmar2013Smb gluster devmar2013
Smb gluster devmar2013
 
Gluster d2
Gluster d2Gluster d2
Gluster d2
 
Debugging with-wireshark-niels-de-vos
Debugging with-wireshark-niels-de-vosDebugging with-wireshark-niels-de-vos
Debugging with-wireshark-niels-de-vos
 
Gluster intro-tdose
Gluster intro-tdoseGluster intro-tdose
Gluster intro-tdose
 
Gdeploy 2.0
Gdeploy 2.0Gdeploy 2.0
Gdeploy 2.0
 
Distributed Postgres
Distributed PostgresDistributed Postgres
Distributed Postgres
 

Destacado

Resume sample
Resume sampleResume sample
Resume sampleNur Amani
 
Q3 EARNING PREVIEW AN AHEAD
Q3 EARNING PREVIEW AN AHEADQ3 EARNING PREVIEW AN AHEAD
Q3 EARNING PREVIEW AN AHEADYaron Shkedy
 
9 breathing exercises for asthmatics - Respiratory Therapy
 9 breathing exercises for asthmatics - Respiratory Therapy  9 breathing exercises for asthmatics - Respiratory Therapy
9 breathing exercises for asthmatics - Respiratory Therapy sivasvlsa
 
Ayudas phrasal verbs zulay guerrereo pinzon ingles
Ayudas phrasal verbs zulay guerrereo pinzon inglesAyudas phrasal verbs zulay guerrereo pinzon ingles
Ayudas phrasal verbs zulay guerrereo pinzon inglesMaguepi
 
Guidance for Industry and Food and Drug Administration Staff: Mobile Medical ...
Guidance for Industry and Food and Drug Administration Staff: Mobile Medical ...Guidance for Industry and Food and Drug Administration Staff: Mobile Medical ...
Guidance for Industry and Food and Drug Administration Staff: Mobile Medical ...CRF Health
 
Use of information and communication
Use of information and communicationUse of information and communication
Use of information and communicationLewis Appleton
 
Key Concepts of Clinical Research & Clinical Trial
Key Concepts of Clinical Research & Clinical Trial Key Concepts of Clinical Research & Clinical Trial
Key Concepts of Clinical Research & Clinical Trial SWAROOP KUMAR K
 

Destacado (12)

Iistrik dinamis
Iistrik dinamisIistrik dinamis
Iistrik dinamis
 
Portfolio theo
Portfolio theoPortfolio theo
Portfolio theo
 
Resume sample
Resume sampleResume sample
Resume sample
 
Manua fotos narradas-1
Manua fotos narradas-1Manua fotos narradas-1
Manua fotos narradas-1
 
GMF ABOUT US EN
GMF ABOUT US ENGMF ABOUT US EN
GMF ABOUT US EN
 
Q3 EARNING PREVIEW AN AHEAD
Q3 EARNING PREVIEW AN AHEADQ3 EARNING PREVIEW AN AHEAD
Q3 EARNING PREVIEW AN AHEAD
 
9 breathing exercises for asthmatics - Respiratory Therapy
 9 breathing exercises for asthmatics - Respiratory Therapy  9 breathing exercises for asthmatics - Respiratory Therapy
9 breathing exercises for asthmatics - Respiratory Therapy
 
Ayudas phrasal verbs zulay guerrereo pinzon ingles
Ayudas phrasal verbs zulay guerrereo pinzon inglesAyudas phrasal verbs zulay guerrereo pinzon ingles
Ayudas phrasal verbs zulay guerrereo pinzon ingles
 
Guidance for Industry and Food and Drug Administration Staff: Mobile Medical ...
Guidance for Industry and Food and Drug Administration Staff: Mobile Medical ...Guidance for Industry and Food and Drug Administration Staff: Mobile Medical ...
Guidance for Industry and Food and Drug Administration Staff: Mobile Medical ...
 
Use of information and communication
Use of information and communicationUse of information and communication
Use of information and communication
 
A business evaluation
A business evaluationA business evaluation
A business evaluation
 
Key Concepts of Clinical Research & Clinical Trial
Key Concepts of Clinical Research & Clinical Trial Key Concepts of Clinical Research & Clinical Trial
Key Concepts of Clinical Research & Clinical Trial
 

Similar a Time to rethink /proc

Speeding up ps and top
Speeding up ps and topSpeeding up ps and top
Speeding up ps and topOpenVZ
 
CS 626 - March : Capsicum: Practical Capabilities for UNIX
CS 626 - March : Capsicum: Practical Capabilities for UNIXCS 626 - March : Capsicum: Practical Capabilities for UNIX
CS 626 - March : Capsicum: Practical Capabilities for UNIXruchith
 
Shall we play a game
Shall we play a gameShall we play a game
Shall we play a gamejackpot201
 
Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Ari Jolma
 
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...OpenShift Origin
 
We shall play a game....
We shall play a game....We shall play a game....
We shall play a game....Sadia Textile
 
Getting Started with Performance Co-Pilot
Getting Started with Performance Co-PilotGetting Started with Performance Co-Pilot
Getting Started with Performance Co-PilotPaul V. Novarese
 
Gdb basics for my sql db as (percona live europe 2019)
Gdb basics for my sql db as (percona live europe 2019)Gdb basics for my sql db as (percona live europe 2019)
Gdb basics for my sql db as (percona live europe 2019)Valerii Kravchuk
 
Community Update at OpenStack Summit Boston
Community Update at OpenStack Summit BostonCommunity Update at OpenStack Summit Boston
Community Update at OpenStack Summit BostonSage Weil
 
Introduction to Docker (as presented at December 2013 Global Hackathon)
Introduction to Docker (as presented at December 2013 Global Hackathon)Introduction to Docker (as presented at December 2013 Global Hackathon)
Introduction to Docker (as presented at December 2013 Global Hackathon)Jérôme Petazzoni
 
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013dotCloud
 
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Docker, Inc.
 
lecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxlecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxssuser413a98
 
Deploying Containers and Managing Them
Deploying Containers and Managing ThemDeploying Containers and Managing Them
Deploying Containers and Managing ThemDocker, Inc.
 
[scala.by] Launching new application fast
[scala.by] Launching new application fast[scala.by] Launching new application fast
[scala.by] Launching new application fastDenis Karpenko
 

Similar a Time to rethink /proc (20)

Speeding up ps and top
Speeding up ps and topSpeeding up ps and top
Speeding up ps and top
 
CS 626 - March : Capsicum: Practical Capabilities for UNIX
CS 626 - March : Capsicum: Practical Capabilities for UNIXCS 626 - March : Capsicum: Practical Capabilities for UNIX
CS 626 - March : Capsicum: Practical Capabilities for UNIX
 
0507 057 01 98 * Adana Klima Servisleri
0507 057 01 98 * Adana Klima Servisleri0507 057 01 98 * Adana Klima Servisleri
0507 057 01 98 * Adana Klima Servisleri
 
Shall we play a game
Shall we play a gameShall we play a game
Shall we play a game
 
Shall we play a game?
Shall we play a game?Shall we play a game?
Shall we play a game?
 
Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...
 
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
 
DPDK In Depth
DPDK In DepthDPDK In Depth
DPDK In Depth
 
We shall play a game....
We shall play a game....We shall play a game....
We shall play a game....
 
The Accidental DBA
The Accidental DBAThe Accidental DBA
The Accidental DBA
 
Getting Started with Performance Co-Pilot
Getting Started with Performance Co-PilotGetting Started with Performance Co-Pilot
Getting Started with Performance Co-Pilot
 
Gdb basics for my sql db as (percona live europe 2019)
Gdb basics for my sql db as (percona live europe 2019)Gdb basics for my sql db as (percona live europe 2019)
Gdb basics for my sql db as (percona live europe 2019)
 
Community Update at OpenStack Summit Boston
Community Update at OpenStack Summit BostonCommunity Update at OpenStack Summit Boston
Community Update at OpenStack Summit Boston
 
Introduction to Docker (as presented at December 2013 Global Hackathon)
Introduction to Docker (as presented at December 2013 Global Hackathon)Introduction to Docker (as presented at December 2013 Global Hackathon)
Introduction to Docker (as presented at December 2013 Global Hackathon)
 
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
 
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
 
An Introduction To Linux
An Introduction To LinuxAn Introduction To Linux
An Introduction To Linux
 
lecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxlecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptx
 
Deploying Containers and Managing Them
Deploying Containers and Managing ThemDeploying Containers and Managing Them
Deploying Containers and Managing Them
 
[scala.by] Launching new application fast
[scala.by] Launching new application fast[scala.by] Launching new application fast
[scala.by] Launching new application fast
 

Último

Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsBert Jan Schrijver
 
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburgmasabamasaba
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...Nitya salvi
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfayushiqss
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 

Último (20)

Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
Time to rethink /proc

  • 1. Time to rethink /proc
      Kir Kolyshkin / Andrey Vagin
      @kolyshkin / @vagin_andrey
      Texas Linux Fest, 9 July 2016, Austin, TX
  • 2. Agenda
      ● Intro
      ● History of /proc
      ● Limitations of current interface
      ● Proposed solutions
      ● Performance results
  • 3. $ whoami
      ● Linux user since 1995
      ● Developing containers since 2002
        – author of vzctl and vzpkg
      ● Leading OpenVZ: 2005 to 2015
      ● Twitter: @kolyshkin
  • 4.
      ● Founded in 1997
      ● Spun off from Parallels
      ● HQ in Seattle, WA
      ● R&D in Moscow, RU
      2016
  • 5. Products:
      ● Containers and hypervisors
      ● Distributed cluster storage
  • 6. OpenVZ
  • 7. CRIU: Checkpoint / Restore In Userspace
  • 8. Ideas behind CRIU
      ● We can't merge kernel c/r upstream, so...
        let’s redo the whole thing in userspace
      ● Use existing interfaces where available
        – /proc, ptrace, netlink, parasite code injection
      ● Amend the kernel where necessary
        – only ~180 kernel patches
        – kernel v3.11+ is sufficient (if CONFIG_CHECKPOINT_RESTORE is set)
  • 9. History of /proc part I
      ● Initial solution: /dev/kmem
        – May 1975, UNIX 6th edition (V6)
        – http://man.cat-v.org/unix-6th/4/mem
      ● First “old style” /proc
        – 1984, UNIX 8th edition (V8), by Tom Killian
        – A process is a file! Images of running processes
        – An alternative to ptrace(2)
        – http://man.cat-v.org/unix_8th/4/proc
  • 10. History of /proc part II
      ● Most well-known old-style /proc
        – 1988...1991: UNIX SVR4 (port from V8 with enhancements by Roger Faulkner and Ron Gomes)
        – read(), write(), and 37 ioctl()s
      ● First modern style /proc
        – mid-1990s, Plan 9
        – Each process is a directory with multiple informational and control files
        – One can use ls and cat to work with it
  • 11. Plan 9 /proc interface
  • 12. Modern Linux interface: /proc/PID/*
      $ ls /proc/self/
      attr             cwd      loginuid    numa_maps      schedstat  task
      autogroup        environ  map_files   oom_adj        sessionid  timers
      auxv             exe      maps        oom_score      setgroups  uid_map
      cgroup           fd       mem         oom_score_adj  smaps      wchan
      clear_refs       fdinfo   mountinfo   pagemap        stack
      cmdline          gid_map  mounts      personality    stat
      comm             io       mountstats  projid_map     statm
      coredump_filter  latency  net         root           status
      cpuset           limits   ns          sched          syscall
  • 13. Limitations of /proc/PID interface
      ● Requires at least three syscalls per process per file
        – open(), read(), close()
      ● Variety of formats, mostly text-based
      ● Not enough information (/proc/PID/fd/*)
      ● Some formats are non-extendable
        – /proc/PID/maps, where the last column is optional
      ● Sometimes slow due to extra attributes
        – /proc/PID/smaps vs /proc/PID/maps
  • 14. /proc/PID/smaps
      7f1cb0afc000-7f1cb0afd000 rw-p 00021000 08:03 656516 /usr/lib64/ld-2.21.so
      Size:           4 kB
      Rss:            4 kB
      Pss:            4 kB
      Shared_Clean:   0 kB
      Shared_Dirty:   0 kB
      Private_Clean:  0 kB
      Private_Dirty:  4 kB
      Referenced:     4 kB
      Anonymous:      4 kB
      AnonHugePages:  0 kB
      Swap:           0 kB
      KernelPageSize: 4 kB
      MMUPageSize:    4 kB
      Locked:         0 kB
      VmFlags:        rd wr mr mw me dw ac sd

      $ time cat /proc/*/maps > /dev/null
      real    0m0.061s
      user    0m0.002s
      sys     0m0.059s
      $ time cat /proc/*/smaps > /dev/null
      real    0m0.253s
      user    0m0.004s
      sys     0m0.247s
  • 15. Similar problem: info about sockets
      ● /proc
        – /proc/net/netlink
        – /proc/net/unix
        – /proc/net/tcp
        – /proc/net/packet
      ● Problems: not enough info, complex format, all-or-nothing
      ● Solution (2012): use netlink, generalize tcp_diag as sock_diag
        – an extendable binary format
        – allows specifying a group of attributes and a set of sockets
  • 16. Solution 1: task_diag based on netlink socket
      1. Netlink message format: binary and extendable
      2. Ways to specify a set of processes
      3. Optimal grouping of attributes
  • 17. Netlink message format
      Header:     nlmsg_len | nlmsg_type | nlmsg_flags | nlmsg_seq | nlmsg_pid
      Attribute:  nlattr_len | nlattr_type | payload
      Attribute:  nlattr_len | nlattr_type | payload
      ● Simple and elegant
      ● Binary and easily extendable
      ● Easy to add a new group
      ● Easy to add a new attribute
  • 18. Specify sets of processes
      ● TASK_DIAG_DUMP_ALL
        – Dump all processes
      ● TASK_DIAG_DUMP_ALL_THREAD
        – Dump all threads
      ● TASK_DIAG_DUMP_CHILDREN
        – Dump children of a specific task
      ● TASK_DIAG_DUMP_THREAD
        – Dump threads of a specific task
      ● TASK_DIAG_DUMP_ONE
        – Dump one task
  • 19. Groups of attributes
      ● TASK_DIAG_BASE
        – PID, PGID, SID, TID, comm
      ● TASK_DIAG_CRED
        – UID, GID, groups, capabilities
      ● TASK_DIAG_STAT
        – per-task and per-process statistics (same as taskstats, not available in /proc)
      ● TASK_DIAG_VMA
        – mapped memory regions and their access permissions (same as maps)
      ● TASK_DIAG_VMA_STAT
        – memory consumption for each mapping (same as smaps)
  • 20. This is what makes it really fast
      1. Netlink message format: binary and extendable
      2. Ways to specify a set of processes
      3. Optimal grouping of attributes
  • 21. Problems with netlink
      ● Designed for networking
      ● Not obvious where to get pid and user namespaces
      ● Impossible to restrict netlink sockets
        – Credentials are saved when a socket is created
        – Process can drop privileges, but netlink doesn't care
        – The same socket can be used to get process attributes and to set IP addresses
  • 22. Change netlink socket to a transactional file
      ● /proc/task_diag as a transactional file
        – write request → read response
      ● Otherwise same as netlink socket
      ● LKML discussion has not reached a conclusion yet
  • 23. Performance: ps
      Traditional ps (using /proc/PID/* files):
      $ time ./ps/pscommand ax | wc -l
      50089
      real    0m1.596s
      user    0m0.475s
      sys     0m1.126s
      New ps (using task_diag):
      $ time ./ps/pscommand ax | wc -l
      50089
      real    0m0.148s
      user    0m0.069s
      sys     0m0.086s
  • 24. Performance: using perf tool
      > Using the fork test command:
      >    10,000 processes; 10k proc with 5 threads = 50,000 tasks
      >    reading /proc: 11.3 sec
      >    task_diag: 2.2 sec
      >
      > @7,440 tasks, reading /proc is at 0.77 sec and task_diag at 0.096
      >
      > 128 instances of specjbb, 80,000+ tasks:
      >    reading /proc: 32.1 sec
      >    task_diag: 3.9 sec
      >
      > So overall much snappier startup times.
      // David Ahern
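The timings on slides 23 and 24 work out to the following speedups. This is plain division of the quoted numbers; the unitless 0.096 in the quote is read as seconds, like the 0.77 beside it.

```latex
\begin{aligned}
\text{ps ax, real (slide 23):}\quad & \frac{1.596\ \mathrm{s}}{0.148\ \mathrm{s}} \approx 10.8\times \\
\text{ps ax, sys (slide 23):}\quad & \frac{1.126\ \mathrm{s}}{0.086\ \mathrm{s}} \approx 13.1\times \\
\text{50{,}000 tasks (slide 24):}\quad & \frac{11.3\ \mathrm{s}}{2.2\ \mathrm{s}} \approx 5.1\times \\
\text{7{,}440 tasks:}\quad & \frac{0.77\ \mathrm{s}}{0.096\ \mathrm{s}} \approx 8.0\times \\
\text{80{,}000+ tasks:}\quad & \frac{32.1\ \mathrm{s}}{3.9\ \mathrm{s}} \approx 8.2\times
\end{aligned}
```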

Speaker notes

  1. Slackware on floppies. Kernel 1.0.9, recompiled 1.1.50 from source. And it’s my second time here at TXLF, a long way from Seattle.
  2. Virtuozzo, the product, is essentially a supercharged version of OpenVZ, with containers and VMs working side by side and uniformly managed by the same set of tools. The storage idea is to take the individual servers’ hard drives to
  3. OpenVZ, my baby. First steps, first words, first kernel panics. Do we have any users in the audience? Full (system) containers for Linux. Developed since 1999, open source since 2005. Live migration since 2007. ~2000 Linux kernel patches enabling LXC, Docker, CoreOS… The biggest contributor to containers. Now reborn as Virtuozzo 7.
  4. 4 years old! v2.3 (June 2016). Aims to replace OpenVZ kernel c/r. Saves and restores sets of running processes. Integrated into LXC, Docker*. Not just for live migration: save an HPC job or game, update the kernel or hardware, balance load, speed up boot, reverse debug, inject faults.
  5. We failed to merge in-kernel c/r because that kernel code is very invasive: it touches every kernel subsystem, and no kernel maintainer wanted it in their code.
  6. As I’m getting older, I find myself more and more interested in history.
  7. More than 40 files and 10 directories for each process. Our tests showed that reading that many files takes a lot of time. Oh, and here is a picture of a classic locomotive, a rail transport vehicle. Why this picture? Because it’s slooow. Are there any engineers here? I mean, real ones, not software engineers. What would be the max speed of this beast?
  8. Variety of formats: no one wants to spend their life writing parsers for all these formats. Text-based: consider ps showing process time. The kernel keeps it in binary and exposes it in /proc as a string; ps reads it and converts it back to binary, say for sorting, and finally converts it to a string again when printing. An example of a non-extendable format is /proc/*/maps: the last field is the file name, and it is ... optional!
  9. There are three defining properties of this solution. Let’s see them in more detail.
  10. The structure is pretty generic, this is what makes this format extendable.
  11. One important thing here is optimal grouping: if an attribute greatly affects response speed, it should be put into a separate group.
  12. These three properties are what make the API really FAST. For those of you living in the US, here’s a picture of a European high-speed train, 186 miles per hour.
  13. Another bad example of using netlink: taskstats
  14. Final remark: open source is really awesome! Why? There are many people from many different places working on many different problems. The work that I just described is one example of such work.