Linux Kernel
Page Reclaim
吉田雅徳 (@siburu)
2014/7/27 (Sun)
1. Recap of the Previous Session
What’s Page Frame
❖ page frame = a page-sized, page-aligned piece of RAM
❖ struct page = a kernel structure that corresponds one-to-one to a page frame
❖ mem_map
  ❖ A single array of struct page entries covering all RAM that the kernel manages.
❖ But in a CONFIG_SPARSEMEM environment:
  ❖ There is no single mem_map.
  ❖ Instead, there is a list of 2MB-sized arrays of struct page entries.
  ❖ You must use __pfn_to_page(), __page_to_pfn(), or wrappers around them.
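The SPARSEMEM lookup described above can be sketched roughly as follows. This is a simplified illustration, not the kernel's implementation: every `_sketch` name, the 4KB page size, and the 128MB section size are assumptions chosen for the example (the sketch's `struct page_sketch` is also far smaller than the real `struct page`, so its arrays are not 2MB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names and constants here are illustrative assumptions, not kernel definitions. */
#define PAGE_SHIFT_SKETCH        12   /* assumed 4KB pages */
#define SECTION_SIZE_BITS_SKETCH 27   /* assumed 128MB of RAM per section */
#define PFN_SECTION_SHIFT_SKETCH (SECTION_SIZE_BITS_SKETCH - PAGE_SHIFT_SKETCH)
#define PAGES_PER_SECTION_SKETCH (1UL << PFN_SECTION_SHIFT_SKETCH)
#define NR_SECTIONS_SKETCH       8UL

struct page_sketch { unsigned long flags; };

/* One entry per section; map is NULL when the section has no RAM behind it. */
struct mem_section_sketch { struct page_sketch *map; };

static struct mem_section_sketch sections[NR_SECTIONS_SKETCH];

/* A pfn is usable only if its section exists and has a struct page array. */
static bool pfn_valid_sketch(unsigned long pfn)
{
    unsigned long nr = pfn >> PFN_SECTION_SHIFT_SKETCH;
    return nr < NR_SECTIONS_SKETCH && sections[nr].map != NULL;
}

/* Two-step lookup: upper pfn bits pick the section, lower bits index its array. */
static struct page_sketch *pfn_to_page_sketch(unsigned long pfn)
{
    assert(pfn_valid_sketch(pfn));
    struct mem_section_sketch *sec = &sections[pfn >> PFN_SECTION_SHIFT_SKETCH];
    return &sec->map[pfn & (PAGES_PER_SECTION_SKETCH - 1)];
}
```

Because there is no single array, plain pointer arithmetic on one global mem_map cannot work here, which is why the conversion has to go through helper functions.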
What’s NUMA
❖ NUMA (Non-Uniform Memory Access)
  ❖ The system is composed of nodes.
  ❖ Each node is defined by a set of CPUs and one physical memory range.
  ❖ Memory access latency depends on the source and destination nodes.
❖ NUMA configuration
  ❖ ACPI provides the NUMA configuration:
    ❖ SRAT (Static Resource Affinity Table)
      ❖ Tells which CPUs and memory ranges belong to which NUMA node.
    ❖ SLIT (System Locality Information Table)
      ❖ Tells how far one NUMA node is from another.
What’s Memory Zone
❖ Physical memory is divided into zones by address range:
  ❖ ZONE_DMA: <16MB
  ❖ ZONE_DMA32: <4GB
  ❖ ZONE_NORMAL: the rest
  ❖ ZONE_MOVABLE: none by default.
    ❖ This is used to define a hot-removable physical memory range.
Memory node, zone
[Diagram: the physical address space is split into Range1 and Range2; CPU1 to CPU4 and the two ranges are grouped into NUMA node1 and NUMA node2, and each node has its own struct pglist_data.]
struct pglist_data {
    struct zone node_zones[MAX_NR_ZONES];
    …
};
❖ Every pglist_data has a struct zone for each ZONE (DMA through MOVABLE), although some of them may be empty.
Memory Allocation
1. First, the allocator checks the thresholds for each zone (threshold = watermark and dirty ratio).
  ❖ If every zone fails the check, the kernel enters the page reclaim path (= today’s topic).
2. If some zone passes, a page is allocated from that zone’s buddy system.
  ❖ An order-0 page is allocated from the per-cpu cache.
  ❖ A higher-order page is taken from the per-order lists of pages.
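The two steps above can be sketched as a small decision function. This is a minimal sketch under stated assumptions: the struct, the enum, and both function names are hypothetical, the check is reduced to a single free-page comparison, and the dirty-ratio part of the threshold is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of an allocation attempt in this sketch. */
enum alloc_source { FROM_PCP_CACHE, FROM_BUDDY_LISTS, NEEDS_RECLAIM };

/* Simplified zone: only the fields this sketch needs. */
struct zone_sketch {
    unsigned long free_pages;
    unsigned long watermark;
};

/* Step 1: return the index of the first zone that would stay at or above
 * its watermark after handing out 2^order pages, or -1 if every zone fails. */
static int pick_zone_sketch(const struct zone_sketch *zones, size_t n,
                            unsigned int order)
{
    assert(order <= 10);
    for (size_t i = 0; i < n; i++)
        if (zones[i].free_pages >= zones[i].watermark + (1UL << order))
            return (int)i;
    return -1;
}

/* Step 2: order-0 requests go to the per-cpu cache, higher orders to the
 * per-order free lists; if no zone passed, the caller must reclaim. */
static enum alloc_source alloc_source_sketch(const struct zone_sketch *zones,
                                             size_t n, unsigned int order)
{
    if (pick_zone_sketch(zones, n, order) < 0)
        return NEEDS_RECLAIM;
    return order == 0 ? FROM_PCP_CACHE : FROM_BUDDY_LISTS;
}
```

With two zones holding 100 and 200 free pages and a watermark of 128 each, an order-0 request skips the first zone and is served from the second, while an order-7 request (128 pages) fails both checks and falls through to reclaim.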
Memory Deallocation
❖ A freed page is returned to the buddy system.
  ❖ An order-0 page is returned to the per-cpu cache via free_hot_cold_page().
    ❖ Cold page: a page estimated not to be in the CPU cache
      ❖ It is linked to the tail of the per-cpu cache's LRU list.
    ❖ Hot page: a page estimated to be in the CPU cache
      ❖ It is linked to the head of the per-cpu cache's LRU list.
  ❖ A higher-order page is returned directly to the per-order lists of pages.
Buddy System
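[Diagram: a per-cpu cache of 4k pages with a HOT end and a COLD end, used for order-0 (de)allocation, sits in front of the per-zone buddy system, which keeps one free list per order: order0 (4k blocks), order1 (8k blocks), ..., order10 (4m blocks).]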
2. Page Reclaim
2.1 Direct reclaim
2.2 Daemon reclaim
Review of the Page Allocation Flow
❖ __alloc_pages_nodemask (the basic page allocation function)
  ❖ get_page_from_freelist (1st: local zones, low wmark) → get_page_from_freelist (2nd: all zones)
  ❖ __alloc_pages_slowpath
    1. wake_all_kswapds (wake up the kswapd threads)
    2. get_page_from_freelist (3rd: all zones, min wmark)
    3. if {__GFP,PF}_MEMALLOC → __alloc_pages_high_priority
    4. __alloc_pages_direct_compact (asynchronous)
    5. __alloc_pages_direct_reclaim (reclaim pages directly in the current context)
    6. if not did_some_progress → __alloc_pages_may_oom
    7. Retry (go back to step 2) or __alloc_pages_direct_compact (synchronous)
2.1 Direct Reclaim
(reclaim performed by the task that requested the allocation)
__alloc_pages_direct_reclaim()
❖ __perform_reclaim
  ❖ current->flags |= PF_MEMALLOC
    ❖ So that the emergency reserves can be used if a page allocation becomes necessary in the course of page reclaim
  ❖ try_to_free_pages
    ❖ throttle_direct_reclaim
      ❖ if !pfmemalloc_watermark_ok → wait until kswapd makes it ok
    ❖ do_try_to_free_pages
  ❖ current->flags &= ~PF_MEMALLOC
❖ get_page_from_freelist
❖ drain_all_pages
❖ get_page_from_freelist
pfmemalloc_watermark_ok()
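❖ ARGS
  ❖ pgdat (type: struct pglist_data)
❖ RETURN
  ❖ type: bool
  ❖ node’s free_pages > 0.5 * node’s min_wmark
❖ DESC
  ❖ Compares the number of free pages with half of the min watermark, per node rather than per zone; the result is OK if the free pages exceed it
  ❖ If they fall below it, returns false and also wakes up that node's kswapd
  ❖ On a node under memory pressure, direct reclaim is given up and left to kswapd; this function decides that threshold.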
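The check described above can be sketched as follows. This is an illustration under assumptions, not the kernel source: the `_sketch` types are hypothetical, the sum runs over every zone handed in, and the kswapd wakeup is modeled as a plain flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-zone counters. */
struct zone_sketch {
    unsigned long free_pages;
    unsigned long min_wmark;
};

/* Hypothetical node: its zones plus a flag standing in for "kswapd was woken". */
struct node_sketch {
    const struct zone_sketch *zones;
    size_t nr_zones;
    bool kswapd_woken;
};

/* Node-wide check: sum free pages and min watermarks over the zones,
 * then require free pages to exceed half of the summed min watermark. */
static bool pfmemalloc_watermark_ok_sketch(struct node_sketch *node)
{
    unsigned long free_pages = 0, reserve = 0;

    assert(node != NULL);
    for (size_t i = 0; i < node->nr_zones; i++) {
        free_pages += node->zones[i].free_pages;
        reserve += node->zones[i].min_wmark;
    }

    bool ok = free_pages > reserve / 2;
    if (!ok)
        node->kswapd_woken = true; /* stand-in for waking the node's kswapd */
    return ok;
}
```

Summing over the node means one nearly empty zone does not throttle direct reclaimers as long as the node as a whole still holds more than half of its reserve.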
do_try_to_free_pages()
❖ The core function of page reclaim, called from three different paths:
  ❖ try_to_free_pages() → Global reclaim path via __alloc_pages_nodemask()
  ❖ try_to_free_mem_cgroup_pages() → Per-memcg reclaim path
    ❖ Right before per-memcg slab allocation
    ❖ Right before per-memcg file page allocation
    ❖ Right before per-memcg anon page allocation
    ❖ Right before per-memcg swapin allocation
  ❖ shrink_all_memory() → Hibernation path
❖ Arguments: (1) struct zonelist *zonelist (2) struct scan_control *sc
struct scan_control
struct scan_control {
    unsigned long nr_scanned;
    unsigned long nr_reclaimed;
    unsigned long nr_to_reclaim;
    …
    int swappiness; // 0..100
    …
    struct mem_cgroup *target_mem_cgroup;
    …
    nodemask_t *nodemask;
};
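What do_try_to_free_pages Does
❖ A loop over the following two steps:
  ❖ shrink_zones()
    ❖ Described later
  ❖ wakeup_flusher_threads()
    ❖ Called each time shrink_zones has scanned at least 1.5 times the reclaim target (scan_control::nr_to_reclaim).
    ❖ Asks all block devices (bdi) to write back, at most, as many pages as were scanned.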
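The writeback trigger can be illustrated with a small sketch. The function names are hypothetical, and the comparison is modeled as "at least 1.5 times" following the slide's wording; the exact comparison and the arguments passed to wakeup_flusher_threads() in a given kernel version should be checked against the source.

```c
#include <assert.h>

/* 1.5 times the reclaim target, computed with integer arithmetic. */
static unsigned long writeback_threshold_sketch(unsigned long nr_to_reclaim)
{
    return nr_to_reclaim + nr_to_reclaim / 2;
}

/* Returns how many pages the flusher threads would be asked to write back:
 * 0 while the scan count is below the threshold, otherwise the number of
 * pages scanned so far (the upper bound mentioned above). */
static unsigned long pages_to_write_back_sketch(unsigned long total_scanned,
                                                unsigned long nr_to_reclaim)
{
    assert(nr_to_reclaim > 0);
    if (total_scanned < writeback_threshold_sketch(nr_to_reclaim))
        return 0;
    return total_scanned;
}
```

For a reclaim target of 32 pages the threshold is 48, so nothing is requested after 47 scanned pages, and a writeback of up to 48 pages is requested once the 48th page has been scanned.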
shrink_zones()
1. for_each_zone_zonelist_nodemask:
  1. mem_cgroup_soft_limit_reclaim
    ❖ while mem_cgroup_largest_soft_limit_node:
      ❖ mem_cgroup_soft_reclaim
    ❖ Before moving on to shrink_zone, this reclaims pages from the memcgs that use the zone and exceed their limit
  2. shrink_zone
    ❖ foreach mem_cgroup_iter:
      ❖ shrink_lruvec
    ❖ In the global reclaim case, this iteration reclaims starting from the root memcg
2. shrink_slab
  ❖ Slab will be covered in a later session
shrink_lruvec()
❖ per-zone page freer
  1. get_scan_count
    ❖ Decides the target number of pages to reclaim
  2. while the target is not met:
    ❖ shrink_list(LRU_INACTIVE_ANON)
    ❖ shrink_list(LRU_ACTIVE_ANON)
    ❖ shrink_list(LRU_INACTIVE_FILE)
    ❖ shrink_list(LRU_ACTIVE_FILE)
  3. if inactive anonymous memory alone is insufficient:
    ❖ shrink_active_list
shrink_list()
❖ Calls shrink_{active or inactive}_list; however, an active list is shrunk only when it is larger than its paired inactive list
  1. if an active list is specified:
    ❖ if size of lru(ACTIVE) > size of lru(INACTIVE):
      ❖ shrink_active_list
  2. else:
    ❖ shrink_inactive_list
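The dispatch rule above can be sketched as a pure decision function. The enum and function names are hypothetical, and the size test is the plain "active larger than inactive" comparison from the slide; the kernel's actual criterion for when an inactive list counts as too small may be more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical LRU identifiers for this sketch. */
enum lru_sketch {
    LRU_INACTIVE_ANON_SKETCH,
    LRU_ACTIVE_ANON_SKETCH,
    LRU_INACTIVE_FILE_SKETCH,
    LRU_ACTIVE_FILE_SKETCH,
    NR_LRU_SKETCH
};

/* What the dispatcher decides to do for the given list. */
enum shrink_action { SHRINK_NOTHING, SHRINK_ACTIVE, SHRINK_INACTIVE };

static bool is_active_lru_sketch(enum lru_sketch lru)
{
    return lru == LRU_ACTIVE_ANON_SKETCH || lru == LRU_ACTIVE_FILE_SKETCH;
}

/* size[] holds the current length of each LRU list, indexed by enum lru_sketch. */
static enum shrink_action shrink_list_sketch(enum lru_sketch lru,
                                             const unsigned long size[NR_LRU_SKETCH])
{
    assert(lru < NR_LRU_SKETCH);
    if (is_active_lru_sketch(lru)) {
        enum lru_sketch inactive = (lru == LRU_ACTIVE_ANON_SKETCH)
                                       ? LRU_INACTIVE_ANON_SKETCH
                                       : LRU_INACTIVE_FILE_SKETCH;
        /* An active list is only shrunk when it outgrows its inactive pair. */
        return size[lru] > size[inactive] ? SHRINK_ACTIVE : SHRINK_NOTHING;
    }
    return SHRINK_INACTIVE;
}
```

The effect is that pages are demoted from an active list only when the matching inactive list has become the smaller of the two, which keeps a supply of reclaim candidates without draining the active list needlessly.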
shrink_{active,inactive}_list
❖ shrink_active_list()
  1. Traverse the pages in an active list
  2. Find the inactive pages in the list and move them to an inactive list
❖ shrink_inactive_list()
  ❖ foreach page:
    1. if page_mapped(page) => try_to_unmap(page)
    2. if PageDirty(page) => pageout(page)
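What Counts as an Inactive Page
❖ When laptop_mode is off
  ❖ Simply takes the specified number of pages from the tail of the active LRU list as inactive pages
❖ When laptop_mode is on
  ❖ Takes the specified number of clean pages from the tail of the active LRU list as inactive pages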
try_to_unmap()
❖ Unmaps the specified page from every mapping that refers to it
  1. Set up struct rmap_walk_control.
  2. rmap_walk_{file, anon, or ksm}
    ❖ An rmap walk iterates over the VMAs that map the page and unmaps the page from each of them
      A. file: traverse the address_space::i_mmap tree
      B. anon: traverse the anon_vma tree
      C. ksm: traverse all merged anon_vma trees
        ❖ each operation is similar to that for anon
A. rmap_walk_file
[Diagram: page → address_space (inode) → i_mmap (type: rb_root) → four vmas → the page table (pgtbl) of each vma, where the page is unmapped.]
B. rmap_walk_anon
[Diagram: page → anon_vma → rb_root (type: rb_root) → four vmas → the page table (pgtbl) of each vma, where the page is unmapped.]
C. rmap_walk_ksm
[Diagram: page → stable_node → hlist → several anon_vmas → the vmas under them → the page table (pgtbl) of each vma.]
2.2 Daemon Reclaim
(reclaim performed by kswapd on behalf of the allocating tasks)
kswapd
❖ Processing overview
  1. Wake up
  2. balance_pgdat()
  3. Sleep
❖ balance_pgdat()
  ❖ Works until all zones of the pgdat are at or above the high watermark.
  ❖ reclaim function: kswapd_shrink_zone()
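The balancing loop can be sketched as a toy simulation. Everything here is an assumption made for illustration: the `_sketch` types, the `reclaimable` counter, and the fixed batch size stand in for the real scanning logic, and the early exit on zero progress is a guard added only so the sketch always terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical zone model: reclaimable is how many pages reclaim could still free. */
struct zone_sketch {
    unsigned long free_pages;
    unsigned long high_wmark;
    unsigned long reclaimable;
};

/* The node is balanced once every zone is at or above its high watermark. */
static bool pgdat_balanced_sketch(const struct zone_sketch *zones, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (zones[i].free_pages < zones[i].high_wmark)
            return false;
    return true;
}

/* Stand-in for kswapd_shrink_zone(): frees up to batch pages from one zone. */
static unsigned long shrink_zone_sketch(struct zone_sketch *z, unsigned long batch)
{
    unsigned long freed = z->reclaimable < batch ? z->reclaimable : batch;
    z->reclaimable -= freed;
    z->free_pages += freed;
    return freed;
}

/* Keep reclaiming from unbalanced zones until the node is balanced.
 * Returns false if a full pass makes no progress. */
static bool balance_pgdat_sketch(struct zone_sketch *zones, size_t n,
                                 unsigned long batch)
{
    assert(batch > 0);
    while (!pgdat_balanced_sketch(zones, n)) {
        unsigned long progress = 0;
        for (size_t i = 0; i < n; i++)
            if (zones[i].free_pages < zones[i].high_wmark)
                progress += shrink_zone_sketch(&zones[i], batch);
        if (progress == 0)
            return false;
    }
    return true;
}
```

In this model kswapd would call the balancing function after each wakeup and go back to sleep once it returns, leaving zones that are already above their high watermark untouched.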
