SlideShare una empresa de Scribd logo
1 de 27
Descargar para leer sin conexión
GCMA: Guaranteed
Contiguous Memory Allocator
SeongJae Park1
, Minchan Kim2
, Heon Y. Yeom1
Seoul National University1
LG Electronics2
This work by SeongJae Park is licensed under the Creative
Commons Attribution-ShareAlike 3.0 Unported License. To
view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/3.0/.
These slides were presented during
EWiLi 2015 at Amsterdam
(http://syst.univ-brest.fr/ewili2015/)
TL; DR
● Contiguous Memory Allocation is a big problem of embedded system
● Current solutions have weakness
○ Too expensive for embedded system, or
○ Waste memory space, or
○ Too slow and even fail to allocate
● GCMA, our new solution, guarantees
○ Fast latency of contiguous memory allocation
○ Success of contiguous memory allocation
○ and Reasonable memory utilization
● That’s it! ;)
Who Wants Physically Contiguous Memory?
● Virtual Memory avoids the fragmentation problem
○ Applications use virtual address; virtual address can be mapped to any physical address
○ MMU do mapping of virtual address to physical address
● In Linux, it uses demand paging and reclaim
Physical
Memory
Application MMUCPU
Logical address Physical address
We Need Physically Contiguous Memory!
● Devices using DMA
○ Lots of devices need large, contiguous memory for internal buffer
(e.g., 12 Mega-Pixel Camera needs large, contiguous buffer)
○ Because MMU is behind the CPU, devices using DMA cannot use virtual address space
○ Slow latency or failure of allocation can harm user experience or life
(What if your camera does not react while a kitty go away)
Physical
Memory
Application MMUCPU
Logical address Physical address
Device
DMA
Physical address
We Need Physically Contiguous Memory!
● Devices using DMA
○ Lots of devices need large, contiguous memory for internal buffer
(e.g., 12 Mega-Pixel Camera needs large, contiguous buffer)
○ Because MMU is behind the CPU, devices using DMA can not use virtual address space
○ Slow latency or failure of allocation can harm user experience or life
(What if your camera does not react while a kitty go away)
● Huge page, System internal buffer
○ Huge pages can enhance system performance by minimizing TLB miss
○ System needs contiguous memory for internal usage
e.g., struct page *alloc_pages(unsigned int flags, unsigned int order);
NOTE: This research focuses on device case for embedded system, though
H/W Solutions: IOMMU, Scatter / Gather DMA
● IOMMU serves as MMU for devices
● Scatter / Gather DMA let DMA to discontiguous memory area
● Consumes more power and money because it requires additional H/W
● Can be expensive for low-end embedded devices
● Useless for Huge page or system internal physically contiguous memory
Physical
Memory
Application MMUCPU
Logical address Physicals address
Device
DMA
Device address
IOMMU
Physical address
Reserved Area Technique
● Reserves sufficient amount of contiguous area at boot time and
let only contiguous memory allocation to use the area
● Pros: Simple and effective for contiguous memory allocation
● Cons: System memory space utilization could be low if the contiguous
memory allocation does not fully use the reserved area
○ Sacrifice 10% of your phone memory for a kitty that may or may not meet with?
● Widely adopted solution despite of the utilization problem
System Memory
(Narrowed by Reservation)
Reserved
Area
Process
Normal Allocation Contiguous Allocation
CMA: Contiguous Memory Allocator
● Software-based solution in mainline Linux
● Generously extended version of Reserved area technique
○ Mainly focus on memory utilization
○ Let movable pages to use the reserved area
○ If contig mem alloc requires pages being used for movable pages,
move those pages out of reserved area and give the vacant area to contig mem alloc
○ Solves memory utilization problem because general pages are movable
● In other words, CMA gives different priority to clients of the reserved area
○ Primary client: contiguous memory allocation
○ Secondary client: movable page allocation
CMA: in Real-land
● Worst case latency for a photo shot using Camera of Raspberry Pi 2 under
background workload is
● 1.6 Seconds when configured with Reserved area technique,
● 9.8 seconds when configured with CMA; Enough for a kitty to go away
The Phantom Menace Unveiled: Latency of CMA
● Latency of CMA during a photo shot using Camera of Raspberry Pi 2 under
background workload
● It’s clear that latency of Camera is bounded by CMA
CMA: Why So Slow?
● In short, secondary client of CMA is not so nice as expected
○ In this context, ‘nice’ means as unix command ‘nice’ means
● Moving page is expensive task (copying, rmap control, LRU churn, …)
● If someone is holding the page, it could not be moved until the one releases
the page (e.g., get_user_page())
● Result: Unexpected long latency and even failure
● That’s why CMA is not adopted to many devices
● Raspberry Pi does not support CMA officially because of the problem
GCMA: Guaranteed Contiguous Memory Allocator
● GCMA is a variant of CMA that guarantees
○ Fast latency and success of allocation
○ While keeping memory utilization as well
● Main idea is simple
○ Follow primary / secondary client idea of CMA to keep memory utilization
○ Select secondary client as only nice one unlike CMA
■ For fast latency, should be able to vacate the reserved area as soon as required without
any task such as moving content
■ For success, should be out of kernel control
■ For memory utilization, most pages should be able to be secondary client
Secondary Client of GCMA
● Last chance cache for
○ pages being swapped out and
○ clean pages being evicted from file cache
● Pages being swapped out to write-through swap device
○ Could be discarded immediately because the contents are written through to swap device
○ Kernel think it’s already swapped out
○ Most anonymous pages could be swapped out
● Clean pages being evicted from file cache
○ Could be discarded because the contents are in storage already
○ Kernel think it’s already evicted out
○ Lots of file-backed pages could be the case
● Additional pros 1: Already expected to not be accessed again soon
○ Discarding secondary client pages from GCAM will not affect system performance much
● Additional pros 2: Occurs with severe workload, only
○ In peaceful case, zero overhead at all
GCMA: Workflow
● Reserve memory area in boot time
● If a page is swapped out or evicted from file cache, keep the page in reserved
area
○ If system requires the page again, give it back from the reserved area
○ If contiguous memory allocation requires area being used by those pages, discard those
pages and use the area for contiguous memory allocation
System Memory GCMA Reserved
Area
Swap Device
File
Process
Normal Allocation Contiguous Allocation
Reclaim / Swap
Get back if hit
GCMA Implementation
● Used Cleancache / Frontswap interface for second class client of GCMA
rather than inventing the wheel again
● DMEM: Discardable Memory
○ Works as a last chance cache for secondary client pages using GCMA reserved area
○ Manages Index table using hash table and RB-tree
○ Evicts pages in LRU scheme
Reserved Area
Dmem Logic
GCMA Logic
Dmem Interface
CMA Interface
CMA Logic
Cleancache Frontswap
GCMA Implementation: in Short
● Implemented on Linux v3.18
● About 1,500 lines of code
● Ported, evaluated on CuBox and Raspberry Pi 2
● Available under GPL v3 at: https://github.com/sjp38/linux.
gcma/releases/tag/gcma/rfc/v2
GCMA Optimization for Write-through Swap
● Write-through swap device could consume I/O resource
● GCMA recommends compressed memory block device as a swap device
○ Zram is a compressed memory block device in upstream Linux
○ Google also recommends zram as a swap device for low-memory Android devices
Experimental Setup
● Device setup
○ Raspberry Pi 2
○ ARM cortex-A7 900 MHz * 4 cores
○ 1 GiB LPDDR2 SDRAM
○ Class 10 SanDisk 16 GiB microSD card
● Configurations
○ Baseline: Linux rpi-v3.18.11 + 100 MiB swap +
256 MiB reserved area
○ CMA: Linux rpi-v3.18.11 + 100 MiB swap +
256 MiB CMA area
○ GCMA: Linux rpi-v3.18.11 + 100 MiB Zram swap +
256 MiB GCMA area
● Workloads
○ Contig Mem Alloc: Contiguous memory allocation using CMA or GCMA
○ Blogbench: Realistic file system benchmark
○ Camera shot: Repeated shot of picture using Raspberry Pi 2 Camera module with 10 seconds
interval between each shot
Contig Mem Alloc without Background Workload
● Average of 30 allocations
● GCMA shows 14.89x to 129.41x faster latency compared to CMA
● CMA even failed once for 32,768 contiguous pages allocation even though
there is no background task
4MiB Contig Mem Alloc w/o Background Workload
● Latency of CMA is more predictable compared to GCMA
4MiB Contig Mem Allocation w/ Background Task
● Background workload: Blogbench
● CMA consumes 5 seconds (unacceptable!) for one allocation in bad case
Camera Latency w/ Background Task
● Camera is a realistic, important workload of Raspberry Pi 2
● Background task: Blogbench
● GCMA keeps latency as fast as Reserved area technique configuration
Blogbench on CMA / GCMA
● Scores are normalized to scores of Baseline configuration
● ‘/ cam’ means camera workload as a background workload
● System under GCMA even outperforms CMA with realistic workload owing to
its light overhead
● Allocation using CMA even degrades system performance because it should
move out pages in reserved area
Conclusion
● Contiguous memory allocation is still a big problem for embedded device
○ H/W solutions are too expensive
○ S/W solutions like CMA is not effective
○ Most system uses Reserved area technique despite of low memory utilization
● GCMA guarantees fast latency, success of allocation and reasonable memory
utilization
○ Allocation latency of GCMA is as fast as Reserved area technique
○ System performance under GCMA is is similar or outperforms the performance under CMA
Thank You
Any question?

Más contenido relacionado

La actualidad más candente

DB Latency Using DRAM + PMem in App Direct & Memory Modes
DB Latency Using DRAM + PMem in App Direct & Memory ModesDB Latency Using DRAM + PMem in App Direct & Memory Modes
DB Latency Using DRAM + PMem in App Direct & Memory ModesScyllaDB
 
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...ScyllaDB
 
µCLinux on Pluto 6 Project presentation
µCLinux on Pluto 6 Project presentationµCLinux on Pluto 6 Project presentation
µCLinux on Pluto 6 Project presentationedlangley
 
Performance: Observe and Tune
Performance: Observe and TunePerformance: Observe and Tune
Performance: Observe and TunePaul V. Novarese
 
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens Axboe
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens AxboeKernel Recipes 2017 - What's new in the world of storage for Linux - Jens Axboe
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens AxboeAnne Nicolas
 
Get Lower Latency and Higher Throughput for Java Applications
Get Lower Latency and Higher Throughput for Java ApplicationsGet Lower Latency and Higher Throughput for Java Applications
Get Lower Latency and Higher Throughput for Java ApplicationsScyllaDB
 
Where Did All These Cycles Go?
Where Did All These Cycles Go?Where Did All These Cycles Go?
Where Did All These Cycles Go?ScyllaDB
 
LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)Linaro
 
BKK16-208 EAS
BKK16-208 EASBKK16-208 EAS
BKK16-208 EASLinaro
 
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmKernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmAnne Nicolas
 
Practical SystemTAP basics: Perl memory profiling
Practical SystemTAP basics: Perl memory profilingPractical SystemTAP basics: Perl memory profiling
Practical SystemTAP basics: Perl memory profilingLubomir Rintel
 
Let’s Fix Logging Once and for All
Let’s Fix Logging Once and for AllLet’s Fix Logging Once and for All
Let’s Fix Logging Once and for AllScyllaDB
 
Linux rt in financial markets
Linux rt in financial marketsLinux rt in financial markets
Linux rt in financial marketsAdrien Mahieux
 
Using eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster HealthUsing eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster HealthScyllaDB
 
Luca Abeni - Real-Time Virtual Machines with Linux and kvm
Luca Abeni - Real-Time Virtual Machines with Linux and kvmLuca Abeni - Real-Time Virtual Machines with Linux and kvm
Luca Abeni - Real-Time Virtual Machines with Linux and kvmlinuxlab_conf
 
Linux kernel memory allocators
Linux kernel memory allocatorsLinux kernel memory allocators
Linux kernel memory allocatorsHao-Ran Liu
 
Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...ScyllaDB
 
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 InstanceExtreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 InstanceScyllaDB
 
Using QEMU for cross development
Using QEMU for cross developmentUsing QEMU for cross development
Using QEMU for cross developmentTetsuyuki Kobayashi
 
Process scheduling
Process schedulingProcess scheduling
Process schedulingHao-Ran Liu
 

La actualidad más candente (20)

DB Latency Using DRAM + PMem in App Direct & Memory Modes
DB Latency Using DRAM + PMem in App Direct & Memory ModesDB Latency Using DRAM + PMem in App Direct & Memory Modes
DB Latency Using DRAM + PMem in App Direct & Memory Modes
 
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...
 
µCLinux on Pluto 6 Project presentation
µCLinux on Pluto 6 Project presentationµCLinux on Pluto 6 Project presentation
µCLinux on Pluto 6 Project presentation
 
Performance: Observe and Tune
Performance: Observe and TunePerformance: Observe and Tune
Performance: Observe and Tune
 
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens Axboe
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens AxboeKernel Recipes 2017 - What's new in the world of storage for Linux - Jens Axboe
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens Axboe
 
Get Lower Latency and Higher Throughput for Java Applications
Get Lower Latency and Higher Throughput for Java ApplicationsGet Lower Latency and Higher Throughput for Java Applications
Get Lower Latency and Higher Throughput for Java Applications
 
Where Did All These Cycles Go?
Where Did All These Cycles Go?Where Did All These Cycles Go?
Where Did All These Cycles Go?
 
LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)
 
BKK16-208 EAS
BKK16-208 EASBKK16-208 EAS
BKK16-208 EAS
 
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmKernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
 
Practical SystemTAP basics: Perl memory profiling
Practical SystemTAP basics: Perl memory profilingPractical SystemTAP basics: Perl memory profiling
Practical SystemTAP basics: Perl memory profiling
 
Let’s Fix Logging Once and for All
Let’s Fix Logging Once and for AllLet’s Fix Logging Once and for All
Let’s Fix Logging Once and for All
 
Linux rt in financial markets
Linux rt in financial marketsLinux rt in financial markets
Linux rt in financial markets
 
Using eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster HealthUsing eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster Health
 
Luca Abeni - Real-Time Virtual Machines with Linux and kvm
Luca Abeni - Real-Time Virtual Machines with Linux and kvmLuca Abeni - Real-Time Virtual Machines with Linux and kvm
Luca Abeni - Real-Time Virtual Machines with Linux and kvm
 
Linux kernel memory allocators
Linux kernel memory allocatorsLinux kernel memory allocators
Linux kernel memory allocators
 
Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...
 
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 InstanceExtreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
 
Using QEMU for cross development
Using QEMU for cross developmentUsing QEMU for cross development
Using QEMU for cross development
 
Process scheduling
Process schedulingProcess scheduling
Process scheduling
 

Destacado

Git inter-snapshot public
Git  inter-snapshot publicGit  inter-snapshot public
Git inter-snapshot publicSeongJae Park
 
Porting golang development environment developed with golang
Porting golang development environment developed with golangPorting golang development environment developed with golang
Porting golang development environment developed with golangSeongJae Park
 
An introduction to_golang.avi
An introduction to_golang.aviAn introduction to_golang.avi
An introduction to_golang.aviSeongJae Park
 
Let the contribution begin (EST futures)
Let the contribution begin  (EST futures)Let the contribution begin  (EST futures)
Let the contribution begin (EST futures)SeongJae Park
 
Deep dark side of git - prologue
Deep dark side of git - prologueDeep dark side of git - prologue
Deep dark side of git - prologueSeongJae Park
 
(Live) build and run golang web server on android.avi
(Live) build and run golang web server on android.avi(Live) build and run golang web server on android.avi
(Live) build and run golang web server on android.aviSeongJae Park
 
Develop Android/iOS app using golang
Develop Android/iOS app using golangDevelop Android/iOS app using golang
Develop Android/iOS app using golangSeongJae Park
 
Sw install with_without_docker
Sw install with_without_dockerSw install with_without_docker
Sw install with_without_dockerSeongJae Park
 
Deep dark-side of git: How git works internally
Deep dark-side of git: How git works internallyDeep dark-side of git: How git works internally
Deep dark-side of git: How git works internallySeongJae Park
 
Develop Android app using Golang
Develop Android app using GolangDevelop Android app using Golang
Develop Android app using GolangSeongJae Park
 

Destacado (11)

Git inter-snapshot public
Git  inter-snapshot publicGit  inter-snapshot public
Git inter-snapshot public
 
Porting golang development environment developed with golang
Porting golang development environment developed with golangPorting golang development environment developed with golang
Porting golang development environment developed with golang
 
An introduction to_golang.avi
An introduction to_golang.aviAn introduction to_golang.avi
An introduction to_golang.avi
 
Let the contribution begin (EST futures)
Let the contribution begin  (EST futures)Let the contribution begin  (EST futures)
Let the contribution begin (EST futures)
 
Deep dark side of git - prologue
Deep dark side of git - prologueDeep dark side of git - prologue
Deep dark side of git - prologue
 
(Live) build and run golang web server on android.avi
(Live) build and run golang web server on android.avi(Live) build and run golang web server on android.avi
(Live) build and run golang web server on android.avi
 
Develop Android/iOS app using golang
Develop Android/iOS app using golangDevelop Android/iOS app using golang
Develop Android/iOS app using golang
 
Sw install with_without_docker
Sw install with_without_dockerSw install with_without_docker
Sw install with_without_docker
 
Deep dark-side of git: How git works internally
Deep dark-side of git: How git works internallyDeep dark-side of git: How git works internally
Deep dark-side of git: How git works internally
 
Develop Android app using Golang
Develop Android app using GolangDevelop Android app using Golang
Develop Android app using Golang
 
Memory management
Memory managementMemory management
Memory management
 

Similar a gcma: guaranteed contiguous memory allocator

Continguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelContinguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelKernel TLV
 
Virtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesVirtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesTakahiro Hirofuchi
 
Linux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud TrainingLinux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud TrainingChinaNetCloud
 
LCE13: Android Graphics Upstreaming
LCE13: Android Graphics UpstreamingLCE13: Android Graphics Upstreaming
LCE13: Android Graphics UpstreamingLinaro
 
IBM MQ Disaster Recovery
IBM MQ Disaster RecoveryIBM MQ Disaster Recovery
IBM MQ Disaster RecoveryMarkTaylorIBM
 
LCA13: Memory Hotplug on Android
LCA13: Memory Hotplug on AndroidLCA13: Memory Hotplug on Android
LCA13: Memory Hotplug on AndroidLinaro
 
Memory Management in Amoeba
Memory Management in AmoebaMemory Management in Amoeba
Memory Management in AmoebaRamesh Adhikari
 
The Dark Side Of Go -- Go runtime related problems in TiDB in production
The Dark Side Of Go -- Go runtime related problems in TiDB  in productionThe Dark Side Of Go -- Go runtime related problems in TiDB  in production
The Dark Side Of Go -- Go runtime related problems in TiDB in productionPingCAP
 
Sc19 ibm hms final
Sc19 ibm hms finalSc19 ibm hms final
Sc19 ibm hms finalYutaka Kawai
 
Webinar: Eliminate Backups and Simplify DR with Hybrid Cloud Storage
Webinar: Eliminate Backups and Simplify DR with Hybrid Cloud StorageWebinar: Eliminate Backups and Simplify DR with Hybrid Cloud Storage
Webinar: Eliminate Backups and Simplify DR with Hybrid Cloud StorageStorage Switzerland
 
Task migration using CRIU
Task migration using CRIUTask migration using CRIU
Task migration using CRIURohit Jnagal
 
Enhanced Live Migration for Intensive Memory Loads
Enhanced Live Migration for Intensive Memory LoadsEnhanced Live Migration for Intensive Memory Loads
Enhanced Live Migration for Intensive Memory LoadsSamsung Open Source Group
 
Literature survey presentation
Literature survey presentationLiterature survey presentation
Literature survey presentationKarthik Iyr
 
Recent advancements in cache technology
Recent advancements in cache technologyRecent advancements in cache technology
Recent advancements in cache technologyParas Nath Chaudhary
 
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea ArcangeliKernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea ArcangeliAnne Nicolas
 
Improving MeeGo boot-up time
Improving MeeGo boot-up timeImproving MeeGo boot-up time
Improving MeeGo boot-up timeHiroshi Doyu
 
Measuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data PlaneMeasuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data PlaneOpen-NFP
 

Similar a gcma: guaranteed contiguous memory allocator (20)

Continguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelContinguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux Kernel
 
Virtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesVirtualization for Emerging Memory Devices
Virtualization for Emerging Memory Devices
 
Linux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud TrainingLinux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud Training
 
LCE13: Android Graphics Upstreaming
LCE13: Android Graphics UpstreamingLCE13: Android Graphics Upstreaming
LCE13: Android Graphics Upstreaming
 
IBM MQ Disaster Recovery
IBM MQ Disaster RecoveryIBM MQ Disaster Recovery
IBM MQ Disaster Recovery
 
LCA13: Memory Hotplug on Android
LCA13: Memory Hotplug on AndroidLCA13: Memory Hotplug on Android
LCA13: Memory Hotplug on Android
 
Memory Management in Amoeba
Memory Management in AmoebaMemory Management in Amoeba
Memory Management in Amoeba
 
The Dark Side Of Go -- Go runtime related problems in TiDB in production
The Dark Side Of Go -- Go runtime related problems in TiDB  in productionThe Dark Side Of Go -- Go runtime related problems in TiDB  in production
The Dark Side Of Go -- Go runtime related problems in TiDB in production
 
RAMinate Invited Talk at NII
RAMinate Invited Talk at NIIRAMinate Invited Talk at NII
RAMinate Invited Talk at NII
 
Sc19 ibm hms final
Sc19 ibm hms finalSc19 ibm hms final
Sc19 ibm hms final
 
Webinar: Eliminate Backups and Simplify DR with Hybrid Cloud Storage
Webinar: Eliminate Backups and Simplify DR with Hybrid Cloud StorageWebinar: Eliminate Backups and Simplify DR with Hybrid Cloud Storage
Webinar: Eliminate Backups and Simplify DR with Hybrid Cloud Storage
 
Task migration using CRIU
Task migration using CRIUTask migration using CRIU
Task migration using CRIU
 
Enhanced Live Migration for Intensive Memory Loads
Enhanced Live Migration for Intensive Memory LoadsEnhanced Live Migration for Intensive Memory Loads
Enhanced Live Migration for Intensive Memory Loads
 
Literature survey presentation
Literature survey presentationLiterature survey presentation
Literature survey presentation
 
Recent advancements in cache technology
Recent advancements in cache technologyRecent advancements in cache technology
Recent advancements in cache technology
 
ch9_virMem.pdf
ch9_virMem.pdfch9_virMem.pdf
ch9_virMem.pdf
 
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea ArcangeliKernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
 
Improving MeeGo boot-up time
Improving MeeGo boot-up timeImproving MeeGo boot-up time
Improving MeeGo boot-up time
 
Measuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data PlaneMeasuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data Plane
 
RAMinate ACM SoCC 2016 Talk
RAMinate ACM SoCC 2016 TalkRAMinate ACM SoCC 2016 Talk
RAMinate ACM SoCC 2016 Talk
 

Más de SeongJae Park

Biscuit: an operating system written in go
Biscuit:  an operating system written in goBiscuit:  an operating system written in go
Biscuit: an operating system written in goSeongJae Park
 
DO YOU WANT TO USE A VCS
DO YOU WANT TO USE A VCSDO YOU WANT TO USE A VCS
DO YOU WANT TO USE A VCSSeongJae Park
 
Experimental android hacking using reflection
Experimental android hacking using reflectionExperimental android hacking using reflection
Experimental android hacking using reflectionSeongJae Park
 
Let the contribution begin
Let the contribution beginLet the contribution begin
Let the contribution beginSeongJae Park
 
Touch Android Without Touching
Touch Android Without TouchingTouch Android Without Touching
Touch Android Without TouchingSeongJae Park
 
AOSP에 컨트리뷰션 하기 dev festx korea 2012 presentation
AOSP에 컨트리뷰션 하기   dev festx korea 2012 presentationAOSP에 컨트리뷰션 하기   dev festx korea 2012 presentation
AOSP에 컨트리뷰션 하기 dev festx korea 2012 presentationSeongJae Park
 

Más de SeongJae Park (8)

Biscuit: an operating system written in go
Biscuit:  an operating system written in goBiscuit:  an operating system written in go
Biscuit: an operating system written in go
 
DO YOU WANT TO USE A VCS
DO YOU WANT TO USE A VCSDO YOU WANT TO USE A VCS
DO YOU WANT TO USE A VCS
 
Experimental android hacking using reflection
Experimental android hacking using reflectionExperimental android hacking using reflection
Experimental android hacking using reflection
 
ash
ashash
ash
 
Hacktime for adk
Hacktime for adkHacktime for adk
Hacktime for adk
 
Let the contribution begin
Let the contribution beginLet the contribution begin
Let the contribution begin
 
Touch Android Without Touching
Touch Android Without TouchingTouch Android Without Touching
Touch Android Without Touching
 
AOSP에 컨트리뷰션 하기 dev festx korea 2012 presentation
AOSP에 컨트리뷰션 하기   dev festx korea 2012 presentationAOSP에 컨트리뷰션 하기   dev festx korea 2012 presentation
AOSP에 컨트리뷰션 하기 dev festx korea 2012 presentation
 

Último

A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Último (20)

A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

gcma: guaranteed contiguous memory allocator

  • 1. GCMA: Guaranteed Contiguous Memory Allocator SeongJae Park1 , Minchan Kim2 , Heon Y. Yeom1 Seoul National University1 LG Electronics2
  • 2. This work by SeongJae Park is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/.
  • 3. These slides were presented during EWiLi 2015 at Amsterdam (http://syst.univ-brest.fr/ewili2015/)
  • 4. TL; DR ● Contiguous Memory Allocation is a big problem of embedded system ● Current solutions have weakness ○ Too expensive for embedded system, or ○ Waste memory space, or ○ Too slow and even fail to allocate ● GCMA, our new solution, guarantees ○ Fast latency of contiguous memory allocation ○ Success of contiguous memory allocation ○ and Reasonable memory utilization ● That’s it! ;)
  • 5. Who Wants Physically Contiguous Memory? ● Virtual Memory avoids the fragmentation problem ○ Applications use virtual address; virtual address can be mapped to any physical address ○ MMU do mapping of virtual address to physical address ● In Linux, it uses demand paging and reclaim Physical Memory Application MMUCPU Logical address Physical address
  • 6. We Need Physically Contiguous Memory! ● Devices using DMA ○ Lots of devices need large, contiguous memory for internal buffer (e.g., 12 Mega-Pixel Camera needs large, contiguous buffer) ○ Because MMU is behind the CPU, devices using DMA cannot use virtual address space ○ Slow latency or failure of allocation can harm user experience or life (What if your camera does not react while a kitty go away) Physical Memory Application MMUCPU Logical address Physical address Device DMA Physical address
  • 7. We Need Physically Contiguous Memory! ● Devices using DMA ○ Lots of devices need large, contiguous memory for internal buffer (e.g., 12 Mega-Pixel Camera needs large, contiguous buffer) ○ Because MMU is behind the CPU, devices using DMA can not use virtual address space ○ Slow latency or failure of allocation can harm user experience or life (What if your camera does not react while a kitty go away) ● Huge page, System internal buffer ○ Huge pages can enhance system performance by minimizing TLB miss ○ System needs contiguous memory for internal usage e.g., struct page *alloc_pages(unsigned int flags, unsigned int order); NOTE: This research focuses on device case for embedded system, though
  • 8. H/W Solutions: IOMMU, Scatter / Gather DMA ● IOMMU serves as MMU for devices ● Scatter / Gather DMA let DMA to discontiguous memory area ● Consumes more power and money because it requires additional H/W ● Can be expensive for low-end embedded devices ● Useless for Huge page or system internal physically contiguous memory Physical Memory Application MMUCPU Logical address Physicals address Device DMA Device address IOMMU Physical address
  • 9. Reserved Area Technique ● Reserves sufficient amount of contiguous area at boot time and let only contiguous memory allocation to use the area ● Pros: Simple and effective for contiguous memory allocation ● Cons: System memory space utilization could be low if the contiguous memory allocation does not fully use the reserved area ○ Sacrifice 10% of your phone memory for a kitty that may or may not meet with? ● Widely adopted solution despite of the utilization problem System Memory (Narrowed by Reservation) Reserved Area Process Normal Allocation Contiguous Allocation
  • 10. CMA: Contiguous Memory Allocator ● Software-based solution in mainline Linux ● Generously extended version of Reserved area technique ○ Mainly focus on memory utilization ○ Let movable pages to use the reserved area ○ If contig mem alloc requires pages being used for movable pages, move those pages out of reserved area and give the vacant area to contig mem alloc ○ Solves memory utilization problem because general pages are movable ● In other words, CMA gives different priority to clients of the reserved area ○ Primary client: contiguous memory allocation ○ Secondary client: movable page allocation
  • 11. CMA: in Real-land ● Worst case latency for a photo shot using Camera of Raspberry Pi 2 under background workload is ● 1.6 Seconds when configured with Reserved area technique, ● 9.8 seconds when configured with CMA; Enough for a kitty to go away
  • 12. The Phantom Menace Unveiled: Latency of CMA ● Latency of CMA during a photo shot using Camera of Raspberry Pi 2 under background workload ● It’s clear that latency of Camera is bounded by CMA
  • 13. CMA: Why So Slow? ● In short, secondary client of CMA is not so nice as expected ○ In this context, ‘nice’ means as unix command ‘nice’ means ● Moving page is expensive task (copying, rmap control, LRU churn, …) ● If someone is holding the page, it could not be moved until the one releases the page (e.g., get_user_page()) ● Result: Unexpected long latency and even failure ● That’s why CMA is not adopted to many devices ● Raspberry Pi does not support CMA officially because of the problem
  • 14. GCMA: Guaranteed Contiguous Memory Allocator ● GCMA is a variant of CMA that guarantees ○ Fast latency and success of allocation ○ While keeping memory utilization as well ● Main idea is simple ○ Follow primary / secondary client idea of CMA to keep memory utilization ○ Select secondary client as only nice one unlike CMA ■ For fast latency, should be able to vacate the reserved area as soon as required without any task such as moving content ■ For success, should be out of kernel control ■ For memory utilization, most pages should be able to be secondary client
  • 15. Secondary Client of GCMA ● Last chance cache for ○ pages being swapped out and ○ clean pages being evicted from file cache ● Pages being swapped out to write-through swap device ○ Could be discarded immediately because the contents are written through to swap device ○ Kernel think it’s already swapped out ○ Most anonymous pages could be swapped out ● Clean pages being evicted from file cache ○ Could be discarded because the contents are in storage already ○ Kernel think it’s already evicted out ○ Lots of file-backed pages could be the case ● Additional pros 1: Already expected to not be accessed again soon ○ Discarding secondary client pages from GCAM will not affect system performance much ● Additional pros 2: Occurs with severe workload, only ○ In peaceful case, zero overhead at all
  • 16. GCMA: Workflow ● Reserve memory area in boot time ● If a page is swapped out or evicted from file cache, keep the page in reserved area ○ If system requires the page again, give it back from the reserved area ○ If contiguous memory allocation requires area being used by those pages, discard those pages and use the area for contiguous memory allocation System Memory GCMA Reserved Area Swap Device File Process Normal Allocation Contiguous Allocation Reclaim / Swap Get back if hit
  • 17. GCMA Implementation ● Used Cleancache / Frontswap interface for second class client of GCMA rather than inventing the wheel again ● DMEM: Discardable Memory ○ Works as a last chance cache for secondary client pages using GCMA reserved area ○ Manages Index table using hash table and RB-tree ○ Evicts pages in LRU scheme Reserved Area Dmem Logic GCMA Logic Dmem Interface CMA Interface CMA Logic Cleancache Frontswap
  • 18. GCMA Implementation: in Short ● Implemented on Linux v3.18 ● About 1,500 lines of code ● Ported, evaluated on CuBox and Raspberry Pi 2 ● Available under GPL v3 at: https://github.com/sjp38/linux. gcma/releases/tag/gcma/rfc/v2
  • 19. GCMA Optimization for Write-through Swap ● Write-through swap device could consume I/O resource ● GCMA recommends compressed memory block device as a swap device ○ Zram is a compressed memory block device in upstream Linux ○ Google also recommends zram as a swap device for low-memory Android devices
  • 20. Experimental Setup ● Device setup ○ Raspberry Pi 2 ○ ARM cortex-A7 900 MHz * 4 cores ○ 1 GiB LPDDR2 SDRAM ○ Class 10 SanDisk 16 GiB microSD card ● Configurations ○ Baseline: Linux rpi-v3.18.11 + 100 MiB swap + 256 MiB reserved area ○ CMA: Linux rpi-v3.18.11 + 100 MiB swap + 256 MiB CMA area ○ GCMA: Linux rpi-v3.18.11 + 100 MiB Zram swap + 256 MiB GCMA area ● Workloads ○ Contig Mem Alloc: Contiguous memory allocation using CMA or GCMA ○ Blogbench: Realistic file system benchmark ○ Camera shot: Repeated shot of picture using Raspberry Pi 2 Camera module with 10 seconds interval between each shot
  • 21. Contig Mem Alloc without Background Workload ● Average of 30 allocations ● GCMA shows 14.89x to 129.41x faster latency compared to CMA ● CMA even failed once for 32,768 contiguous pages allocation even though there is no background task
  • 22. 4MiB Contig Mem Alloc w/o Background Workload ● Latency of CMA is more predictable compared to GCMA
  • 23. 4MiB Contig Mem Allocation w/ Background Task ● Background workload: Blogbench ● CMA consumes 5 seconds (unacceptable!) for one allocation in bad case
  • 24. Camera Latency w/ Background Task ● Camera is a realistic, important workload of Raspberry Pi 2 ● Background task: Blogbench ● GCMA keeps latency as fast as Reserved area technique configuration
  • 25. Blogbench on CMA / GCMA ● Scores are normalized to scores of Baseline configuration ● ‘/ cam’ means camera workload as a background workload ● System under GCMA even outperforms CMA with realistic workload owing to its light overhead ● Allocation using CMA even degrades system performance because it should move out pages in reserved area
  • 26. Conclusion ● Contiguous memory allocation is still a big problem for embedded device ○ H/W solutions are too expensive ○ S/W solutions like CMA is not effective ○ Most system uses Reserved area technique despite of low memory utilization ● GCMA guarantees fast latency, success of allocation and reasonable memory utilization ○ Allocation latency of GCMA is as fast as Reserved area technique ○ System performance under GCMA is is similar or outperforms the performance under CMA