SlideShare una empresa de Scribd logo
1 de 27
Parallelism-Aware Memory Interference
Delay Analysis for COTS Multicore Systems
Heechul Yun +, Rodolfo Pellizzoni*, Prathap Kumar Valsan+
+University of Kansas
*University of Waterloo
1
High-Performance Multicores for
Embedded Real-Time Systems
• Why?
– Intelligence  more performance
– Space, weight, power (SWaP), cost
2
Challenge: Shared Memory Hierarchy
3
Core1 Core2 Core3 Core4
Memory Controller (MC)
Shared Cache
• Hardware resources are contented among the cores
• Tasks can suffer significant memory interference
delays
DRAM
Memory Interference Delay
• Can be extremely high in the worst-case
4
8.0
33.5
45.8
0
5
10
15
20
25
30
35
40
45
50
ARM
Cortex A15
Intel
Nahelem
Intel
Haswell
solo
+ 1 co-runner
+ 2 co-runners
+ 3 co-runners
Normalizedexecutiontime
45.8X
slowdown
DRAM
LLC
Core1 Core2 Core3 Core4
bench co-runner(s)
banks
Modeling Memory Interference
• Common (false) assumptions on COTS systems
– A single resource
• Reality: multiple parallel resources (banks)
– A constant memory service cost
• Reality: it varies depending on the DRAM bank state
– Round-robin arbitration, in-order processing
• Reality: FR-FCFS can re-order requests
– Both read and write requests are treated equally
• Reality: writes are buffered and processed opportunistically
– One outstanding request per core
• Reality: an out-of-order core can generate parallel reqs.
5
Addressed in This Work
Addressed in [Kim’14]
[Kim’14] H. Kim, D. de Niz, B. Andersson, M. Klein, O. Mutlu, and R. R.Rajkumar. “Bounding Memory Interference
Delay in COTS-based Multi-Core Systems,” RTAS’14
Our Approach
• Realistic memory interference model for COTS systems
– Memory-level parallelism (MLP) in COTS architecture
– Write-buffering and opportunistic batch processing in MC
• DRAM bank-partitioning
– Reduce interference
• Delay analysis
– Compute worst-case memory interference delay of the
task under analysis
6
Outline
• Motivation
• Background
– COTS multicore architecture
– DRAM organization
– Memory controller
• Our approach
• Evaluation
• Conclusion
7
Memory-Level Parallelism (MLP)
• Broadly defined as the number of concurrent
memory requests that a given architecture
can handle at a time
8
Last Level Cache (LLC)
DRAM DIMM
Memory Controller (MC)
Core1 Core2 Core3 Core4
Request buffers
Read Write
Scheduler
MSHRs
CMD/ADDR DATA
COTS Multicore Architecture
Out-of-order core:
Multiple memory requests
Non-blocking caches:
Multiple cache-misses
MC and DRAM:
Multiple banks
Bank
4
Bank
3
Bank
2
Bank
1
MSHRs MSHRs MSHRs MSHRs
9
DRAM Organization
LLC
DRAM DIMM
Memory Controller (MC)
Bank
4
Bank
3
Bank
2
Bank
1
Core1 Core2 Core3 Core4
Mess
• intra-bank conflicts
• Inter-bank conflicts
10
DRAM Bank Partitioning
L3
DRAM DIMM
Memory Controller (MC)
Bank
4
Bank
3
Bank
2
Bank
1
Core1 Core2 Core3 Core4 • Private banking
– OS kernel allocates
pages from
dedicated banks for
each core
Eliminate
intra bank
conflicts
11
Bank Access Cost
Row 1
Row 2
Row 3
Row 4
Row 5
A DRAM Bank
Row Bufferactivate
precharge
Read/write
• State dependent access cost
– Row hit: fast
– Row miss: slow
READ (Bank 1, Row 3, Col 7)
Col7
12
Memory Controller
13
Read request
buffer
Write request
buffer
Bank 1
scheduler
Channel scheduler
Bank 2
scheduler
Bank N
scheduler
DRAM chip
Memory requests from cores
Writes are buffered and processed opportunistically
“Intelligent” Read/Write Switching
• Intuition
– Writes are not in the critical path. So buffer and
process them opportunistically
• Algorithm [Hansson’14]
– If there are reads, process them unless the write
buffer is almost full (high watermark)
– If there’s no reads and there is enough buffered
writes (low watermark), process the writes until
reads arrive
14
[Hansson’14] Hansson et al., “Simulating DRAM controllers for future syste
m architecture exploration,” ISPASS’14
FR-FCFS Scheduling[Rixner’00]
• Priority order
1. Row hit request
2. Older request
• Maximize memory throughput
15
Bank 1
scheduler
Channel scheduler
Bank 2
scheduler
Bank N
scheduler
[Rixner’00] S. Rixner, W. J. Dally, U. J. Kapasi, P. Mattson, and J. Owens. Memo
ry access scheduling. ACM SIGARCH Computer Architecture News. 2000
Outline
• Motivation
• Background
• Our approach
– System model
– Delay analysis
• Evaluation
• Conclusion
16
System Model
• Task
– Solo execution time C
– Memory demand (#of LLC misses): H
• Core
– Can generate multiple, but bounded, parallel requests
• Upper-bounded by L1 cache’s MSHR size
• Cache (LLC)
– Assume no cache-level interference
• Core-private or partitioned LLC
• No MSHR contention
• DRAM controller
– Efficient FR-FCFS scheduler, open-page pollicy
– Separate read and write request buffer
• Watermark scheme on processing writes
17
Delay Analysis
• Goal
– Compute the worst-case memory interference delay of a
task under analysis
• Request driven analysis
– Based on the task’s own memory demand: H
– Compute worst-case per request delay: RD
– Memory interference delay = RD x H
• Job driven analysis
– Based on the other tasks’ memory requests over time
– See paper
18
Key Intuition #1
• The #of competing requests Nrq is bounded
– Because the # of per-core parallel requests is bounded.
– Example
• Cortex-A15’s per-core bound = 6
• Nrq = 6 x 3 (cores) = 18
19
Key Intuition #2
• DRAM sub-commands of the competing
memory requests are overlapped
– Much less pessimistic than [Kim’14], which simply
sums up each sub-command’s maximum delay
– See paper for the proof
20
Key Intuition #3
• The worst-case delay happens when
– The read buffer has Nrq requests
– And the write request buffer just becomes full
• Start a write batch
– Then the read request under analysis arrives
21
RD = read batch delay + write batch delay
Outline
• Motivation
• Background
• Our approach
• Evaluation
• Conclusion
22
Evaluation Setup
• Gem5 simulator
– 4 out-of-order cores (based on Cortex-A15)
• L2 MSHR size is increased to eliminate MSHR contention
– DRAM controller model [Hansson’14]
– LPDDR2 @ 533Mhz
• Linux 3.14
– Use PALLOC[Yun’14] to partition
DRAM banks and LLC
• Workload
– Subject: Latency, SPEC2006
– Co-runners: Bandwidth (write)
23
DRAM
LLC
Core1 Core2 Core3 Core4
subject co-runner(s)
Results with the Latency benchmark
• Ours(ideal): Read only delay analysis (ignore writes)
• Ours(opt): assume writes are balanced over multiple banks
• Ours(worst): all writes are targeting one bank & all row misses
24
0
5
10
15
20
25
30
Measured [Kim'14] Ours(ideal) Ours(opt) Ours(worst)
NormalizedResponseTime
underestimate
overestimate
[Kim’14] H. Kim, D. de Niz, B. Andersson, M. Klein, O. Mutlu, and R. R.Rajkumar. “Bounding Memory Interference
Delay in COTS-based Multi-Core Systems,” RTAS’14
Results with SPEC2006 Benchmarks
• Main source of pessimism:
– The pathological case of write (LLC write-backs) processing
25
Conclusion
• Memory interference delay on COTS multicore
– Existing analysis methods rely on strong assumptions
• Our approach
– A realistic model of COTS memory system
• Parallel memory requests
• Read prioritization and opportunistic write processing
– Request and job-driven delay analysis methods
• Pessimistic but still can be useful for low memory intensive tasks
• Future work
– Reduce pessimism in the analysis
26
Thank You
27

Más contenido relacionado

La actualidad más candente

Lecture 6.1
Lecture  6.1Lecture  6.1
Lecture 6.1Mr SMAK
 
Coherence and consistency models in multiprocessor architecture
Coherence and consistency models in multiprocessor architectureCoherence and consistency models in multiprocessor architecture
Coherence and consistency models in multiprocessor architectureUniversity of Pisa
 
Erasing Belady's Limitations: In Search of Flash Cache Offline Optimality
Erasing Belady's Limitations: In Search of Flash Cache Offline OptimalityErasing Belady's Limitations: In Search of Flash Cache Offline Optimality
Erasing Belady's Limitations: In Search of Flash Cache Offline OptimalityYue Cheng
 
High Performance Storage Devices in the Linux Kernel
High Performance Storage Devices in the Linux KernelHigh Performance Storage Devices in the Linux Kernel
High Performance Storage Devices in the Linux KernelKernel TLV
 
Warehouse scale computer
Warehouse scale computerWarehouse scale computer
Warehouse scale computerHassan A-j
 
Spectrum Scale Best Practices by Olaf Weiser
Spectrum Scale Best Practices by Olaf WeiserSpectrum Scale Best Practices by Olaf Weiser
Spectrum Scale Best Practices by Olaf WeiserSandeep Patil
 
Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler Sarwan ali
 
Cache coherence
Cache coherenceCache coherence
Cache coherenceEmployee
 
asap2013-khoa-presentation
asap2013-khoa-presentationasap2013-khoa-presentation
asap2013-khoa-presentationAbhishek Jain
 
GCMA: Guaranteed Contiguous Memory Allocator
GCMA: Guaranteed Contiguous Memory AllocatorGCMA: Guaranteed Contiguous Memory Allocator
GCMA: Guaranteed Contiguous Memory AllocatorSeongJae Park
 
Linux Kernel Memory Model
Linux Kernel Memory ModelLinux Kernel Memory Model
Linux Kernel Memory ModelSeongJae Park
 
Compositional Analysis for the Multi-Resource Server
Compositional Analysis for the Multi-Resource ServerCompositional Analysis for the Multi-Resource Server
Compositional Analysis for the Multi-Resource ServerEricsson
 
Revisiting CephFS MDS and mClock QoS Scheduler
Revisiting CephFS MDS and mClock QoS SchedulerRevisiting CephFS MDS and mClock QoS Scheduler
Revisiting CephFS MDS and mClock QoS SchedulerYongseok Oh
 

La actualidad más candente (19)

Lecture 6.1
Lecture  6.1Lecture  6.1
Lecture 6.1
 
Cache coherence
Cache coherenceCache coherence
Cache coherence
 
Coherence and consistency models in multiprocessor architecture
Coherence and consistency models in multiprocessor architectureCoherence and consistency models in multiprocessor architecture
Coherence and consistency models in multiprocessor architecture
 
Erasing Belady's Limitations: In Search of Flash Cache Offline Optimality
Erasing Belady's Limitations: In Search of Flash Cache Offline OptimalityErasing Belady's Limitations: In Search of Flash Cache Offline Optimality
Erasing Belady's Limitations: In Search of Flash Cache Offline Optimality
 
High Performance Storage Devices in the Linux Kernel
High Performance Storage Devices in the Linux KernelHigh Performance Storage Devices in the Linux Kernel
High Performance Storage Devices in the Linux Kernel
 
Warehouse scale computer
Warehouse scale computerWarehouse scale computer
Warehouse scale computer
 
Spectrum Scale Best Practices by Olaf Weiser
Spectrum Scale Best Practices by Olaf WeiserSpectrum Scale Best Practices by Olaf Weiser
Spectrum Scale Best Practices by Olaf Weiser
 
Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler
 
Cache coherence
Cache coherenceCache coherence
Cache coherence
 
RedHat Cluster!
RedHat Cluster!RedHat Cluster!
RedHat Cluster!
 
asap2013-khoa-presentation
asap2013-khoa-presentationasap2013-khoa-presentation
asap2013-khoa-presentation
 
Unified Hardware Abstraction Layer with Device Masquerade
Unified Hardware Abstraction Layer with Device MasqueradeUnified Hardware Abstraction Layer with Device Masquerade
Unified Hardware Abstraction Layer with Device Masquerade
 
GCMA: Guaranteed Contiguous Memory Allocator
GCMA: Guaranteed Contiguous Memory AllocatorGCMA: Guaranteed Contiguous Memory Allocator
GCMA: Guaranteed Contiguous Memory Allocator
 
Chapter 2 pc
Chapter 2 pcChapter 2 pc
Chapter 2 pc
 
Linux Kernel Memory Model
Linux Kernel Memory ModelLinux Kernel Memory Model
Linux Kernel Memory Model
 
Memory (Computer Organization)
Memory (Computer Organization)Memory (Computer Organization)
Memory (Computer Organization)
 
Compositional Analysis for the Multi-Resource Server
Compositional Analysis for the Multi-Resource ServerCompositional Analysis for the Multi-Resource Server
Compositional Analysis for the Multi-Resource Server
 
Revisiting CephFS MDS and mClock QoS Scheduler
Revisiting CephFS MDS and mClock QoS SchedulerRevisiting CephFS MDS and mClock QoS Scheduler
Revisiting CephFS MDS and mClock QoS Scheduler
 
Snooping 2
Snooping 2Snooping 2
Snooping 2
 

Destacado

Parallelism aware batch scheduling
Parallelism aware batch schedulingParallelism aware batch scheduling
Parallelism aware batch schedulingaifayed
 
Cache-partitioning
Cache-partitioningCache-partitioning
Cache-partitioningdavidkftam
 
Practical real-time operating system security for the masses
Practical real-time operating system security for the massesPractical real-time operating system security for the masses
Practical real-time operating system security for the massesMilosch Meriac
 
Resilient IoT Security: The end of flat security models
Resilient IoT Security: The end of flat security modelsResilient IoT Security: The end of flat security models
Resilient IoT Security: The end of flat security modelsMilosch Meriac
 
NoC simulators presentation
NoC simulators presentationNoC simulators presentation
NoC simulators presentationHossam Hassan
 
Noc ajal final
Noc ajal  finalNoc ajal  final
Noc ajal finalAJAL A J
 

Destacado (7)

Parallelism aware batch scheduling
Parallelism aware batch schedulingParallelism aware batch scheduling
Parallelism aware batch scheduling
 
Cache-partitioning
Cache-partitioningCache-partitioning
Cache-partitioning
 
RapidMRC
RapidMRCRapidMRC
RapidMRC
 
Practical real-time operating system security for the masses
Practical real-time operating system security for the massesPractical real-time operating system security for the masses
Practical real-time operating system security for the masses
 
Resilient IoT Security: The end of flat security models
Resilient IoT Security: The end of flat security modelsResilient IoT Security: The end of flat security models
Resilient IoT Security: The end of flat security models
 
NoC simulators presentation
NoC simulators presentationNoC simulators presentation
NoC simulators presentation
 
Noc ajal final
Noc ajal  finalNoc ajal  final
Noc ajal final
 

Similar a Parallelism-Aware Memory Interference Delay Analysis for Multicore Systems

Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time SystemsTaming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time SystemsHeechul Yun
 
Deterministic Memory Abstraction and Supporting Multicore System Architecture
Deterministic Memory Abstraction and Supporting Multicore System ArchitectureDeterministic Memory Abstraction and Supporting Multicore System Architecture
Deterministic Memory Abstraction and Supporting Multicore System ArchitectureHeechul Yun
 
Memory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer OrganizationMemory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer Organization2022002857mbit
 
SOC-CH4.pptSOC Processors Used in SOCSOC Processors Used in SOC
SOC-CH4.pptSOC Processors Used in SOCSOC Processors Used in SOCSOC-CH4.pptSOC Processors Used in SOCSOC Processors Used in SOC
SOC-CH4.pptSOC Processors Used in SOCSOC Processors Used in SOCSnehaLatha68
 
CPU Caches - Jamie Allen
CPU Caches - Jamie AllenCPU Caches - Jamie Allen
CPU Caches - Jamie Allenjaxconf
 
Ct213 memory subsystem
Ct213 memory subsystemCt213 memory subsystem
Ct213 memory subsystemSandeep Kamath
 
chap 18 multicore computers
chap 18 multicore computers chap 18 multicore computers
chap 18 multicore computers Sher Shah Merkhel
 
Basic Computer Architecture
Basic Computer ArchitectureBasic Computer Architecture
Basic Computer ArchitectureYong Heui Cho
 
Computer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer ArchitectureComputer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer ArchitectureHaris456
 
COMPUTER ARCHITECTURE-2.pptx
COMPUTER ARCHITECTURE-2.pptxCOMPUTER ARCHITECTURE-2.pptx
COMPUTER ARCHITECTURE-2.pptxssusere16bd9
 
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)Lars Marowsky-Brée
 
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters Ceph Community
 
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDKlecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDKofficeaiotfab
 
Spectrum Scale Memory Usage
Spectrum Scale Memory UsageSpectrum Scale Memory Usage
Spectrum Scale Memory UsageTomer Perry
 

Similar a Parallelism-Aware Memory Interference Delay Analysis for Multicore Systems (20)

Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time SystemsTaming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
 
Deterministic Memory Abstraction and Supporting Multicore System Architecture
Deterministic Memory Abstraction and Supporting Multicore System ArchitectureDeterministic Memory Abstraction and Supporting Multicore System Architecture
Deterministic Memory Abstraction and Supporting Multicore System Architecture
 
Memory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer OrganizationMemory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer Organization
 
Memory
MemoryMemory
Memory
 
SOC-CH4.pptSOC Processors Used in SOCSOC Processors Used in SOC
SOC-CH4.pptSOC Processors Used in SOCSOC Processors Used in SOCSOC-CH4.pptSOC Processors Used in SOCSOC Processors Used in SOC
SOC-CH4.pptSOC Processors Used in SOCSOC Processors Used in SOC
 
CPU Caches
CPU CachesCPU Caches
CPU Caches
 
CPU Caches - Jamie Allen
CPU Caches - Jamie AllenCPU Caches - Jamie Allen
CPU Caches - Jamie Allen
 
Memory Organization
Memory OrganizationMemory Organization
Memory Organization
 
Ct213 memory subsystem
Ct213 memory subsystemCt213 memory subsystem
Ct213 memory subsystem
 
Caches microP
Caches microPCaches microP
Caches microP
 
internal_memory
internal_memoryinternal_memory
internal_memory
 
chap 18 multicore computers
chap 18 multicore computers chap 18 multicore computers
chap 18 multicore computers
 
Basic Computer Architecture
Basic Computer ArchitectureBasic Computer Architecture
Basic Computer Architecture
 
Computer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer ArchitectureComputer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer Architecture
 
COMPUTER ARCHITECTURE-2.pptx
COMPUTER ARCHITECTURE-2.pptxCOMPUTER ARCHITECTURE-2.pptx
COMPUTER ARCHITECTURE-2.pptx
 
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
 
Lect18
Lect18Lect18
Lect18
 
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
 
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDKlecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
 
Spectrum Scale Memory Usage
Spectrum Scale Memory UsageSpectrum Scale Memory Usage
Spectrum Scale Memory Usage
 

Último

Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 

Último (20)

Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 

Parallelism-Aware Memory Interference Delay Analysis for Multicore Systems

  • 1. Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems Heechul Yun +, Rodolfo Pellizzoni*, Prathap Kumar Valsan+ +University of Kansas *University of Waterloo 1
  • 2. High-Performance Multicores for Embedded Real-Time Systems • Why? – Intelligence  more performance – Space, weight, power (SWaP), cost 2
  • 3. Challenge: Shared Memory Hierarchy 3 Core1 Core2 Core3 Core4 Memory Controller (MC) Shared Cache • Hardware resources are contented among the cores • Tasks can suffer significant memory interference delays DRAM
  • 4. Memory Interference Delay • Can be extremely high in the worst-case 4 8.0 33.5 45.8 0 5 10 15 20 25 30 35 40 45 50 ARM Cortex A15 Intel Nahelem Intel Haswell solo + 1 co-runner + 2 co-runners + 3 co-runners Normalizedexecutiontime 45.8X slowdown DRAM LLC Core1 Core2 Core3 Core4 bench co-runner(s) banks
  • 5. Modeling Memory Interference • Common (false) assumptions on COTS systems – A single resource • Reality: multiple parallel resources (banks) – A constant memory service cost • Reality: it varies depending on the DRAM bank state – Round-robin arbitration, in-order processing • Reality: FR-FCFS can re-order requests – Both read and write requests are treated equally • Reality: writes are buffered and processed opportunistically – One outstanding request per core • Reality: an out-of-order core can generate parallel reqs. 5 Addressed in This Work Addressed in [Kim’14] [Kim’14] H. Kim, D. de Niz, B. Andersson, M. Klein, O. Mutlu, and R. R.Rajkumar. “Bounding Memory Interference Delay in COTS-based Multi-Core Systems,” RTAS’14
  • 6. Our Approach • Realistic memory interference model for COTS systems – Memory-level parallelism (MLP) in COTS architecture – Write-buffering and opportunistic batch processing in MC • DRAM bank-partitioning – Reduce interference • Delay analysis – Compute worst-case memory interference delay of the task under analysis 6
  • 7. Outline • Motivation • Background – COTS multicore architecture – DRAM organization – Memory controller • Our approach • Evaluation • Conclusion 7
  • 8. Memory-Level Parallelism (MLP) • Broadly defined as the number of concurrent memory requests that a given architecture can handle at a time 8
  • 9. Last Level Cache (LLC) DRAM DIMM Memory Controller (MC) Core1 Core2 Core3 Core4 Request buffers Read Write Scheduler MSHRs CMD/ADDR DATA COTS Multicore Architecture Out-of-order core: Multiple memory requests Non-blocking caches: Multiple cache-misses MC and DRAM: Multiple banks Bank 4 Bank 3 Bank 2 Bank 1 MSHRs MSHRs MSHRs MSHRs 9
  • 10. DRAM Organization LLC DRAM DIMM Memory Controller (MC) Bank 4 Bank 3 Bank 2 Bank 1 Core1 Core2 Core3 Core4 Mess • intra-bank conflicts • Inter-bank conflicts 10
  • 11. DRAM Bank Partitioning L3 DRAM DIMM Memory Controller (MC) Bank 4 Bank 3 Bank 2 Bank 1 Core1 Core2 Core3 Core4 • Private banking – OS kernel allocates pages from dedicated banks for each core Eliminate intra bank conflicts 11
  • 12. Bank Access Cost Row 1 Row 2 Row 3 Row 4 Row 5 A DRAM Bank Row Bufferactivate precharge Read/write • State dependent access cost – Row hit: fast – Row miss: slow READ (Bank 1, Row 3, Col 7) Col7 12
  • 13. Memory Controller 13 Read request buffer Write request buffer Bank 1 scheduler Channel scheduler Bank 2 scheduler Bank N scheduler DRAM chip Memory requests from cores Writes are buffered and processed opportunistically
  • 14. “Intelligent” Read/Write Switching • Intuition – Writes are not in the critical path. So buffer and process them opportunistically • Algorithm [Hansson’14] – If there are reads, process them unless the write buffer is almost full (high watermark) – If there’s no reads and there is enough buffered writes (low watermark), process the writes until reads arrive 14 [Hansson’14] Hansson et al., “Simulating DRAM controllers for future syste m architecture exploration,” ISPASS’14
  • 15. FR-FCFS Scheduling[Rixner’00] • Priority order 1. Row hit request 2. Older request • Maximize memory throughput 15 Bank 1 scheduler Channel scheduler Bank 2 scheduler Bank N scheduler [Rixner’00] S. Rixner, W. J. Dally, U. J. Kapasi, P. Mattson, and J. Owens. Memo ry access scheduling. ACM SIGARCH Computer Architecture News. 2000
  • 16. Outline • Motivation • Background • Our approach – System model – Delay analysis • Evaluation • Conclusion 16
  • 17. System Model • Task – Solo execution time C – Memory demand (#of LLC misses): H • Core – Can generate multiple, but bounded, parallel requests • Upper-bounded by L1 cache’s MSHR size • Cache (LLC) – Assume no cache-level interference • Core-private or partitioned LLC • No MSHR contention • DRAM controller – Efficient FR-FCFS scheduler, open-page pollicy – Separate read and write request buffer • Watermark scheme on processing writes 17
  • 18. Delay Analysis • Goal – Compute the worst-case memory interference delay of a task under analysis • Request driven analysis – Based on the task’s own memory demand: H – Compute worst-case per request delay: RD – Memory interference delay = RD x H • Job driven analysis – Based on the other tasks’ memory requests over time – See paper 18
  • 19. Key Intuition #1 • The #of competing requests Nrq is bounded – Because the # of per-core parallel requests is bounded. – Example • Cortex-A15’s per-core bound = 6 • Nrq = 6 x 3 (cores) = 18 19
  • 20. Key Intuition #2 • DRAM sub-commands of the competing memory requests are overlapped – Much less pessimistic than [Kim’14], which simply sums up each sub-command’s maximum delay – See paper for the proof 20
  • 21. Key Intuition #3 • The worst-case delay happens when – The read buffer has Nrq requests – And the write request buffer just becomes full • Start a write batch – Then the read request under analysis arrives 21 RD = read batch delay + write batch delay
  • 22. Outline • Motivation • Background • Our approach • Evaluation • Conclusion 22
  • 23. Evaluation Setup • Gem5 simulator – 4 out-of-order cores (based on Cortex-A15) • L2 MSHR size is increased to eliminate MSHR contention – DRAM controller model [Hansson’14] – LPDDR2 @ 533Mhz • Linux 3.14 – Use PALLOC[Yun’14] to partition DRAM banks and LLC • Workload – Subject: Latency, SPEC2006 – Co-runners: Bandwidth (write) 23 DRAM LLC Core1 Core2 Core3 Core4 subject co-runner(s)
  • 24. Results with the Latency benchmark • Ours(ideal): Read only delay analysis (ignore writes) • Ours(opt): assume writes are balanced over multiple banks • Ours(worst): all writes are targeting one bank & all row misses 24 0 5 10 15 20 25 30 Measured [Kim'14] Ours(ideal) Ours(opt) Ours(worst) NormalizedResponseTime underestimate overestimate [Kim’14] H. Kim, D. de Niz, B. Andersson, M. Klein, O. Mutlu, and R. R.Rajkumar. “Bounding Memory Interference Delay in COTS-based Multi-Core Systems,” RTAS’14
  • 25. Results with SPEC2006 Benchmarks • Main source of pessimism: – The pathological case of write (LLC write-backs) processing 25
  • 26. Conclusion • Memory interference delay on COTS multicore – Existing analysis methods rely on strong assumptions • Our approach – A realistic model of COTS memory system • Parallel memory requests • Read prioritization and opportunistic write processing – Request and job-driven delay analysis methods • Pessimistic but still can be useful for low memory intensive tasks • Future work – Reduce pessimism in the analysis 26