System-wide Energy Optimization for Multiple DVS Components and Real-time Tasks
Heechul Yun, Po-Liang Wu, Anshu Arya, Tarek Abdelzaher, Cheolgi Kim, and Lui Sha

DVS (Dynamic Voltage Scaling) in Real-time Systems
- The goal: minimize energy consumption by adjusting frequency and voltage while still meeting deadlines
- Most work considers the CPU only
  - Assumes execution time depends only on CPU frequency
- But memory and bus are also important
  - They affect execution time (e.g., a memory-intensive application slows down if the memory or bus is slow)
  - They consume considerable energy (the same order of magnitude as the CPU)
  - They are DVS capable in many recent embedded processors

Motivation: Memxfer5b (memory benchmark program)
- CPU clock halved
- Execution time increased by only 3%
- Energy saved: 30%

Motivation: Dhrystone (CPU benchmark program)
- Memory clock halved
- Execution time increased by only 0.05%
- Energy saved: 10%

Contents
- Motivation
- Energy Model
  - Considers CPU, bus, and memory, and task characteristics
  - Evaluation (model validation)
- Energy Optimization of Real-time Tasks
  - Static multi-DVS problem and solution
  - Evaluation
- Conclusion

Task Model
[Figure: power vs. time for a task, split into a computation block and a memory fetch (cache stall) block]
Task = Computation + Memory fetch

Task Model (2)
[Figure: power vs. time for blocks C and M in three cases: original frequencies, lower CPU frequency, and lower memory frequency]
- C: computation
- M: off-chip memory fetch (cache-stall cycles)

Task Model (3)
Execution time of a task: e = C/fc + M/fm
- C: CPU cycles of a given task
- M: memory cycles of a given task
- fc: CPU clock frequency
- fm: memory clock frequency

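The execution-time model on this slide (e = C/fc + M/fm, as restated on the Idle Block slide) can be tried out with a minimal Python sketch. The function name and the cycle counts below are illustrative assumptions, not values from the benchmarks in the slides.

```python
def execution_time(cpu_cycles, mem_cycles, f_cpu_hz, f_mem_hz):
    """Execution time in seconds: e = C/fc + M/fm."""
    if f_cpu_hz <= 0 or f_mem_hz <= 0:
        raise ValueError("clock frequencies must be positive")
    return cpu_cycles / f_cpu_hz + mem_cycles / f_mem_hz


# Hypothetical memory-bound task: few CPU cycles, many memory cycles.
C, M = 10e6, 90e6
base = execution_time(C, M, 200e6, 100e6)
half_cpu = execution_time(C, M, 100e6, 100e6)
print(f"slowdown from halving the CPU clock: {half_cpu / base - 1:.1%}")
```

For a task dominated by memory cycles, halving fc changes e only slightly, which is the kind of effect the Motivation slides report for the memory benchmark.
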
Power Model
Power of a component (e.g., the CPU): P = k f V^2 + R
- k: capacitance constant
- f: frequency of the component
- V: supply voltage
- R: leakage power
Different k for different modes:
- kactive: active mode capacitance
- kstandby: standby mode capacitance

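A small sketch of the per-component power model, assuming the usual CMOS form P = k·f·V² + R built from the symbols listed on this slide. The constant `K_ACTIVE` is a made-up placeholder, not a coefficient measured by the authors.

```python
def component_power(k, f_hz, v_volts, leakage_w=0.0):
    """Power in watts of one component: P = k * f * V^2 + R."""
    return k * f_hz * v_volts ** 2 + leakage_w


K_ACTIVE = 1.0e-9  # hypothetical capacitance constant, not a measured value

p_high = component_power(K_ACTIVE, 200e6, 1.804)
p_low = component_power(K_ACTIVE, 200e6, 1.504)
ratio = p_low / p_high  # equals (1.504 / 1.804) ** 2 when R = 0
print(f"dynamic power at 1.504 V relative to 1.804 V: {ratio:.3f}")
```

Because voltage enters quadratically, lowering V at a fixed frequency reduces the dynamic term more than proportionally; the leakage term R is unaffected in this simplified form.
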
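Energy Model
[Figure: power vs. time over one period P, showing the pure computation block, the memory fetch (cache stall) block, and the idle block; e is the execution time]
Total system energy is the sum of the energies of these three blocks.
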
Pure Computation Block
[Figure: power vs. time; during pure computation the CPU is active and the bus and memory are in standby, on top of system static power]
- kca: capacitance constant for active CPU
- kbs: capacitance constant for standby bus
- kms: capacitance constant for standby memory
- R: system-wide static power consumption

Memory Fetch Block
[Figure: power vs. time; during memory fetch the CPU is in standby and the bus and memory are active, on top of system static power]
- kcs: capacitance constant for standby CPU
- kba: capacitance constant for active bus
- kma: capacitance constant for active memory

Idle Block
[Figure: power vs. time; during the idle block the CPU, bus, and memory are idle, on top of system static power]
- I: idle mode power consumption
- e: execution time (C/fc + M/fm)

Energy Model Summary
[Figure: power vs. time over period P with three blocks. Pure execution block: CPU active, bus and memory standby. Memory fetch block: CPU standby, bus and memory active. Idle block: CPU, bus, and memory idle. A system static component is also shown.]
System-wide energy model:
- Considers CPU, bus, and memory power consumption
- Considers active, standby, and idle modes
- Other components are assumed to be static (included in R)

Energy Equation
System-wide energy consumption of a task during period P = CPU block + memory block + idle block

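The block structure of the energy equation can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' exact equation: each block's energy is taken as its duration times its power, component powers use the k·f·V² form, static power R is assumed to be drawn for the whole period, and I is treated as the additional idle-mode power. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Coefficients:
    k_ca: float      # active CPU capacitance constant
    k_cs: float      # standby CPU capacitance constant
    k_ba: float      # active bus capacitance constant
    k_bs: float      # standby bus capacitance constant
    k_ma: float      # active memory capacitance constant
    k_ms: float      # standby memory capacitance constant
    static_w: float  # R: system-wide static power (assumed always on)
    idle_w: float    # I: idle mode power


def task_energy(co, C, M, fc, fb, fm, vc, vb, vm, period):
    """Energy in joules of one task invocation over one period."""
    t_comp = C / fc              # pure computation block duration
    t_mem = M / fm               # memory fetch block duration
    e = t_comp + t_mem           # execution time
    if e > period:
        raise ValueError("task does not finish within its period")
    # Computation block: CPU active, bus and memory in standby.
    p_comp = (co.k_ca * fc * vc ** 2 + co.k_bs * fb * vb ** 2
              + co.k_ms * fm * vm ** 2 + co.static_w)
    # Memory fetch block: CPU in standby, bus and memory active.
    p_mem = (co.k_cs * fc * vc ** 2 + co.k_ba * fb * vb ** 2
             + co.k_ma * fm * vm ** 2 + co.static_w)
    # Idle block: idle power plus static power (assumption).
    p_idle = co.idle_w + co.static_w
    return t_comp * p_comp + t_mem * p_mem + (period - e) * p_idle
```

Keeping the three block powers as separate expressions mirrors the slide layout and makes it easy to swap in a different assumption for any single block.
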
Evaluation Platform
[Figure: board block diagram. Components shown: power supply, multimeter, STMP3650 SoC, ARM926 core with 8K instruction cache and 8K data cache, PSRAM (256KB), system bus, and external peripherals (flash, LCD, external DRAM, ...)]

Evaluation Platform (2)
- ARM9-based SoC
  - CPU: up to 200 MHz, bus: up to 100 MHz
  - CPU and bus are synchronous (BUS = CPU/N)
  - Memory (PSRAM) frequency is equal to the system bus frequency (fb = fm)
  - CPU, bus, and memory all share a common voltage
  - Vdd: 1.504V ~ 1.804V (0.32V step)
- Energy equation
  - V: shared voltage for CPU, bus, and memory
  - One combined constant for active bus and memory
  - One combined constant for standby bus and memory

Validation Methodology
- 4 synthetic programs with different cache stall ratios (0%, 10%, 25%, 55%)
- 8 clock configurations (fc, fm) for each program
- Performed nonlinear least-squares analysis of the resulting 32 data points against the energy equation

Energy Model Fitting
Coefficient of determination (R^2) is 99.97% (100% is a perfect fit)

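For readers unfamiliar with the coefficient of determination, here is a minimal sketch of how R^2 is computed from measured and model-predicted values. The function name is an assumption and the code is not the authors' fitting script.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("measured values are all identical")
    return 1.0 - ss_res / ss_tot
```

A value of 1.0 means the model reproduces every measurement exactly; a value of 0.0 means it does no better than predicting the mean of the measurements.
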
Energy Equation for Our Platform
Obtained coefficients in the energy equation

Contents
- Motivation
- Energy Model
  - Considers CPU, bus, and memory, and task characteristics
  - Evaluation
- Energy Optimization of Real-time Tasks
  - Static multi-DVS problem and optimal solution
  - Evaluation
- Conclusion

Static Multi-DVS Problem
- Given a set of periodic real-time tasks (T1, ..., Tn), where each task invocation requires at most Ci CPU cycles and at most Mi memory cycles in the worst case
- Find the energy-optimal static frequencies for multiple DVS-capable components (CPU, bus, and memory)

Problem Formulation
Minimize the total energy over the hyper period, subject to the deadline constraint, where:
- H: hyper period
- ei: execution time of task i
- Ecomp,i: computation block energy of task i
- Emem,i: cache stall block energy of task i
- Eidle: idle block energy

Optimal Solution
Intuitive procedure:
- Find an unconstrained minimum over fc and fm (fb = fm)
- Check boundary conditions due to system-specific constraints (e.g., minimum and maximum clock range)
- Details are in the paper

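The paper's analytical procedure is not reproduced in the slides. As a stand-in, the sketch below uses a plain brute-force grid search over discrete (fc, fm) levels, which is a different and simpler technique. The feasibility test C/fc + M/fm <= H, the function names, and the toy energy model with its constants are all assumptions made for illustration.

```python
def grid_search(energy_fn, C, M, H, fc_levels, fm_levels):
    """Return (energy, fc, fm) with the lowest energy among feasible pairs, or None."""
    best = None
    for fc in fc_levels:
        for fm in fm_levels:
            if C / fc + M / fm > H:  # assumed deadline constraint
                continue
            energy = energy_fn(fc, fm)
            if best is None or energy < best[0]:
                best = (energy, fc, fm)
    return best


def toy_energy(fc, fm, C=140e6, M=30e6):
    # Hypothetical model: voltage is assumed to scale with frequency, so
    # dynamic energy per cycle grows with f^2, plus a fixed power while busy.
    k, busy_w = 2e-26, 0.05
    return C * k * fc ** 2 + M * k * fm ** 2 + busy_w * (C / fc + M / fm)


fc_levels = [f * 1e6 for f in range(50, 201, 10)]
fm_levels = [f * 1e6 for f in range(25, 101, 5)]
result = grid_search(toy_energy, 140e6, 30e6, 3.0, fc_levels, fm_levels)
print(result)
```

Clock range limits are handled implicitly by the choice of levels, and the deadline check discards infeasible pairs before their energy is evaluated.
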
Energy Plot
[Figure: energy as a function of fc (MHz) and fm (MHz), with the deadline boundary marked. Blue: less energy. Red: more energy.]
Task set: CH = 140*10^6, MH = 30*10^6, H = 3 s

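The deadline boundary in the plot can be approximated with a short calculation, assuming the constraint has the form CH/fc + MH/fm <= H (the slides do not show the constraint explicitly). Solving for fc gives fc >= CH / (H - MH/fm). The function name is hypothetical.

```python
def min_cpu_freq(C, M, H, fm):
    """Lowest fc (Hz) that meets C/fc + M/fm <= H, or None if fm is too slow."""
    slack = H - M / fm  # time left for computation after memory fetches
    if slack <= 0:
        return None
    return C / slack


# Task set from the slide: CH = 140e6 cycles, MH = 30e6 cycles, H = 3 s.
for fm_mhz in (25, 50, 100):
    fc = min_cpu_freq(140e6, 30e6, 3.0, fm_mhz * 1e6)
    print(f"fm = {fm_mhz} MHz -> fc >= {fc / 1e6:.1f} MHz")
```

Points below this curve miss the deadline, so an optimizer only needs to consider (fc, fm) pairs on or above it.
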
Evaluation
Compare the following schemes:
- MAX: CPU and memory are both set to maximum frequency
- CPU-only static DVS: memory frequency is set to maximum
- Baseline static multi-DVS: CPU and memory frequencies change proportionally
- Optimal static multi-DVS: the proposed scheme
- Optimal dynamic multi-DVS: can change frequencies at each task schedule; brute-force search among all possible combinations
Simulation setup:
- Uses the energy equation obtained from measurements on our real hardware platform

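One way to read the static schemes is as different sets of candidate (fc, fm) pairs to search. The sketch below encodes that reading; the scheme labels, the function name, and the index-wise pairing used for the proportional baseline are assumptions, and the dynamic scheme is left out because it chooses frequencies per task rather than once.

```python
def candidate_pairs(scheme, fc_levels, fm_levels):
    """Candidate (fc, fm) pairs that a static scheme may choose from."""
    fc_sorted, fm_sorted = sorted(fc_levels), sorted(fm_levels)
    fc_max, fm_max = fc_sorted[-1], fm_sorted[-1]
    if scheme == "MAX":
        return [(fc_max, fm_max)]
    if scheme == "CPU_ONLY":
        # Memory stays at maximum; only the CPU frequency varies.
        return [(fc, fm_max) for fc in fc_sorted]
    if scheme == "BASELINE":
        # Proportional scaling, modeled here as pairing levels by rank;
        # this assumes both lists have the same number of levels.
        if len(fc_sorted) != len(fm_sorted):
            raise ValueError("BASELINE needs equally many fc and fm levels")
        return list(zip(fc_sorted, fm_sorted))
    if scheme == "OPTIMAL_STATIC":
        # Every combination is allowed.
        return [(fc, fm) for fc in fc_sorted for fm in fm_sorted]
    raise ValueError(f"unknown scheme: {scheme}")
```

Each smaller candidate set is a subset of the `OPTIMAL_STATIC` set, which is why the optimal static scheme can never do worse than the others under the same energy model.
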
Energy vs Utilization
[Figure: normalized average power consumption vs. utilization]
Task set cache stall ratio (MH/(CH+MH)): 0.3

Energy vs Cache Stall Ratio
[Figure: normalized average power consumption vs. cache stall ratio]
Task set utilization ratio (eH/H): 0.5

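The two task-set parameters used in these plots, cache stall ratio MH/(CH+MH) and utilization eH/H, determine the cycle counts once the frequencies are fixed. The sketch below derives CH and MH from target values, assuming eH is measured at the maximum frequencies; that assumption and the function name are not from the slides.

```python
def make_task_set(utilization, stall_ratio, hyper_period_s, fc_max_hz, fm_max_hz):
    """Return (CH, MH) cycle counts matching the target utilization and stall ratio."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if not 0 <= stall_ratio <= 1:
        raise ValueError("stall ratio must be in [0, 1]")
    # Average time per cycle when a fraction stall_ratio of cycles are memory cycles.
    time_per_cycle = (1 - stall_ratio) / fc_max_hz + stall_ratio / fm_max_hz
    total_cycles = utilization * hyper_period_s / time_per_cycle
    return (1 - stall_ratio) * total_cycles, stall_ratio * total_cycles


CH, MH = make_task_set(0.5, 0.3, 3.0, 200e6, 100e6)
print(f"CH = {CH:.3e} cycles, MH = {MH:.3e} cycles")
```

Sweeping one argument while holding the other fixed gives task sets of the kind varied along the x-axes of the two plots.
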
Effect of Diversity of Cache Stall Ratio
[Figure: normalized energy consumption vs. diversity of cache stall ratio, from homogeneous to diverse]
Task set cache stall ratio = 0.45, task set utilization ratio (eH/H): 0.5

Conclusion
- Energy model
  - Considers multiple DVS-capable components and task characteristics
  - Validated on a real hardware platform
- Static multi-DVS problem
  - Assigns energy-optimal static frequencies to multiple DVS components for periodic real-time tasks
  - The optimal solution (static multi-DVS scheme) saves more energy than CPU-only DVS

Thank you.

Additional Slides

CPU-only DVS
[Figure: energy (mJ) vs. fc (MHz), with the valid range (~200 MHz) marked]
- Not effective in the allowed range
- (*) Based on the energy equation for our hardware platform; memory clock was set to maximum

Power Distribution
[Figure: two power distribution charts, one for cache stall ratio = 55% and one for cache stall ratio = 10%, both at (cpu, bus) = (80, 80 MHz)]
E = Ecpu + Emem + Estatic
(*) Based on the energy equation for our hardware platform

Active and Idle
[Figure: two plots, energy (mJ) vs. fc (MHz) and energy (mJ) vs. fm (MHz)]
(*) Actual measurement result

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Último (20)

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

System-wide Energy Optimization for Multiple DVS Components and Real-time Tasks

  • 1. System-wide Energy Optimization for Multiple DVS Components and Real-time Tasks. Heechul Yun, Po-Liang Wu, Anshu Arya, Tarek Abdelzaher, Cheolgi Kim, and Lui Sha
  • 2. DVS in Real-time Systems. The goal: minimize energy consumption by adjusting frequency and voltage while still meeting the deadline. Most schemes consider the CPU only and assume execution time depends on CPU frequency. But memory and bus are also important: they affect execution time (e.g., a memory-intensive application is slowed if memory or bus is slow), consume considerable energy (the same order as the CPU), and are DVS-capable in many recent embedded processors.
  • 3. Motivation. Memxfer5b (memory benchmark program) at half the CPU clock: execution time increased only 3%, energy saved 30%.
  • 4. Motivation. Dhrystone (CPU benchmark program) at half the memory clock: execution time increased only 0.05%, energy saved 10%.
  • 5. Contents. Motivation; Energy Model (considers CPU, bus, and memory, and task characteristics); Evaluation (model validation); Energy Optimization of Real-time Tasks; Static multi-DVS problem and solution; Evaluation; Conclusion.
  • 6. Task Model. Task = computation + memory fetch (cache stall). Figure: power vs. time for the computation block and the memory fetch block.
  • 7. Task Model (2). Figure: power vs. time for the C and M blocks at a lower CPU frequency and at a lower memory frequency. C: computation; M: off-chip memory fetch (cache-stall cycles).
  • 8. Task Model (3). Execution time of a task: e = C/fc + M/fm, where C: CPU cycles of a given task; M: memory cycles of a given task; fc: CPU clock frequency; fm: memory clock frequency.
  • 9. Power Model. Power of a component (e.g., the CPU): k·f·V² + R, where k: capacitance constant; f: frequency of the component; V: supply voltage; R: leakage power. Different k for different modes: kactive (active-mode capacitance) and kstandby (standby-mode capacitance).
  • 10. Energy Model. Figure: power vs. time over one period P, showing the pure computation block, the memory fetch (cache stall) block, and the idle block; e is the execution time. Total system energy is the sum of the energy of these three blocks.
  • 11. Pure Computation Block. CPU active; bus and memory standby. kca: capacitance constant for active CPU; kbs: capacitance constant for standby bus; kms: capacitance constant for standby memory; R: system-wide static power consumption.
  • 12. Memory Fetch Block. CPU standby; bus and memory active. kcs: capacitance constant for standby CPU; kba: capacitance constant for active bus; kma: capacitance constant for active memory.
  • 13. Idle Block. CPU, bus, and memory idle. I: idle-mode power consumption; e: execution time (C/fc + M/fm).
  • 14. Energy Model Summary. Figure: power vs. time with the pure execution block (CPU active; bus and memory standby), the memory fetch block (CPU standby; bus and memory active), and the idle block (CPU, bus, and memory idle), on top of system static power. System-wide energy model: considers CPU, bus, and memory power consumption; considers active, standby, and idle modes; other components are assumed to be static (included in R).
  • 15. Energy Equation. System-wide energy consumption of a task during period P, with one term each for the CPU block, the memory block, and the idle block.
  • 16. Evaluation Platform. Block diagram labels: power supply, multimeter, board, STMP3650 SoC, ARM926 (8K-I, 8K-D), PSRAM (256KB), system bus, external peripherals (flash, LCD, external DRAM, …).
  • 17. Evaluation Platform (2). ARM9-based SoC. CPU: up to 200 MHz; bus: up to 100 MHz. CPU and bus are synchronous (BUS = CPU/N). Memory (PSRAM) frequency equals the system bus frequency (fb = fm). CPU, bus, and memory all share a common voltage Vdd: 1.504V ~ 1.804V (0.32V step). Energy equation terms: V, the shared voltage for CPU, bus, and memory; a combined active bus and memory constant; and a combined standby bus and memory constant.
  • 18. Validation Methodology. 4 synthetic programs with different cache stall ratios (0%, 10%, 25%, 55%); 8 clock configurations (fc, fm) for each program; nonlinear least-squares analysis of the 32 data points in total against the energy equation.
  • 19. Energy Model Fitting. Coefficient of determination (R²) is 99.97% (100% is a perfect fit).
  • 20. Energy Equation for Our Platform. Obtained coefficients in the energy equation.
  • 21. Contents. Motivation; Energy Model (considers CPU, bus, and memory, and task characteristics); Evaluation; Energy Optimization of Real-time Tasks; Static multi-DVS problem and optimal solution; Evaluation; Conclusion.
  • 22. Static Multi-DVS Problem. Given a set of periodic real-time tasks (T1, …, Tn), where each task invocation requires up to Ci CPU cycles and up to Mi memory cycles in the worst case, find the energy-optimal static frequencies for multiple DVS-capable components (CPU, bus, and memory).
  • 23. Problem Formulation. An objective to minimize, subject to constraints, defined over the following terms. H: hyper period; ei: execution time of task i; Ecomp,i: computation block energy of task i; Emem,i: cache stall block energy of task i; Eidle: idle block energy.
  • 24. Optimal Solution. Intuitive procedure: find an unconstrained minimum over fc and fm (fb = fm), then check boundary conditions due to system-specific constraints (e.g., minimum and maximum clock range). Details are in the paper.
  • 25. Energy Plot. Figure: energy over fc (MHz) and fm (MHz), with the deadline boundary marked. Blue: less energy; red: more energy. Task set: CH = 140×10⁶, MH = 30×10⁶, H = 3 s.
  • 26. Evaluation. Compare the following schemes: MAX (CPU and memory are both set to maximum); CPU-only static DVS (memory frequency is set to maximum); Baseline static multi-DVS (CPU and memory frequencies change proportionally); Optimal static multi-DVS (proposed scheme); Optimal dynamic multi-DVS (can change frequencies at each task schedule; brute-force search among all possible combinations). Simulation setup: uses the energy equation obtained from measurements on our real hardware platform.
  • 27. Energy vs Utilization. Figure: normalized average power consumption vs. utilization. Task set cache stall ratio (MH/(CH+MH)): 0.3.
  • 28. Energy vs Cache Stall Ratio. Figure: normalized average power consumption vs. cache stall ratio. Task set utilization ratio (eH/H): 0.5.
  • 29. Effect of Diversity of Cache Stall Ratio. Figure: normalized energy consumption vs. diversity (homogeneous to diverse). Task set cache stall ratio = 0.45; task set utilization ratio (eH/H): 0.5.
  • 30. Conclusion. Energy model: considers multiple DVS-capable components and task characteristics; validated on a real hardware platform. Static multi-DVS problem: assigns energy-optimal static frequencies of multiple DVS components for periodic real-time tasks. The optimal solution (static multi-DVS scheme) shows better energy saving than CPU-only DVS.
  • 33. CPU-only DVS. Figure: energy (mJ) vs. fc (MHz), with the valid range (~200 MHz) marked. Not effective in the allowed range. (*) Based on the energy equation for our h/w platform; memory clock was set to max.
  • 34. Power Distribution. Figure panels: cache stall ratio = 55% at (cpu, bus) = (80, 80 MHz); cache stall ratio = 10% at (cpu, bus) = (80, 80 MHz). (*) Based on the energy equation for our h/w platform. E = Ecpu + Emem + Estatic.
  • 35. Active and Idle. Figure: mJ vs. fc (MHz) and mJ vs. fm (MHz). (*) Actual measurement result.
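
The equation images on slides 8 to 15 did not survive in the transcript. The block below is a reconstruction pieced together from the symbol definitions on those slides and from the speaker notes, offered as a sketch rather than the authors' exact formula. It assumes a single shared supply voltage V (as on the evaluation platform) and treats I as the total power drawn during the idle block; the original may use per-component voltages or a different treatment of static power when idle.

```latex
% Reconstruction under stated assumptions; not copied from the slides.
\begin{align}
e        &= \frac{C}{f_c} + \frac{M}{f_m} \\
E_{comp} &= \left(k_{ca} f_c V^2 + k_{bs} f_b V^2 + k_{ms} f_m V^2 + R\right)\frac{C}{f_c} \\
E_{mem}  &= \left(k_{cs} f_c V^2 + k_{ba} f_b V^2 + k_{ma} f_m V^2 + R\right)\frac{M}{f_m} \\
E_{idle} &= I\,(P - e) \\
E        &= E_{comp} + E_{mem} + E_{idle}
\end{align}
```

With fb = fm, as on the evaluation platform, the bus and memory terms in each block merge into one combined constant, which matches the "active bus and memory constant" and "standby bus and memory constant" mentioned on slide 17.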

Editor's notes

  1. DVS is extensively studied in real-time systems. The goal is to minimize energy consumption and still meet the deadlines of tasks. Most DVS schemes consider only the CPU and assume execution time depends on CPU frequency. In reality, there are other similarly important components, such as memory and bus, that affect execution time and energy consumption. Moreover, unlike desktop processors, many embedded processors let us control the bus and memory clocks as well as the CPU clock.
  2. Here is one real example. We ran a memory-intensive benchmark on our hardware platform and measured the power consumption of the entire system. When we lower the CPU clock to half (click), the execution time does not change much (click): only a 3% increase, instead of double the execution time. This is because most of the time the task fetches data from memory, which is independent of CPU clock speed. So, by reducing the CPU clock, we save 30% on energy consumption with only a 3% increase in time.
  3. In a second example, we ran a CPU benchmark program called Dhrystone. Of course, if we change the CPU clock to half, the execution time will double. However, if we change the memory clock to half, its execution time hardly changes, because this program does not fetch much data from memory, but we save 10% on energy consumption. The amount of saved energy is smaller than in the previous experiment, but still significant. These experiments show that CPU-only DVS is not enough, and they motivated us to develop a realistic model that can explain these behaviors.
  4. Mathematically, the execution time of a task is defined by this equation. Here C is CPU cycles, M is memory cycles, fc is the CPU clock, and fm is the memory clock. Then the execution time e is C over fc plus M over fm.
  5. And this equation is the standard power equation. Here k is the capacitance constant for the component, f is frequency, V is voltage, and R is the leakage power of the component. That is, power consumption W equals kfV squared plus R. Here, it is important to understand that if the operation mode is different then k is different. For example, when the CPU is busy, its power consumption is much bigger than when the CPU is not doing anything, even though its operating frequency is the same. This is because low-power h/w design techniques such as clock gating reduce the number of active parts of the component, which lowers its aggregated capacitance constant.
  6. Now, let’s consider a task's energy consumption for a given period P. Task execution can be divided into two blocks: pure computation and cache stall, which means the time to fetch data from off-chip memory to fill the cache line of the processor core. While memory fetch operations are scattered throughout the entire execution, we aggregate them into this single block. This is true for an in-order processor, in which the processor must wait until the data are fetched whenever there is a cache miss. Therefore there’s no overlap between CPU execution and memory fetch. This is not strictly true for out-of-order processors, because there is an overlapping period due to out-of-order execution, but it is relatively small compared to the long memory fetch time. When a task completes, CPU, bus, and memory are all in the idle state. Then, the energy consumption can be expressed as the sum of the energy consumption of these three blocks.
  7. Now, let’s look at the first block, the pure computation block. As I said, we have three major components to consider: CPU, memory, and bus. Each component is a single term in this equation. The CPU … The bus and memory … It is important to note that each component is in a different mode of operation. In this block, the CPU is actively executing instructions without any delay, but bus and memory are not doing anything. When a component is in a different mode of operation, the capacitance constant can be quite different even if its operating frequency remains the same, because of various power-saving techniques, such as clock gating, used in recent hardware designs. Therefore we use different capacitance constants for active and standby modes. In this equation, kca is the capacitance constant for active CPU, kbs is standby bus, and kms is standby memory. R represents the sum of all static power components in the system. The execution time of this portion of the task is C over fc. So far, we have described this block.
  8. The next block is the cache stall block. The difference, compared to the pure computation block, is the mode of operation of each component. In this block, the CPU does nothing but wait while data are being fetched from memory. Therefore, in this block, the CPU is in standby but bus and memory are active. Again we use a different capacitance constant for each component in each mode of operation. Here, kcs is standby CPU, kba is active bus, and kma is active memory. And the execution time of this portion is M over fm.
  9. The final block is the idle block. Since the task is finished, all components (CPU, bus, memory) are in idle mode. We assume there is a special mode in the system that saves power more aggressively, which can be found in many recent embedded processors. Therefore, we use a separate term I to represent the power consumption in that special mode of the system. The duration of this block is the period minus the execution time of the task.
  10. And this is the equation we derived, which describes the energy consumption of the entire system running a periodic task.
  11. On this hardware platform, we used 4 synthetic programs with different cache stall ratios (0-55%), and for each program we used 8 different clock configurations and measured the energy consumption. After measuring a total of 32 data points, we performed non-linear-…
  12. Hyper period EDF schedule (dynamic scheduling)
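
To make the static multi-DVS idea from the notes concrete, here is a minimal Python sketch. It evaluates the three-block energy model for each candidate (fc, fm) pair and keeps the cheapest one that still finishes within the hyper period. Everything numeric in it is an assumption for illustration: the coefficients are made-up placeholders (not the fitted values from the slides), `voltage()` is a hypothetical linear frequency-to-voltage map, and `CPU_FREQS` and `DIVISORS` are hypothetical clock tables. It also uses a plain grid search, which is not the closed-form procedure the slides attribute to the paper.

```python
from itertools import product

# Hypothetical coefficients for illustration only (NOT the fitted values).
K_CA, K_CS = 1.0e-9, 0.3e-9    # CPU active / standby capacitance constants
K_BMA, K_BMS = 1.5e-9, 0.4e-9  # combined bus+memory active / standby (fb = fm)
R_STATIC = 0.05                # system-wide static power, watts
I_IDLE = 0.02                  # idle-mode power, watts

# Hypothetical clock tables: CPU clocks in Hz and dividers with fm = fc / N.
CPU_FREQS = (50e6, 80e6, 100e6, 160e6, 200e6)
DIVISORS = (1, 2, 3, 4)
MAX_BUS_HZ = 100e6             # slide 17: bus runs at up to 100 MHz


def voltage(fc_hz):
    """Hypothetical linear map from CPU clock to the shared supply voltage."""
    lo_f, hi_f, lo_v, hi_v = 50e6, 200e6, 1.504, 1.804
    return lo_v + (hi_v - lo_v) * (fc_hz - lo_f) / (hi_f - lo_f)


def exec_time(c, m, fc, fm):
    """Execution time e = C/fc + M/fm."""
    return c / fc + m / fm


def energy(c, m, fc, fm, period):
    """Energy over one period: computation + memory fetch + idle blocks."""
    v2 = voltage(fc) ** 2
    e_comp = (K_CA * fc * v2 + K_BMS * fm * v2 + R_STATIC) * (c / fc)
    e_mem = (K_CS * fc * v2 + K_BMA * fm * v2 + R_STATIC) * (m / fm)
    e_idle = I_IDLE * (period - exec_time(c, m, fc, fm))
    return e_comp + e_mem + e_idle


def best_static_setting(c, m, period, cpu_freqs, divisors):
    """Return (energy, fc, fm) for the cheapest feasible pair, or None."""
    best = None
    for fc, n in product(cpu_freqs, divisors):
        fm = fc / n
        if fm > MAX_BUS_HZ or exec_time(c, m, fc, fm) > period:
            continue  # violates the bus clock limit or misses the deadline
        candidate = (energy(c, m, fc, fm, period), fc, fm)
        if best is None or candidate < best:
            best = candidate
    return best


if __name__ == "__main__":
    # Task set from slide 25: CH = 140e6 cycles, MH = 30e6 cycles, H = 3 s.
    print(best_static_setting(140e6, 30e6, 3.0, CPU_FREQS, DIVISORS))
```

The aggregate cycle counts CH and MH stand in for the whole task set over the hyper period, so the deadline check reduces to requiring that total execution time fits in H. A per-task schedulability test would be needed for anything beyond this simplified case.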