1
Update on big.LITTLE on TC2
Morten Rasmussen
Technology Researcher
2
Agenda
- big.LITTLE software solutions overview
- ARM's Test Chip 2 overview
- Benchmarking methodology and use cases
- IKS status update
- big.LITTLE MP status update
3
big.LITTLE overview
- Performance and power efficiency in one system:

Benchmark     Performance                 Energy efficiency
              (Cortex-A15 vs Cortex-A7)   (Cortex-A7 vs Cortex-A15)
Dhrystone     1.9x                        3.5x
FDCT          2.3x                        3.8x
IMDCT         3.0x                        3.0x
MemCopy L1    1.9x                        2.3x
MemCopy L2    1.9x                        3.4x
4
IKS solution – Basics
- In-Kernel Switcher (IKS):
  - Targeted at first-generation big.LITTLE products.
[Figure: Task 1 and Task 2 are scheduled by the kernel scheduler onto logical CPUs; IKS maps each logical CPU onto either a Cortex-A7 or a Cortex-A15.]
5
MP solution
[Figure: Task 1 and Task 2 are scheduled by the kernel scheduler directly onto the Cortex-A7 and Cortex-A15 CPUs.]
6
ARM’s Test Chip 2 (TC#2): An Overview
- A Versatile Express core tile, publicly available
- Capabilities:
  - 2 x Cortex-A15 (r2p1) @ up to 1.2 GHz
  - 3 x Cortex-A7 (r0p1) @ up to 1 GHz
  - CCI/DMC/GIC/ADB (r0p0)
  - DMA (PL330)
  - 2 GB external DDR2 memory @ 400 MHz
  - 64 KB internal SRAM
  - CoreSight debug (including JTAG and ITM trace, but no STM)
  - No GPU
  - cpufreq support: independent for each cluster, with limited voltage scaling
  - cpuidle support: cluster power gating
[Figure: TC2]
7
Benchmarking Methodology
- Automated system for running user workloads on the target device:
  1. Choose workload
  2. Choose CPU mode: Cortex-A7, Cortex-A15, migration (cluster or CPU), or MP
  3. Choose active cores in each cluster (TC2: 1-2 big, 1-3 LITTLE)
  4. Choose DVFS governor: interactive, performance, powersave, ondemand
- CSV config:
  - Use case
  - Scheduling model
  - Number of cores to use
  - Scaling governors
- Configurable:
  - CCI
  - ftrace
  - Streamline
- Results: performance and power
- Extensible through parameterisation
8
IKS solution
9 (CONFIDENTIAL)
IKS: CPU Migration
- big.LITTLE extends DVFS:
  - The DVFS algorithm monitors the load on each CPU.
  - When load is low, it can be handled on a LITTLE processor.
  - When load is high, the context is transferred to a big processor.
  - The unused processor can be powered down.
  - When all processors in a cluster are inactive, the cluster and its L2 cache can be powered down.
11
IKS: OPP mapping to A7 / A15 on TC2
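- Virtual frequency maps OPPs to big or LITTLE cores

Cluster   Virtual OPP (kHz)   Physical OPP (kHz)   Voltage
A7        350000              350000               V1
A7        400000              400000               V1
A7        ...                 ...                  V1
A7        800000              800000               V1
A7        900000              900000               V2
A7        1000000             1000000              V3
A15       1200000             600000               V1
A15       1400000             700000               V1
A15       ...                 ...                  V1
A15       2000000             1000000              V1
A15       2200000             1100000              V2
A15       2400000             1200000              V3

A7 OPPs map 1:1 (virtual X = physical X); A15 OPPs are reported at twice their physical frequency (virtual 2X = physical X).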
12
IKS: Results for Audio on TC2
- Power compared with executing the same use case on the A15 only
- IKS does not use the A15s during the Audio run
- Result: 70% power saving
TC2: A15 up to 1.2 GHz, A7 up to 1 GHz. Better results are expected on representative silicon.
13
IKS: Results for BBench + Audio on TC2
- Performance is measured from BBench page-loading times
- Results are normalised to the power consumed and the performance achieved when the same use case runs on the A15 only
[Chart: BBench page + Audio]
14
IKS: OPPs on TC2
15
IKS: Interactive governor on TC2
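```c
/* Target-frequency choice in the interactive governor, as shown on the slide
 * (surrounding logic elided) */
if (cpu_load >= go_hispeed_load) {
    /* ... */
    new_freq = max_freq * cpu_load / 100;      /* scale the maximum frequency */
    /* ... */
} else {
    /* ... */
    new_freq = hispeed_freq * cpu_load / 100;  /* scale hispeed_freq */
    /* ... */
}
```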
- For the A15 on TC2, with go_hispeed_load at 85% (the default), this algorithm only uses the overdrive section of the A15: any load of at least 85% gives new_freq ≥ 0.85 × max_freq.
- Approach: introduce a second point of inflection, hispeed2.
16
IKS: Hispeed2
17
IKS: Results: BBench + Audio
- Power improves with no performance cost
[Chart: BBench page + Audio]
18
MP solution
19
MP solution – more details
- Scheduler modifications:
  - Treat big and LITTLE CPUs as separate scheduling domains.
  - Use PJT's (Paul Turner's) load-tracking patches to track individual task load.
  - Migrate tasks between the big and LITTLE domains based on task load.
- Patch set available through Linaro.
[Figure: load balancing within each LITTLE (L) and big (B) domain, with load-based task migration between them; plot of tracked task load against task state, rising while the task is executing and decaying (load decay) while it sleeps.]
20
MP: Experimental Implementation
- Scheduler modifications:
  - Apply PJT's load-tracking patch set.
  - Set up big and LITTLE sched_domains with no load balancing between them.
  - select_task_rq_fair() checks the task's load history to select an appropriate target CPU for tasks waking up.
  - Add a forced migration mechanism, similar to the existing active load-balancing mechanism, to push the currently running task to a big core.
  - Periodically check (in run_rebalance_domains()) the current task on each LITTLE runqueue for tasks that need to be forced to migrate to a big core.
[Figure: load_balance within the LITTLE (L) and big (B) domains; select_task_rq_fair()/forced migration between them.]
21
MP: ARM TC2: Audio
- Workload: Audio (MP3 playback)
- Performance/energy target:
  - A7 energy
- Status:
  - Audio-related tasks do not use the A15s, but power consumption is still significantly higher than on the A7 alone.
  - MP is not as power efficient as IKS yet.
- To do:
  - Target spurious wake-ups on the A15s. All the extra power comes from the A15s, which should not be used at all.
[Chart: Audio energy (0-100) for A15, A7 2CPU, IKS and MP; labelled values: A7 30.79%, MP 39.86%]
22
MP: Audio workload analysis
- Where is the extra energy spent with MP?
- Need to look at why the A15s consume power when they are not needed.
[Chart: Audio energy breakdown (0-1.6), A15 cluster and A7 cluster energy for A7 and MP]

hrtimer functions    cpu0   cpu1   cpu2   cpu3   cpu4
hrtimer_wakeup          2      2   1212    417    190
tick_sched_timer      404     58    483    507    779

WQ functions         cpu0   cpu1   cpu2   cpu3   cpu4
vmstat_update          30      2     27     25     28
cache_reap             15      2     14     13     14
phy_state_machine      31      0      0      0      0

Enter idle (state)   cpu0   cpu1   cpu2   cpu3   cpu4
0                       6      2   2379    260    423
1                     801    807   8316   9373   9652
23
Scale invariant load
- Load accumulation rate does not scale with available compute capacity (frequency, big/LITTLE CPU)
  - Currently, there is no link between cpufreq and the scheduler.
  - Tasks may be migrated away from a CPU running at low frequency by the scheduler before cpufreq has increased the frequency to match the CPU load.
- Scaling the tracked load accumulation to match the current frequency mitigates this issue.
  - Tasks cannot accumulate enough load at low frequency to trigger migration and must wait for cpufreq to react first.
[Figure: load accumulation at Freq = x and at Freq = 2x]
24
Scale invariant load
[Figure: tracked load (0-1000) versus time; two panels: Original and Frequency invariant]
25
Load accumulation rate
- For some workloads, tracked load saturates too fast and leads to unnecessary task migrations.
- Extending the tracked load history reduces tracked-load variations caused by sudden changes in the load characteristics.
- Increasing the y factor in the load expression decreases the load accumulation and decay rates.
$$\text{load} = \frac{u_0 + u_1 y + u_2 y^2 + \dots + u_n y^n}{1024\,(1 + y + y^2 + \dots + y^n)}, \qquad y < 1,\quad 0 \le u_i < 1024$$

[Plot: tracked load (0-1) versus time [ms] for y = 0.9785]
26
Load accumulation rate
- Increasing y leads to a more conservative tracked load
  - Should lead to fewer up/down migrations
  - Increases the up/down migration delay for tasks that need to be migrated
[Plot: load accumulation rate: task load and tracked load versus time [ms] for y = 0.9785, y = 0.9844 and y = 0.9922]
27
MP – Top Issues
- Spurious wakeups
  - A15s are woken up by:
    - Scheduler ticks (mainly)
    - Workqueues
    - Timers
    - RCU
- CPU wakeup prioritisation
  - Pick the cheapest target CPU
- Global balancing
  - Spread load to A7s when A15s are overloaded
- Pack vs. spread
- Cluster-aware cpufreq governors
28
Questions?

DBX First Quarter 2024 Investor PresentationDropbox
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 

Último (20)

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 

LCE12: big.LITTLE TC2 update

  • 1. Update on big.LITTLE on TC2
      Morten Rasmussen, Technology Researcher
  • 2. Agenda
      • big.LITTLE software solutions overview
      • ARM's Test Chip 2 overview
      • Benchmarking methodology and use cases
      • IKS status update
      • big.LITTLE MP status update
  • 3. big.LITTLE overview
      • Performance and power efficiency in one system:

        Benchmark     Cortex-A15 vs Cortex-A7   Cortex-A7 vs Cortex-A15
                      performance               energy efficiency
        Dhrystone     1.9x                      3.5x
        FDCT          2.3x                      3.8x
        IMDCT         3.0x                      3.0x
        MemCopy L1    1.9x                      2.3x
        MemCopy L2    1.9x                      3.4x
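Reading the table as time and energy ratios makes the trade-off concrete. A minimal sketch, assuming that "performance" means the inverse of run time for the same work and "energy efficiency" means work done per joule; under those assumptions the A7's average power relative to the A15 follows from energy = power x time. The struct and function names are illustrative, not from the slides.

```c
#include <stdio.h>

/* One row of the big.LITTLE overview table. */
struct bench_ratio {
    const char *name;
    double a15_speedup;     /* Cortex-A15 vs Cortex-A7 performance */
    double a7_efficiency;   /* Cortex-A7 vs Cortex-A15 energy efficiency */
};

/* Run time of the A7 relative to the A15 for the same work. */
static double a7_time_ratio(const struct bench_ratio *b)
{
    return b->a15_speedup;
}

/* Energy used by the A7 relative to the A15 for the same work. */
static double a7_energy_ratio(const struct bench_ratio *b)
{
    return 1.0 / b->a7_efficiency;
}

/* Average power of the A7 relative to the A15 (energy = power * time). */
static double a7_power_ratio(const struct bench_ratio *b)
{
    return a7_energy_ratio(b) / a7_time_ratio(b);
}

static const struct bench_ratio table[] = {
    { "Dhrystone",  1.9, 3.5 },
    { "FDCT",       2.3, 3.8 },
    { "IMDCT",      3.0, 3.0 },
    { "MemCopy L1", 1.9, 2.3 },
    { "MemCopy L2", 1.9, 3.4 },
};

/* Print the derived A7/A15 ratios for every benchmark in the table. */
void print_tradeoffs(void)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        const struct bench_ratio *b = &table[i];
        printf("%-10s A7 time %.2fx, energy %.2fx, avg power %.2fx\n",
               b->name, a7_time_ratio(b), a7_energy_ratio(b),
               a7_power_ratio(b));
    }
}
```

For Dhrystone this gives 1.9x the run time on the A7, about 0.29x the energy, and therefore about 0.15x the average power.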
  • 4. IKS solution: basics
      • In-Kernel Switcher (IKS): targeted at first-generation big.LITTLE products.
      • [Diagram: the kernel scheduler places Task 1 and Task 2 on a logical CPU; IKS chooses whether that logical CPU runs on a Cortex-A7 or a Cortex-A15.]
  • 6. ARM's Test Chip 2 (TC2): an overview
      • A publicly available Versatile Express core tile
      • Capabilities:
          • 2 x Cortex-A15 (r2p1) at up to 1.2 GHz
          • 3 x Cortex-A7 (r0p1) at up to 1 GHz
          • CCI/DMC/GIC/ADB (r0p0)
          • DMA (PL330)
          • 2 GB external DDR2 memory at 400 MHz
          • 64 KB internal SRAM
          • CoreSight debug (including JTAG and ITM trace, but no STM)
          • No GPU
          • cpufreq support: independent for each cluster, with limited voltage scaling
          • cpuidle support: cluster power gating
  • 7. Benchmarking methodology
      • Automated system for running user workloads on the target device:
          • Choose the workload
          • Choose the CPU mode: Cortex-A7, Cortex-A15, Migration (cluster or CPU), or MP
          • Choose the active cores in each cluster (TC2: 1-2 big, 1-3 LITTLE)
          • Choose the DVFS governor: interactive, performance, powersave, ondemand
      • CSV config: use case, scheduling model, number of cores to use, scaling governors
      • Configurable instrumentation: CCI, ftrace, Streamline
      • Results: performance and power
      • Extensible through parameterisation
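The slide lists the CSV fields and the TC2 limits but not the file layout. A minimal sketch of validating one config row, assuming a hypothetical format `use_case,mode,big_cores,little_cores,governor`; the field order, the mode spellings and parse_config_row() are illustrative assumptions, not the actual tool's format.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical CSV row: use_case,mode,big_cores,little_cores,governor */
struct run_config {
    char use_case[32];
    char mode[24];       /* a7, a15, cluster_migration, cpu_migration, mp */
    int big_cores;       /* TC2: 1-2 Cortex-A15 */
    int little_cores;    /* TC2: 1-3 Cortex-A7 */
    char governor[16];   /* interactive, performance, powersave, ondemand */
};

static int in_list(const char *s, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(s, list[i]) == 0)
            return 1;
    return 0;
}

/* Returns 0 on success, -1 if the row is malformed or outside TC2 limits. */
int parse_config_row(const char *line, struct run_config *cfg)
{
    static const char *const modes[] = {
        "a7", "a15", "cluster_migration", "cpu_migration", "mp" };
    static const char *const governors[] = {
        "interactive", "performance", "powersave", "ondemand" };

    if (sscanf(line, "%31[^,],%23[^,],%d,%d,%15s",
               cfg->use_case, cfg->mode, &cfg->big_cores,
               &cfg->little_cores, cfg->governor) != 5)
        return -1;
    if (!in_list(cfg->mode, modes, sizeof modes / sizeof modes[0]))
        return -1;
    if (!in_list(cfg->governor, governors,
                 sizeof governors / sizeof governors[0]))
        return -1;
    if (cfg->big_cores < 1 || cfg->big_cores > 2)      /* TC2 big cluster */
        return -1;
    if (cfg->little_cores < 1 || cfg->little_cores > 3) /* TC2 LITTLE */
        return -1;
    return 0;
}
```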
  • 8. IKS solution
      • Targeted at first-generation big.LITTLE products.
      • [Diagram: the kernel scheduler places Task 1 and Task 2 on a logical CPU; IKS chooses whether that logical CPU runs on a Cortex-A7 or a Cortex-A15.]
  • 9. IKS: CPU migration
      • big.LITTLE extends DVFS:
          • The DVFS algorithm monitors the load on each CPU.
          • When the load is low, it can be handled on a LITTLE processor.
          • When the load is high, the context is transferred to a big processor.
          • The unused processor can be powered down.
          • When all processors in a cluster are inactive, the cluster and its L2 cache can be powered down.
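The migration rule can be modelled as a per-logical-CPU choice driven by the frequency that DVFS requests. A minimal sketch, assuming one A7/A15 pair per logical CPU and a switch point at the A7's top OPP (1000000 kHz on TC2); the types, NR_PAIRS and the ikssim_* names are hypothetical, and a real switcher must also save and restore CPU context and drive the power controller, which this model omits.

```c
#include <stdbool.h>

#define A7_MAX_KHZ 1000000u   /* top A7 OPP on TC2 */
#define NR_PAIRS 2            /* hypothetical: one A7/A15 pair per logical CPU */

enum core_kind { CORE_A7, CORE_A15 };

struct logical_cpu {
    enum core_kind active;   /* physical core currently running the context */
    bool idle;               /* logical CPU has nothing to run */
};

/* Low requested frequency fits on the LITTLE core; above the A7's top
 * OPP the logical CPU needs the big core. */
enum core_kind ikssim_pick_core(unsigned int req_khz)
{
    return req_khz > A7_MAX_KHZ ? CORE_A15 : CORE_A7;
}

/* Switch the logical CPU if needed. Returns true when a context transfer
 * would happen (the previously used core can then be powered down). */
bool ikssim_update(struct logical_cpu *cpu, unsigned int req_khz)
{
    enum core_kind want = ikssim_pick_core(req_khz);
    if (want == cpu->active)
        return false;
    cpu->active = want;
    return true;
}

/* A cluster (and its L2) can be powered down when none of its cores is in
 * use: each logical CPU is either idle or running on the other cluster. */
bool ikssim_cluster_can_power_down(const struct logical_cpu cpus[NR_PAIRS],
                                   enum core_kind cluster)
{
    for (int i = 0; i < NR_PAIRS; i++)
        if (!cpus[i].idle && cpus[i].active == cluster)
            return false;
    return true;
}
```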
  • 11. IKS: OPP mapping to A7/A15 on TC2
      • Virtual frequencies map OPPs to big or LITTLE cores (frequencies in kHz):

        Virtual OPP   Physical OPP A7   Physical OPP A15   Voltage
        350000        350000            -                  V1
        400000        400000            -                  V1
        ... X         X                 -                  V1
        800000        800000            -                  V1
        900000        900000            -                  V2
        1000000       1000000           -                  V3
        1200000       -                 600000             V1
        1400000       -                 700000             V1
        ... 2X        -                 X                  V1
        2000000       -                 1000000            V1
        2200000       -                 1100000            V2
        2400000       -                 1200000            V3
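The table follows a simple rule: A7 OPPs are exposed unchanged, and A15 OPPs are exposed at twice their physical frequency, so the virtual scale keeps rising across the switch point. A minimal sketch of that lookup, assuming frequencies in kHz and ignoring voltage selection; virt_to_phys() and struct phys_opp are hypothetical names.

```c
#include <stdbool.h>

#define A7_MAX_VIRT_KHZ 1000000u   /* A7 OPPs: virtual == physical */

struct phys_opp {
    bool big;            /* true: Cortex-A15, false: Cortex-A7 */
    unsigned int khz;    /* physical frequency on that cluster */
};

/* Map a virtual OPP to a cluster and physical frequency. A7 entries are
 * identity-mapped; A15 entries sit at twice their physical frequency
 * (1200000 -> 600000 ... 2400000 -> 1200000). */
struct phys_opp virt_to_phys(unsigned int virt_khz)
{
    struct phys_opp p;
    if (virt_khz <= A7_MAX_VIRT_KHZ) {
        p.big = false;
        p.khz = virt_khz;
    } else {
        p.big = true;
        p.khz = virt_khz / 2;
    }
    return p;
}
```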
  • 12. IKS: results for Audio on TC2
      • Power compared with executing the use case on the A15.
      • IKS does not use the A15s during the Audio run: 70% saving.
      • TC2: A15 up to 1.2 GHz, A7 up to 1 GHz. Better results are expected on representative silicon.
  • 13. IKS: results for BBench + Audio on TC2
      • Performance is measured from BBench page-loading times.
      • Results are normalised to the power consumed and performance achieved when the same use case runs on the A15 only.
      • [Chart: BBench page + Audio]
  • 15. IKS: interactive governor on TC2
      • Frequency selection in the interactive governor:

            if (cpu_load >= go_hispeed_load) {
                ...
                new_freq = max_freq * cpu_load / 100;
                ...
            } else {
                ...
                new_freq = hispeed_freq * cpu_load / 100;
                ...
            }

      • For the A15 on TC2, with go_hispeed_load at 85% (the default), this algorithm only uses the overdrive section of the A15.
      • Approach: introduce a second point of inflection, highspeed2.
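A runnable version of the rule above, with the elided parts dropped, shows why the A15's lower OPPs go unused. A minimal sketch, assuming the IKS virtual scale (max_freq = 2400000 kHz) and a hypothetical hispeed_freq of 1000000 kHz (the A7 top OPP): any load at or above 85% requests at least 2040000 kHz virtual, that is 1020000 kHz physical on the A15, while lower loads stay below 1000000 kHz, before any rounding to a table entry. The three-segment variant is only one possible reading of "a second point of inflection"; go_hispeed2_load and hispeed2_freq are hypothetical parameters, not the governor's actual tunables.

```c
/* Frequencies in kHz on the IKS virtual scale; loads in percent (0-100). */
struct governor_params {
    unsigned int max_freq;          /* top virtual OPP, e.g. 2400000 */
    unsigned int hispeed_freq;      /* assumed 1000000 (A7 top OPP) */
    unsigned int go_hispeed_load;   /* default 85 */
    unsigned int hispeed2_freq;     /* hypothetical second inflection */
    unsigned int go_hispeed2_load;  /* hypothetical second threshold */
};

/* Rule from the slide: two segments, switching at go_hispeed_load. */
unsigned int target_freq(const struct governor_params *p,
                         unsigned int cpu_load)
{
    if (cpu_load >= p->go_hispeed_load)
        return p->max_freq * cpu_load / 100;
    return p->hispeed_freq * cpu_load / 100;
}

/* Hypothetical three-segment variant: loads between go_hispeed_load and
 * go_hispeed2_load scale from hispeed2_freq, so lower A15 OPPs become
 * reachable before the governor jumps to the max_freq segment. */
unsigned int target_freq_hispeed2(const struct governor_params *p,
                                  unsigned int cpu_load)
{
    if (cpu_load >= p->go_hispeed2_load)
        return p->max_freq * cpu_load / 100;
    if (cpu_load >= p->go_hispeed_load)
        return p->hispeed2_freq * cpu_load / 100;
    return p->hispeed_freq * cpu_load / 100;
}
```

With these assumed parameters, target_freq() jumps from 840000 kHz at 84% load (A7) straight to 2040000 kHz at 85% (1020000 kHz physical on the A15).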
  • 17. IKS results: BBench + Audio
      • Power improves with no performance cost.
      • [Chart: BBench page + Audio]
  • 19. MP solution: more details
      • Scheduler modifications:
          • Treat big and LITTLE CPUs as separate scheduling domains.
          • Use PJT's load-tracking patches to track individual task load.
          • Migrate tasks between the big and LITTLE domains based on task load.
          • Patch set available through Linaro.
      • [Diagram: load balancing within the LITTLE (L) and big (B) domains, with load-based task migration between them; task load rises while the task is executing and decays while it sleeps.]
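Migrating on task load is often done with two thresholds, so that a task whose load hovers around a single value does not bounce between domains. A minimal sketch of that design choice, assuming load on a 0-1023 scale and hypothetical up/down thresholds of 700 and 256; the thresholds, pick_domain() and the hysteresis itself are illustrative assumptions, not taken from the Linaro patch set.

```c
enum sched_dom { DOM_LITTLE, DOM_BIG };

/* Hypothetical thresholds on a 0-1023 tracked-load scale. */
#define UP_THRESHOLD   700u
#define DOWN_THRESHOLD 256u

/* Pick the domain for a task from its tracked load and current domain.
 * Between the two thresholds the task stays where it is (hysteresis). */
enum sched_dom pick_domain(unsigned int task_load, enum sched_dom cur)
{
    if (cur == DOM_LITTLE && task_load > UP_THRESHOLD)
        return DOM_BIG;
    if (cur == DOM_BIG && task_load < DOWN_THRESHOLD)
        return DOM_LITTLE;
    return cur;
}
```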
  • 20. MP: experimental implementation
      • Scheduler modifications:
          • Apply PJT's load-tracking patch set.
          • Set up big and LITTLE sched_domains with no load balancing between them.
          • select_task_rq_fair() checks the task's load history to select an appropriate target CPU for tasks waking up.
          • Add a forced-migration mechanism, similar to the existing active load-balancing mechanism, that pushes the currently running task to a big core.
          • Periodically check (in run_rebalance_domains()) the current task on each LITTLE runqueue for tasks that need to be forced to migrate to a big core.
      • [Diagram: load_balance within each domain; select_task_rq_fair() and forced migration between the LITTLE (L) and big (B) domains.]
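The periodic check can be pictured as a scan over the LITTLE runqueues that flags any running task whose tracked load has crossed the up threshold, provided a big CPU is free to receive it. A minimal user-space model, assuming hypothetical runqueue fields and an up threshold of 700 on a 0-1023 scale; in the kernel this work would sit in run_rebalance_domains() and hand the task to a push mechanism similar to active load balancing, which the model does not attempt.

```c
#include <stdbool.h>

#define NR_LITTLE 3
#define NR_BIG 2
#define UP_THRESHOLD 700u   /* hypothetical, 0-1023 scale */

struct rq_model {
    bool has_curr;            /* a task is currently running */
    unsigned int curr_load;   /* tracked load of the running task */
};

/* Index of the first idle big CPU, or -1 if none is idle. */
static int find_idle_big(const struct rq_model big[NR_BIG])
{
    for (int i = 0; i < NR_BIG; i++)
        if (!big[i].has_curr)
            return i;
    return -1;
}

/* For each LITTLE runqueue whose running task is heavy, record the big CPU
 * it should be forced onto; targets[i] is -1 when no migration is planned.
 * Returns the number of forced migrations planned. */
int plan_forced_migrations(const struct rq_model little[NR_LITTLE],
                           struct rq_model big[NR_BIG],
                           int targets[NR_LITTLE])
{
    int planned = 0;
    for (int i = 0; i < NR_LITTLE; i++) {
        targets[i] = -1;
        if (!little[i].has_curr || little[i].curr_load <= UP_THRESHOLD)
            continue;
        int dst = find_idle_big(big);
        if (dst < 0)
            continue;                  /* no free big CPU: leave it alone */
        big[dst].has_curr = true;      /* reserve the big CPU */
        big[dst].curr_load = little[i].curr_load;
        targets[i] = dst;
        planned++;
    }
    return planned;
}
```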
  • 21. MP: ARM TC2: Audio
      • Workload: Audio (MP3 playback)
      • Performance/energy target: A7 energy
      • Status:
          • Audio-related tasks do not use the A15s, but power consumption is still significantly higher than with the A7 alone.
          • MP is not yet as power efficient as IKS.
      • To do:
          • Target spurious wake-ups on the A15s. All the extra power comes from the A15s, which should not be used at all.
      • [Chart: Audio energy (0-100 scale) for A15, A7, 2CPU IKS and MP. Labelled values: A7 30.79%, MP 39.86%, so MP uses about 29% more energy than the A7 alone.]
  • 22. MP: Audio workload analysis
      • Where is the extra energy spent with MP?
      • We need to look at why the A15s consume power when they are not needed.
      • [Chart: Audio energy breakdown (A15 cluster vs A7 cluster) for A7 and MP]
      • hrtimer functions:

        Function            cpu0   cpu1   cpu2   cpu3   cpu4
        hrtimer_wakeup         2      2   1212    417    190
        tick_sched_timer     404     58    483    507    779

      • Workqueue functions:

        Function            cpu0   cpu1   cpu2   cpu3   cpu4
        vmstat_update         30      2     27     25     28
        cache_reap            15      2     14     13     14
        phy_state_machine     31      0      0      0      0

      • Idle entries per idle state:

        Enter idle          cpu0   cpu1   cpu2   cpu3   cpu4
        state 0                6      2   2379    260    423
        state 1              801    807   8316   9373   9652
  • 23. Scale-invariant load
      • The load accumulation rate does not scale with the available compute capacity (frequency, big/LITTLE CPU).
      • Currently, there is no link between cpufreq and the scheduler.
      • The scheduler may migrate tasks away from a CPU running at low frequency before cpufreq has raised the frequency to match the CPU load.
      • Scaling the tracked load accumulation to the current frequency mitigates this issue: tasks cannot accumulate enough load at low frequency to trigger migration and must wait for cpufreq to react first.
      • [Diagram: Freq = x vs. Freq = 2x]
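Frequency invariance amounts to scaling each period's running-time contribution by the current frequency relative to the maximum before it enters the tracked sum. A minimal sketch, assuming a 1024 us accounting period, the decay factor y = 0.9785 from the load-accumulation slides, and an invariance factor of cur_khz / max_khz; update_load(), load_ratio() and the struct are hypothetical, and the slides do not say exactly how the patches compute the scaling.

```c
#define LOAD_Y 0.9785      /* per-period decay factor */
#define PERIOD_US 1024.0   /* accounting period length */

struct tracked_load {
    double sum;      /* decayed (optionally frequency-scaled) running time */
    double period;   /* decayed elapsed time */
};

/* Account one period in which the task ran for run_us at cur_khz.
 * With invariant != 0 the contribution is scaled by cur_khz / max_khz,
 * so a task at half the maximum frequency accumulates load half as fast. */
void update_load(struct tracked_load *t, double run_us,
                 unsigned int cur_khz, unsigned int max_khz, int invariant)
{
    double contrib = run_us;
    if (invariant)
        contrib *= (double)cur_khz / (double)max_khz;
    t->sum = t->sum * LOAD_Y + contrib;
    t->period = t->period * LOAD_Y + PERIOD_US;
}

/* Tracked load as a fraction in [0, 1]. */
double load_ratio(const struct tracked_load *t)
{
    return t->period > 0.0 ? t->sum / t->period : 0.0;
}
```

Under these assumptions, a task running flat out at half the maximum frequency settles at a tracked load of 0.5 instead of 1.0, so it no longer looks heavy enough to migrate until cpufreq has raised the frequency.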
  • 24. Scale-invariant load
      • [Chart: tracked load (0 to 1000) over time, two panels: "Original" and "Frequency invariant"]
  • 25. Load accumulation rate
      • For some workloads, tracked load saturates too fast and leads to unnecessary task migrations.
      • Extending the tracked load history reduces tracked-load variations caused by sudden changes in the load characteristics.
      • Increasing the y factor in the load expression decreases the load accumulation and decay rates:

        $$\text{load} = \frac{u_0 + u_1 y + u_2 y^2 + \dots + u_n y^n}{1024\,\left(1 + y + y^2 + \dots + y^n\right)}, \qquad y < 1,\quad 0 \le u_i < 1024$$

      • [Chart: tracked load (0 to 1) vs. time [ms] for y = 0.9785]
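Evaluated directly, the load expression is a weighted average of the per-period utilisation samples, with weights y^i that favour recent periods. A minimal sketch, assuming u[0] is the most recent period and that the caller supplies the history array; tracked_load() is a hypothetical name, and a scheduler would keep running sums rather than storing the history.

```c
#include <stddef.h>

/* load = sum(u[i] * y^i) / (1024 * sum(y^i)) for i = 0..n_periods-1.
 * u[0] is the most recent period; each u[i] is in [0, 1024). */
double tracked_load(const unsigned int *u, size_t n_periods, double y)
{
    double num = 0.0, den = 0.0, w = 1.0;   /* w = y^i */
    for (size_t i = 0; i < n_periods; i++) {
        num += u[i] * w;
        den += 1024.0 * w;
        w *= y;
    }
    return den > 0.0 ? num / den : 0.0;
}
```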
  • 26. Load accumulation rate
      • Increasing y leads to a more conservative tracked load.
      • This should lead to fewer up/down migrations.
      • It also increases the up/down migration delay for tasks that need to be migrated.
      • [Chart: "Load accumulation rate", tracked load of a task vs. time [ms] for y = 0.9785, y = 0.9844 and y = 0.9922]
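The delay cost of a larger y can be quantified for a task that starts running continuously after a long idle spell: its tracked load then grows as 1 - y^k after k periods. A minimal sketch, assuming 1 ms periods, a fully decayed idle history before the task starts, a fully busy period counted as 1024, and a hypothetical migration threshold of 0.5; periods_to_threshold() is an illustrative name.

```c
/* Periods (1 ms each, assumed) until a task that starts running
 * continuously after a long idle spell reaches `threshold` tracked load.
 * Returns -1 if the threshold is not reached within max_periods. */
int periods_to_threshold(double y, double threshold, int max_periods)
{
    double sum = 0.0;                       /* no running time yet */
    double period = 1024.0 / (1.0 - y);     /* saturated idle history */
    for (int k = 1; k <= max_periods; k++) {
        sum = sum * y + 1024.0;             /* task ran the whole period */
        period = period * y + 1024.0;       /* stays at its fixed point */
        if (sum / period >= threshold)
            return k;
    }
    return -1;
}
```

Under these assumptions the threshold is reached after 32, 45 and 89 periods for y = 0.9785, 0.9844 and 0.9922 respectively, which is the longer migration delay described on the slide.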
  • 27. MP: top issues
      • Spurious wake-ups: the A15s are woken up mainly by scheduler ticks, and also by:
          • Workqueues
          • Timers
          • RCU
      • CPU wake-up prioritisation: pick the cheapest target CPU
      • Global balancing: spread load to the A7s when the A15s are overloaded
      • Pack vs. spread
      • Cluster-aware cpufreq governors