GreenDroid is a mobile application processor architecture that aims to improve energy efficiency. It uses "dark silicon" (silicon that cannot be actively switched because of power constraints) by filling it with specialized "c-cores" optimized for running common tasks. The c-cores provide up to an 18x reduction in energy usage compared to general-purpose cores. GreenDroid's tiled architecture places c-cores alongside general-purpose cores and caches. It can execute Android applications and system software with 11 times less energy than traditional architectures.
2. CONTENTS
Utilization wall and dark silicon
C-core
GreenDroid and its architecture
C-core energy efficiency
Conclusion
3. INTRODUCTION
This seminar focuses on putting dark silicon, the part of a chip that cannot be powered on at the same time as the rest, to use in an energy-efficient architecture for Android.
GreenDroid achieves this by filling the dark silicon with specialized cores.
4. WHAT IS GREENDROID?
A mobile application processor
45-nm multicore research prototype
Targets the Android mobile-phone software stack
Can execute general-purpose mobile programs with 11 times less energy than traditional architectures
5. UTILIZATION WALL
With each successive process generation, the percentage of a chip that can actively switch drops exponentially due to power constraints
• A direct consequence of this is dark silicon, which limits the utilization of application processors
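The exponential drop can be illustrated with a deliberately simplified model. This is an assumption for illustration, not GreenDroid's own analysis: the chip's power budget is held fixed, each generation multiplies the transistor count by one factor, and the power of a single switching transistor shrinks by another. The fraction of the chip that can switch at once is then the budget divided by the power the whole chip would draw. The function name and both scaling parameters are hypothetical.

```python
def active_fraction(generations: int,
                    transistor_growth: float,
                    power_per_transistor_scale: float) -> list[float]:
    """Fraction of the chip that can switch simultaneously, per generation.

    Simplified model (assumption): the power budget is fixed and equals the
    full-chip power in generation 0, so the fraction starts at 1.0. Each
    generation multiplies the transistor count by `transistor_growth` and the
    power of one switching transistor by `power_per_transistor_scale`.
    """
    budget = 1.0            # fixed power budget (relative units)
    full_chip_power = 1.0   # power if every transistor switched at once
    fractions = []
    for _ in range(generations + 1):
        fractions.append(min(1.0, budget / full_chip_power))
        full_chip_power *= transistor_growth * power_per_transistor_scale
    return fractions
```

With no per-transistor power improvement at all, `active_fraction(3, 2.0, 1.0)` returns `[1.0, 0.5, 0.25, 0.125]`. Any combined factor above 1 (for example `2.0` and `0.7`) still shrinks the fraction geometrically, which is the exponential decline the slide describes.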
6. WHAT DO WE DO WITH DARK SILICON?
Goal: Leverage dark silicon for a more efficient architecture
Approach:
1. Fill dark silicon with specialized cores to save energy on common apps.
2. Provide focused reconfigurability to handle evolving workloads.
9. GREENDROID TILE FLOOR PLAN
50% C-cores
25% D-cache
25% MIPS core, I-cache, and on-chip network
10. CONSERVATION CORES
Specialized cores for reducing energy
Hot code runs on c-cores, and cold code runs on the host CPU
C-cores use up to 18x less energy
Fully automated toolchain
No “deep” analysis required
C-cores are automatically generated from hot program regions
[Figure: a c-core runs hot code and a general-purpose host CPU with its I-cache runs cold code; both use the D-cache]
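The hot/cold split can be sketched as a profile-driven selection step. This is a hypothetical simplification, not the actual GreenDroid toolchain: assume a profiler has measured how much execution time each code region accounts for, then take the hottest regions until they cover a target share of execution time. Those regions become c-core candidates, and everything else stays on the host CPU. The region names, profile numbers, and the 0.95 coverage target are illustrative assumptions.

```python
def split_hot_cold(profile: dict[str, float], coverage: float = 0.95):
    """Return (hot, cold) region lists from an execution-time profile.

    `profile` maps region name -> execution time (any unit). Regions are taken
    hottest-first until their combined time reaches `coverage` of the total;
    these are c-core candidates. The rest is cold code for the host CPU.
    """
    total = sum(profile.values())
    ranked = sorted(profile, key=profile.get, reverse=True)
    hot, covered = [], 0.0
    for region in ranked:
        if covered >= coverage * total:
            break
        hot.append(region)
        covered += profile[region]
    cold = [r for r in ranked if r not in hot]
    return hot, cold


# Hypothetical profile (percent of execution time per region).
profile = {"dalvik_interp": 55, "libc_memcpy": 30, "sqlite_query": 12,
           "ui_setup": 2, "startup": 1}
hot, cold = split_hot_cold(profile)
```

Here the first three regions reach 97 percent of execution time, so `hot` is `["dalvik_interp", "libc_memcpy", "sqlite_query"]` and `cold` is `["ui_setup", "startup"]`. A greedy hottest-first pass keeps the number of c-cores small for a given coverage target.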
11. ANDROID
Google’s OS and application environment for mobile devices
Java applications run on the Dalvik virtual machine
Apps share a set of libraries (libc, OpenGL, SQLite, etc.)
[Figure: Android software stack, with blocks labeled Applications, Libraries, Dalvik, Cache, Linux kernel, and Hardware]
12. Applying C-cores to Android
Android is well suited for c-cores:
A core set of commonly used applications
Libraries are hot code
The Dalvik virtual machine is hot code
Libraries, Dalvik, the kernel, and application hotspots are converted into c-cores
[Figure: the same Android software stack (Applications, Libraries, Dalvik, Cache, Linux kernel, Hardware) with a C-cores block added]
14. C-CORE ENERGY EFFICIENCY
C-cores avoid the overheads of general-purpose instruction execution.
The c-cores’ datapath is specialized.
Energy drops from 91 pJ per instruction to just 8 pJ per instruction.
[Figure: energy per instruction, as shares of the baseline CPU's energy]
Baseline CPU (91 pJ/instr): D-cache 6%, I-cache 23%, fetch/decode 19%, register file 14%, datapath 38%
C-core (8 pJ/instr): D-cache 6%, datapath 3%, energy saved 91%
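The chart's figures can be cross-checked with a few lines of arithmetic. The percentages are shares of the baseline CPU's 91 pJ per instruction: the baseline pays for D-cache 6%, I-cache 23%, fetch/decode 19%, register file 14%, and datapath 38%, while a c-core keeps only the D-cache (6%) and a sliver of datapath (3%). The dictionary keys simply mirror the chart labels; the code is an illustrative check, not part of GreenDroid.

```python
BASELINE_PJ_PER_INSTR = 91.0

# Shares of baseline energy per instruction, as read from the chart.
baseline_breakdown = {
    "d_cache": 0.06,
    "i_cache": 0.23,
    "fetch_decode": 0.19,
    "register_file": 0.14,
    "datapath": 0.38,
}

# Components a c-core still pays for; I-cache, fetch/decode, and the
# register file are eliminated entirely.
ccore_breakdown = {"d_cache": 0.06, "datapath": 0.03}


def energy_pj(breakdown: dict[str, float]) -> float:
    """Energy per instruction in pJ, given shares of the baseline energy."""
    return BASELINE_PJ_PER_INSTR * sum(breakdown.values())


baseline = energy_pj(baseline_breakdown)   # 91 pJ
ccore = energy_pj(ccore_breakdown)         # about 8.19 pJ
print(f"baseline {baseline:.0f} pJ, c-core {ccore:.1f} pJ, "
      f"saved {1 - ccore / baseline:.0%}, ratio {baseline / ccore:.1f}x")
# prints: baseline 91 pJ, c-core 8.2 pJ, saved 91%, ratio 11.1x
```

The remaining 9% of 91 pJ is about 8.2 pJ, which the slide rounds to 8 pJ, and the ratio of roughly 11x lines up with the 11 times figure quoted for GreenDroid.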
15. ADVANTAGES
Saves energy by using specialized cores called conservation cores
C-cores cover approximately 95 percent of execution time
Reduce processor energy consumption by 91 percent for hot code
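The 91 percent figure applies to hot code only. A rough whole-program estimate, in the spirit of Amdahl's law, weights the hot-code saving by the coverage and charges the uncovered cold code at full baseline cost. This back-of-the-envelope model rests on an assumption that the slide does not make (that baseline energy is proportional to execution time), so treat it as a sketch rather than the method behind GreenDroid's published numbers.

```python
def relative_energy(coverage: float, hot_energy_fraction: float) -> float:
    """Whole-program energy relative to running everything on the host CPU.

    coverage: share of baseline energy spent in code that c-cores cover
              (approximated here by the share of execution time).
    hot_energy_fraction: c-core energy relative to baseline for that code
              (0.09 means a 91% reduction).
    """
    cold = 1.0 - coverage               # still runs on the host CPU at full cost
    hot = coverage * hot_energy_fraction
    return cold + hot


e = relative_energy(coverage=0.95, hot_energy_fraction=0.09)  # 0.05 + 0.95 * 0.09 = 0.1355
```

Under these assumptions the 5 percent of cold code still accounts for over a third of the remaining energy (0.05 of 0.1355), which is why high coverage matters as much as the per-instruction saving.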
16. CONCLUSION
Over the next 5 to 10 years, the amount of dark silicon will increase exponentially.
The c-core technique converts dark silicon into energy savings.
GreenDroid's toolchain can analyze a current Android phone and determine which apps, and which CPU circuits, the phone uses the most. It then generates a processor design that best takes advantage of those usage habits, creating a CPU that is both faster and more energy efficient.
The hot code is converted into Verilog descriptions of specialized cores, which are injected into the chip design.
We turn on only the cores we need, when we need them.
The execution model jumps from c-core to c-core: for each hot loop, we run specialized hardware that has been targeted at just that loop.
This trades area, which is dark anyway, for energy efficiency.
Each c-core sends all of its memory accesses through the data cache that it shares with the host CPU.
Code that is not executed often runs on the host CPU; when we reach a hotspot, we jump over to a specialized piece of hardware. We don't have to transfer any data, because the data is already in the shared data cache, so execution can jump back and forth very quickly and very efficiently. We generate c-cores using a fully automated toolchain. The toolchain generates synthesizable Verilog and, at the same time, integrates the c-cores into the software by inserting function stubs into the application that call the c-cores at run time.
This simple transformation yields about 18x less energy for the code the c-cores target, without even trying to parallelize the code.
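The stub-based execution model can be mimicked in software to show the control flow. In this minimal sketch every name (`CCORE_TABLE`, `call_region`, `sum_of_squares_sw`, `sum_of_squares_ccore`) is hypothetical, and the "c-core" is just a Python function standing in for hardware. The stub is assumed to check whether a matching c-core is present; if so it invokes it, otherwise it falls back to the original software on the host CPU. Both versions read the same list object, standing in for data that already sits in the shared data cache, so nothing is copied when execution switches.

```python
from typing import Callable

Kernel = Callable[[list[int]], int]


def sum_of_squares_sw(data: list[int]) -> int:
    """Original software version of a hot loop (runs on the host CPU)."""
    total = 0
    for x in data:
        total += x * x
    return total


def sum_of_squares_ccore(data: list[int]) -> int:
    """Stand-in for the specialized c-core for the same loop. It reads the
    caller's list directly (no copy), mimicking the shared D-cache."""
    return sum(x * x for x in data)


# Hypothetical table of c-cores currently present on the chip.
# Removing an entry models a c-core that is no longer available.
CCORE_TABLE: dict[str, Kernel] = {"sum_of_squares": sum_of_squares_ccore}


def call_region(name: str, software: Kernel, data: list[int]) -> tuple[int, str]:
    """Stub at the hot call site: prefer the c-core, else fall back."""
    ccore = CCORE_TABLE.get(name)
    if ccore is not None:
        return ccore(data), "c-core"
    return software(data), "host CPU"
```

Calling `call_region("sum_of_squares", sum_of_squares_sw, [1, 2, 3])` returns `(14, "c-core")`; after `CCORE_TABLE.pop("sum_of_squares")` the same call returns `(14, "host CPU")`. The result is identical either way, which is what lets a c-core be removed without affecting the application.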
The diagram shows the Android software stack running on typical hardware.
Applications are written in Java and compiled to run on the Dalvik virtual machine (DVM).
The applications also call a set of libraries, including libc, OpenGL, etc.
This software model makes Android a great fit for c-cores.
This is because Android runs a core set of commonly used applications, e.g., the web browser, email, and various media players.
These applications rely on the DVM and libraries, making this part of the software stack particularly hot code.
We can also target specific hotspots in certain applications and in the Linux kernel.
We can convert all of these hotspots into conservation cores for large energy savings.
Another reason is the relatively short replacement cycle of handsets: most Android phones are used for only 2 to 3 years.
We can continually develop new c-cores as more applications appear and become popular.
At the same time, the c-core interface allows us to remove c-cores at any time without affecting the system, because the application can fall back to the general-purpose host CPU.
So we have been applying this c-core technique to the Android environment: extracting these hotspots from Android and then building a chip.
The figure on the right is the output of a layout tool; it shows 9 different c-cores clustered around the data cache, with a processor on the left.
On the left is the energy breakdown for one very efficient processor, and on the right is the breakdown for one of the c-cores.
The main benefit is that we got rid of all the overheads of executing an instruction: we don't have an instruction cache, so there is no instruction fetching or decoding; there is no big register file to write operands to; and even most of the datapath is eliminated.
All that is left is the data cache and a little sliver of the datapath where the actual computation takes place.