3. Abstract
A low-power processor for embedded systems is designed and
implemented. The proposed processor implements the RV32E
instruction set architecture using a modified MIPS
micro-architecture. Clock gating and a Standby mode are
applied to reduce power consumption.
4. INTRODUCTION
Embedded systems are ubiquitous. They control or
operate a wide range of devices and equipment to
meet the needs of their users. With advancing
technology, these embedded systems can now
communicate among themselves, creating an
Internet of Things (IoT). To prolong the battery life
of an embedded system, its processor must consume
as little power as possible. The design of a low-power
processor is therefore of great importance. This paper
presents the design of a low-power processor intended
for embedded systems where power, not performance,
is the greater concern.
5. ARCHITECTURE
• The proposed processor is based on the
MIPS micro-architecture and is capable of operating
on the RV32E base integer instruction set, which
is one of the four standard RISC-V base
instruction set architectures (ISAs). RISC-V is a
free and open standard instruction set
architecture that can be used in research and
education.
• RV32E Instruction Set: The RV32E instruction
set is a reduced version of RV32I intended for
use in embedded systems. To reduce the hardware
complexity of the processor, RV32E specifies
only 16 32-bit integer registers, compared to
32 registers in RV32I. RV32E also removes all the
counters that are mandatory in RV32I.
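The register-count difference can be illustrated with a minimal Python sketch. It assumes that RV32E keeps the 5-bit register fields of the RV32I encoding and simply restricts them to x0 through x15; the function names are hypothetical and are not taken from the paper.

```python
RV32E_NUM_REGS = 16  # x0..x15 in RV32E
RV32I_NUM_REGS = 32  # x0..x31 in RV32I


def is_valid_rv32e_register(index: int) -> bool:
    """Return True if a register index names a register that exists in RV32E."""
    return 0 <= index < RV32E_NUM_REGS


def register_file_bits(num_regs: int, width: int = 32) -> int:
    """Raw storage bits for a register file (x0 counted like any other entry)."""
    return num_regs * width


if __name__ == "__main__":
    print(is_valid_rv32e_register(15), is_valid_rv32e_register(16))
    print(register_file_bits(RV32E_NUM_REGS), register_file_bits(RV32I_NUM_REGS))
```

Under these assumptions the RV32E register file needs half the storage bits of an RV32I one, which is the hardware saving the bullet above refers to.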
6. MIPS Micro-Architecture
A micro-architecture is a hardware implementation of a particular ISA. Among existing
micro-architectures, the MIPS micro-architecture is one of the most popular and is widely adopted in many
commercial processors, such as ARMv7, ARMv8, R-series, and PICs.
The MIPS micro-architecture employs a Harvard architecture, which separates instruction
memory from data memory. It processes each instruction in
five phases: fetch, decode, execute, memory access, and register write back.
Cache Memory
Cache memory is crucial to the overall performance of the processor. Cache memory can be unified or
split into separate instruction and data caches. Cache mapping techniques include direct mapping, associative mapping,
and set-associative mapping. The performance of the processor depends heavily on the hit/miss ratio of the
mapping technique implemented.
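As an illustration of direct mapping and of how hits and misses are counted, here is a minimal Python sketch. The cache geometry (16 lines of 4 bytes) and the class name are assumptions chosen for the example; the source does not state the cache parameters of the proposed processor.

```python
from dataclasses import dataclass


@dataclass
class DirectMappedCache:
    num_lines: int = 16   # hypothetical number of cache lines
    line_bytes: int = 4   # hypothetical line size in bytes

    def __post_init__(self):
        self.tags = [None] * self.num_lines  # one stored tag per line
        self.hits = 0
        self.misses = 0

    def split(self, addr: int):
        """Split a byte address into (tag, index, offset)."""
        offset = addr % self.line_bytes
        index = (addr // self.line_bytes) % self.num_lines
        tag = addr // (self.line_bytes * self.num_lines)
        return tag, index, offset

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, replace the line's tag."""
        tag, index, _ = self.split(addr)
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag
        self.misses += 1
        return False


if __name__ == "__main__":
    cache = DirectMappedCache()
    for address in (0, 0, 64, 0):
        print(hex(address), "hit" if cache.access(address) else "miss")
```

In this sketch, addresses 0 and 64 map to the same line, so alternating between them evicts the line each time. This kind of conflict is what set-associative mapping is meant to reduce.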
7. PROPOSED DESIGN
Our proposed processor design is based on a non-pipelined
version of the MIPS micro-architecture in order to
reduce overall power consumption.
Fig. 2 displays the top-level design of the proposed
processor, which consists of the top clock gating unit, the
control unit, and the data path unit.
9. OPERATION
The processor operates across multiple clock cycles in
three consecutive states, as shown in Fig. 4.
In the Execute state, the processor performs the following
tasks in order:
10. 1) Decode: The processor decodes the instruction saved in the instruction
register to extract the op code, which can be either seven or three bits long, the source
registers, the destination register, and any immediates. The microcode controller
then takes the op code and generates a sequence of control signals for the next
three tasks.
2) Arithmetic/Logic Operation: The microcode controller
sends the operands, which can be register data or
immediates, to the arithmetic/logic unit (ALU), which performs
the operation selected by the op code. A comparator
at the input of the ALU compares the source
registers for branch instructions.
11. 3) Memory Access: If the instruction requires a memory access, as LOAD and STORE
instructions do, the
processor waits in this state until the access is finished.
4) Write Back: The result is written back into the specified destination register and the PC is
updated.
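The Decode task can be sketched in Python using the field positions of the standard RISC-V 32-bit instruction formats. Reading the "seven-bit or three-bit" op code as the 7-bit opcode field plus the 3-bit funct3 field is an assumption, and the function name is hypothetical.

```python
def decode(instr: int) -> dict:
    """Extract common RISC-V fields from a 32-bit instruction word."""
    imm_i = (instr >> 20) & 0xFFF          # I-type immediate, bits 31:20
    if imm_i & 0x800:                      # sign-extend the 12-bit value
        imm_i -= 0x1000
    return {
        "opcode": instr & 0x7F,            # bits 6:0 (seven bits)
        "rd": (instr >> 7) & 0x1F,         # bits 11:7, destination register
        "funct3": (instr >> 12) & 0x7,     # bits 14:12 (three bits)
        "rs1": (instr >> 15) & 0x1F,       # bits 19:15, first source register
        "rs2": (instr >> 20) & 0x1F,       # bits 24:20, second source register
        "imm_i": imm_i,
    }


if __name__ == "__main__":
    # 0xFFF10093 encodes "addi x1, x2, -1"
    print(decode(0xFFF10093))
```

Not every field is meaningful for every format. For the I-type example above, the rs2 value simply overlaps the low bits of the immediate and would be ignored by the control logic.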
Once the Execute state is completed, the processor enters the Standby state if the instruction is
STANDBY.
In general, the processor requires six clock cycles for each operation: two for Fetch and Execute,
two for an instruction cache hit, and two for a data cache hit.
12. B. Clock Gating
Clock gating is employed at the top
level to block the clock feeding
the processor, further reducing the
overall power consumption. When
the clock is blocked, the output is
held at its last value. The clock
gating circuit, shown in Fig. 5, is
readily available in the technology
library. Its behavior is described by
the function table in Table 1. Fig. 6
shows an example timing diagram for
the signals involved.
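The following Python sketch models a clock gate behaviorally. It assumes a common latch-based structure in which the enable is sampled while the clock is low and the gated clock is the AND of the clock and the latched enable. The library cell of Fig. 5 and Table 1 is not reproduced here and may differ in detail; the class name is hypothetical.

```python
class ClockGate:
    """Behavioral model of an assumed latch-based clock gating cell."""

    def __init__(self):
        self.latched_enable = 0

    def step(self, clk: int, enable: int) -> int:
        """Apply one (clk, enable) sample and return the gated clock level."""
        if clk == 0:
            # The latch is transparent only while the clock is low, so a
            # change on enable during the high phase cannot cause a glitch.
            self.latched_enable = enable
        return clk & self.latched_enable


if __name__ == "__main__":
    gate = ClockGate()
    samples = [(0, 1), (1, 1), (0, 0), (1, 0), (0, 0), (1, 1), (0, 1), (1, 1)]
    print([gate.step(clk, en) for clk, en in samples])
```

In this model the gated clock stays low for as long as the enable is deasserted, so the registers downstream see no clock edges and keep their stored values.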
14. IMPLEMENTATION
The proposed design
is first implemented and
simulated at the Register Transfer
Level (RTL) using Verilog to check
its functionality. The design is
then translated, mapped,
and optimized into 180 nm CMOS
technology using a cell
library for a 1.8 V supply.
The resulting transistor-level
design is validated against the RTL
design to ensure correct
functionality.
15. A. Area and Performance
Table 2 gives the breakdown of the area taken by each
module and sub-circuit in the processor. The table
shows that the register file occupies the most area
in the processor at 56.9%, while the
control unit occupies the least at only 1.2%. The
overall area is equivalent to about 7800 gates in the
CMOS technology used.
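The percentages can be converted into approximate gate counts with a short Python calculation. The results are rough estimates, because both the total of about 7800 gates and the percentages are rounded figures; the function name is hypothetical.

```python
TOTAL_GATES = 7800        # approximate overall gate-equivalent area
REGISTER_FILE_PCT = 56.9  # register file share of total area
CONTROL_UNIT_PCT = 1.2    # control unit share of total area


def gates_for(percent: float, total: int = TOTAL_GATES) -> float:
    """Approximate gate-equivalents for a module given its share of the area."""
    return total * percent / 100.0


if __name__ == "__main__":
    print(round(gates_for(REGISTER_FILE_PCT)))
    print(round(gates_for(CONTROL_UNIT_PCT)))
```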
16. B. Power Consumption
The processor draws different amounts
of current in normal
mode (i.e., the Fetch and Execute
states) and in sleep mode (i.e., the
Standby state). In normal mode,
the processor consumes
approximately 189 µA on average,
compared to only 11.1 µA in sleep
mode. With top-level clock gating, the
processor uses much less
current while it is asleep.
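The current figures can be converted to power with P = V × I. This short Python sketch assumes that both currents are drawn from the 1.8 V supply named in the implementation section; the function name is hypothetical.

```python
SUPPLY_VOLTS = 1.8        # supply voltage of the cell library
NORMAL_CURRENT_UA = 189   # average current in normal mode, in microamps
SLEEP_CURRENT_UA = 11.1   # current in sleep mode, in microamps


def power_uw(current_ua: float, volts: float = SUPPLY_VOLTS) -> float:
    """Power in microwatts from current in microamps and supply in volts."""
    return current_ua * volts


if __name__ == "__main__":
    normal = power_uw(NORMAL_CURRENT_UA)
    sleep = power_uw(SLEEP_CURRENT_UA)
    print(f"normal: {normal:.1f} uW, sleep: {sleep:.2f} uW, ratio: {normal / sleep:.1f}x")
```

Under this assumption, normal mode corresponds to roughly 340 µW and sleep mode to roughly 20 µW, a reduction of about 17 times.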
17. CONCLUSION
A low-power processor for embedded
systems is designed and implemented.
The proposed processor implements the
RV32E instruction set architecture
using a modified MIPS
micro-architecture. Top-level clock gating
and a Standby state are
introduced to reduce overall
power consumption.
In terms of performance, the proposed
processor can operate at a maximum
clock frequency of 32 MHz, with an
average current consumption of 189 µA
in normal mode and 11.1 µA in sleep
mode, or 5.68 µW/MHz.