Evaluating Impact of Soft Errors on Embedded Systems
Texas Instruments Incorporated
66/3 Bagmane Tech Park,
CV Raman Nagar,
Bangalore
6/11/2016
Abhishek Kumar Roushan
The report gives an overview of soft errors, a prominent problem in embedded systems and integrated circuits that grows with technology scaling. It also describes the steps we took to mitigate the impact of these errors, which tend to show up as faults at the output. The work assembles a generic control system (motor control) and experiments with fault injection to better understand how well these soft errors are covered.
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI
END SEMESTER THESIS REPORT
ON:
EVALUATING IMPACT OF SOFT ERRORS ON EMBEDDED SYSTEMS
PREPARED BY
ABHISHEK KUMAR ROUSHAN
2012A8PS309P B.E (Hons.)
ELECTRONICS AND
INSTRUMENTATION
PREPARED IN
PARTIAL FULFILMENT OF THE COURSE
PRACTICE SCHOOL (PS)-II
AT
TEXAS INSTRUMENTS, INDIA, PVT LTD.
INSTRUCTOR-IN-CHARGE:
DR. SATYA SUDHAKAR YEDLAPALLI
I. Acknowledgements
A plant needs a stem, roots and leaves to flourish. In the same way, this project depended on many people, and I would be remiss not to mention those who helped me take it to a sustainable level.
I am extremely thankful to Texas Instruments, India for giving me this wonderful opportunity to learn and experiment at their Bengaluru center, which has been the prime driver of the project's progress.
I would like to thank my mentor, Mr. Prasanth V, MCU (C2000), for his incisiveness and for assigning me this project. He is a paragon in his field and has been a tremendous guide on technical matters.
I would also like to thank Mrs. Jaya Singh, MGTS, MCU unit, for explaining the impact of Texas Instruments microcontrollers and microprocessors and for helping me understand the bigger picture of the projects Texas Instruments is involved in.
I would also like to thank my team members Sudha, Rahul, Sonal, Ashish, Rupin, Narandra, Rithin and Prashant. They have given me help, vision, support and encouragement, and they continue to do so whenever I face a problem.
I would also like to acknowledge my PS faculty in-charge, Dr. Satya Sudhakar Yedlapalli, for his valuable guidance in running the project. I am also grateful for his gracious feedback on the project presentation and on matters relating to the company working environment.
TABLE OF CONTENTS
I. Abstract
Chapter 1 Introduction
Chapter 2 Fault Injection Simulations
Chapter 3 Fault Injection Tool
Chapter 4 3-Phase Motor Drive Control
Chapter 5 Project Proposed Approach with Results and Future Work
i. List of Acronyms
ii. References
ABSTRACT
Soft errors are one of the biggest reliability challenges for present-day electronic devices, and their contribution to the overall device failure rate increases with technology scaling. They can cause failures in embedded systems at run time and are a major concern in safety-critical applications such as pacemakers.
To address this issue, selective hardening techniques have been
proposed in the past, aiming at providing cost-effective soft error
resilience.
This report proposes that the cost-effectiveness of such soft error mitigation techniques can be improved further. It does so by exploiting the repetitive nature of firmware execution in an embedded system and the firmware's inherent ability to mask soft errors. The approach is applied to a module by performing fault injection analysis on the circuit during application execution.
The main aim of the project is to inject faults at the module level of an embedded system. These injected faults simulate the alpha-particle strikes that are responsible for soft errors. By observing how the device under test responds to them, we can identify the pattern of vulnerable nodes that need to be protected in a particular system-level application.
Chapter 1: Introduction
Faults are injected into a system using a very systematic approach. Ironically, real faults can be caused by random, highly energetic particles in the environment, yet the procedure for testing a system with fault injection is very formal.
The flow chart below describes the step-by-step implementation of fault injection in a particular system. Not every fault injected into the module maps to an error at the module output. For this reason, faults that do not affect the output are called benign (safe), while faults that manifest as output errors are called dangerous.
Fig 1: Flow chart describing the outcomes of a fault injection to its classification
1.1 Soft Errors At A Glance
Embedded systems are used for a variety of purposes in many domains today. They comprise applications that run as firmware on embedded hardware inside the system. These systems are often real-time in nature (e.g. electronic control units (ECUs) in automobiles) and have distinct reliability requirements, such as:
(i) Resilience to manufacturing and life-time defects.
(ii) Robustness to transient errors (or soft errors) during operation.
For instance, the strict quality requirements of automotive devices have produced several standards, such as AEC-Q100, IEC 61508, AUTOSAR and ISO 26262. These standards guide automotive electronics in detecting, correcting or tolerating errors before they cause a failure that leads to a potential hazard.
Manufacturing and life-time defects must be handled at manufacturing time, through DFT techniques and a large number of defect-screening and life-time stress tests. Soft errors, in contrast, must be handled through design techniques for transient error tolerance that provide real-time error mitigation or correction during normal operation of the system.
Some of the most common approaches to transient fault tolerance are ECC for memories and hardening of flip-flops. Flip-flops can be hardened either with new hardened structures, such as DICE cells, or by encoding groups of flip-flops into code words.
However, since these mechanisms are based on either spatial
redundancy or temporal redundancy, they incur penalties in terms of
area or speed respectively.
In this work, we focus primarily on evaluating the impact of injecting different types of faults and on understanding their effects in order to improve the design.
1.2 Motivation
Soft errors are errors induced in an embedded system by foreign particles such as alpha particles, and they result in failures at the system output. They must be well understood. The lifetime of an IC is described by a bathtub curve, which relates to parameters such as mean time between failures (MTBF) and mean time to repair (MTTR). The curve is shown below. Knowing the impact of soft errors and of their different types lets the manufacturer optimize these parameters. It also helps ensure that the system does not fail unexpectedly during the period in which it should function correctly, so the manufacturer can serve the customer better.
Fig 2: Comparison of the reliability bathtub curve for a normal device and a soft-errored device
By mitigating the impact of soft errors, we can extend the useful life of a device and reduce the slope of the wear-out phase.
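The standard reliability relations below apply to the flat, useful-life region of the bathtub curve, where the failure rate λ is approximately constant. Soft error rates are commonly quoted in FIT (failures per 10^9 device-hours), so the conversion from FIT to λ is included. These are textbook definitions, not results from this report.

```latex
\begin{aligned}
R(t) &= e^{-\lambda t}, \qquad
\mathrm{MTTF} = \int_0^{\infty} R(t)\,dt = \frac{1}{\lambda},\\
\lambda\;[\mathrm{h}^{-1}] &= \mathrm{FIT} \times 10^{-9}, \qquad
\mathrm{MTBF} = \mathrm{MTTF} + \mathrm{MTTR},\\
A &= \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}.
\end{aligned}
```

For example, a hypothetical soft error rate of 1000 FIT gives λ = 10^-6 per hour, which is an MTTF of 10^6 hours (about 114 years).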
1.3 Impact of the outcome
Almost every gadget that we use or want to use contains an embedded system, either as an IC or as an FPGA-based end-purpose tool. Examples range from the airbags used as safety equipment in cars to an aircraft ejection-seat system. In every case, we as customers want the machines we buy to behave safely.
By safe behavior we mean that the machine either:
i. generates a warning before a fatal error and then shuts down, or
ii. waits for a mechanism to mitigate the impact of the error and brings the machine to a safe state from which it can work again.
The impact of different kinds of faults (permanent, transient and upsets) on such systems has not been studied comprehensively. We therefore seek to design an experiment that evaluates the impact of soft errors on these embedded systems in more depth. The goal is to arrive at fool-proof design models that raise the system to a higher safety integrity level.
Chapter 2: Fault Injection Simulations
The basic requirement for a fault injection simulation is a working control system that uses embedded hardware for control.
Faults are deliberately inserted into the module(s) according to the multiplicity of fault injection. By multiplicity, we mean the degree of parallelism of the software used to inject faults into the system. A multiplicity greater than one increases the load on the processor but reduces the fault injection time.
These injected faults emulate the upsets that alpha-particle strikes cause in the system, as observed at its outputs. If the system behaves correctly, it should do one of two things. Either it masks the error, so that no faulty output appears at the strobe points, or it produces the expected faulty output at particular strobes. Any deviation from these cases places the fault into one of the combinations of Safe/Dangerous and Detected/Undetected.
There are three phases in fault injection: Elaboration, Good Simulation and Fault Simulation (injection). Each phase is performed after the previous one, and this systematic order cannot be bypassed.
2.1 Control System
The control system used in our design is closed-loop speed control of a three-phase BLDC motor using Hall effect sensors. The control loop is shown in the figure below.
The PWM module drives the three-phase BLDC motor through an inverter circuit. The Hall effect sensors determine the position and velocity of the rotor.
The CAPTURE module captures the edges (spikes) of the Hall sensor signals. The speed computed from these captures is compared with the reference speed.
The processor runs a PID controller, with parameters Kp, Ki and Kd, on the detected speed error. The controller output is the PWM width needed for tolerable operation of the motor.
Fig 3: Three phase BLDC motor control using hall effect sensors.
In more detail, the hardware used is as follows.
The CAPTURE and PWM modules are peripherals of the Texas Instruments processor and are used as shown. The motor control, i.e. the PID control and the Hall effect sensor readings, is carried out with the Texas Instruments DRV8312 three-phase BLDC motor driver, which is available for purchase from the Texas Instruments website.
As stated above, we control a three-phase BLDC motor and monitor its output for fail-safe or fail-shut behavior.
Chapter 3: Fault Injection Tool
The modules marked in black in Fig 3 (previous chapter) are used as fault injection targets in the system.
We use the Incisive Functional Safety Simulator (IFSS) from Cadence Design Systems to perform all the fault injection phases described in the previous chapter.
The following is an output snippet of the Cadence tool that we use as the fault injection mechanism. The tool can handle faults with a multiplicity greater than one. It also has additional features such as logical fault collapsing and formal verification. The tool handles only the fault types specified by ISO 26262.
Chapter 4: 3-Phase Motor Drive Control
The motor is a three-phase BLDC motor with three-phase stator windings and a permanent-magnet rotor. The commutation of the stator coils is therefore determined by the three-phase winding and the number of rotor poles. The Texas Instruments microprocessor generates the three-phase waveforms required so that commutation locking does not occur. This waveform generation for three-phase control of the BLDC motor was handled as part of the project.
Fig 4: Module level three phase BLDC control using three hall sensors
The figure illustrates module-level control of the three-phase motor. The processor provides the PWM signals that drive the IGBT driver, which in turn drives the inverter circuit controlling the three phases of the motor. Hall effect sensors, mounted on the stator, sense the rotor position and its timing and report them back to the processor. This feedback sets the width of the PWM signals that drive the circuit.
Chapter 5: Project Proposed Approach with Results and Future Work
Following is the proposed approach for fault injection in the
modules:
● Implement closed-loop motor control.
● Assign a system output tolerance.
● Obtain the output ranges of the modules corresponding to the assigned system output tolerance.
● Map the fault outputs of PWM and Capture module to respective
module inputs. This part is done by the Cadence tool mentioned
earlier for fault injection.
● Alter the design (e.g. the flip-flop arrangement) offline, or mitigate the faults online, so that dangerous fault nodes become safe fault nodes.
5.1 Results Expected
The following table gives a conventional FV-based performance analysis for comparison.
It compares the savings in flip-flops for various modules in different modes. Our aim is to improve benchmark parameters such as the reduction in flip-flop overhead for circuits.
Thus, the performance metrics for this experiment are:
1. Total number of safe flip flops
2. Reduction in number of fault nodes
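The report does not state how the two metrics are computed. Assuming they are simple ratios, they can be written as below. Here N_FF is the number of flip-flops in the module, N_safe FF is the number shown to be safe, N_nodes is the total number of fault nodes, and N_remaining is the number of nodes that still need fault injection after the analysis.

```latex
\begin{aligned}
\text{Safe flip-flops (\%)} &= \frac{N_{\text{safe FF}}}{N_{\text{FF}}} \times 100,\\
\text{Reduction in fault nodes (\%)} &= \frac{N_{\text{nodes}} - N_{\text{remaining}}}{N_{\text{nodes}}} \times 100.
\end{aligned}
```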
Table I: Percentage savings and safe flip flops using FV based approach
Table II: Performance Metric Comparison of formal and proposed approaches

Module   Mode      Functional Verification   Proposed approach using Cadence tools
                   based approach only       and Texas Instruments design + test-bench

Percentage comparison of safe flip flops
PWM      Up        42.86%                    67.48%
PWM      Down      42.86%                    68.2%
PWM      Up-down   35.71%                    58.97%
CAP      -         85.64%                    86.38%

Percentage reduction in number of fault nodes to inject faults
PWM      -         38%                       59.68%
Fig 5: Comparison of total number of fault nodes remaining to inject faults
for two approaches.
The fewer fault nodes that remain for fault injection, the lower the time and memory overhead. By this measure, the proposed approach performs considerably better than the FV-based approach, as Fig 5 shows.
[Fig 5 plot: total number of fault nodes (in 100's) versus total nodes in system (in 1000's), for the FV approach and the proposed approach]
5.2 Future Work
The planned future work towards completing the project is as follows:
1. Incorporating safety against dangerous faults at the system level by modifying the module-level design.
2. Collective fault injection on other control systems.
3. Modifying the safety mechanisms in the modules for increased robustness.
4. Implementing verification based on run-time fault injection on the device.
LIST OF ACRONYMS
ECU – Electronic Control Unit
DFT – Design for Testability
ECC – Error Correcting Code
IC – Integrated Circuit
FPGA – Field Programmable Gate Array
MCU – Microcontroller Unit
ECAP – Enhanced Capture
EPWM – Enhanced Pulse Width Modulation
PID – Proportional Integral Derivative
BLDC – Brushless Direct Current
IFSS – Incisive Functional Safety Simulator
ISO – International Organization for Standardization
DICE – Dual Interlocked Storage Cell