Generic Partial Dynamic Reconfiguration Controller
for Fault Tolerant Designs Based on FPGA
Martin Straka, Jan Kastil, Zdenek Kotasek
Brno University of Technology
Faculty of Information Technology
Bozetechova 2, Brno, 612 66, Czech Republic
{strakam, ikastil, kotasek}@fit.vutbr.cz
978-1-4244-8971-8/10$26.00 © 2010 IEEE

Abstract: In recent years, many techniques for self-repair of systems implemented in FPGAs have been developed and presented. The basic problem of these approaches is the large overhead of the unit that controls the partial reconfiguration process. Moreover, these solutions are generally not implemented as fault tolerant systems. In this paper, a small and flexible generic partial dynamic reconfiguration controller implemented inside the FPGA is presented. The basic architecture and the usage of the controller in an FPGA-based fault tolerant structure are described. The implementation of the controller as a fault tolerant component is described as well. The basic features and synthesis results of the controller for a Xilinx FPGA and a comparison with a MicroBlaze solution are presented.

I. INTRODUCTION

A digital system can be implemented on various platforms. A Field Programmable Gate Array (FPGA), such as the Xilinx SRAM-based FPGA family, is a typical example of such a platform. Many FPGA-based designs are constructed as Fault Tolerant (FT) systems with the possibility of recovering from errors by means of reconfiguration procedures [1]. Different FT architectures are known to improve reliability in digital systems; Triple Modular Redundancy (TMR) and duplex systems can serve as examples [2].

The design of dependable FT systems in SRAM-based FPGAs involves three problems: error detection during system operation, fast fault location, and fast repair by means of chip reconfiguration [3]. When errors are detected in any part of a system implemented in an FPGA, it is possible to extend its lifetime [4]. For this purpose, Partial Dynamic Reconfiguration (PDR) of the FPGA circuit can be used [5].

In an SRAM-based FPGA, the combinational and sequential logic is implemented in programmable complex logic blocks, which are customized by loading configuration data into the SRAM cells of the program memory [6]. When an error appears in a memory cell of the program memory (possibly after being struck by a charged particle), the stored value can be inverted; this can modify the design functionality and is called a Single Event Upset (SEU) [7].

Modern FT architectures using PDR often utilize microprocessors embedded in the FPGA, such as PowerPC or MicroBlaze [8]. These microprocessors can perform complex tasks during the PDR process, such as bitstream decompression, or they can even change the bitstream structure [9] [10].

In [11], an FT scheme for the Xilinx Virtex4 FPGA is presented. The scheme consists of two parts: the Partially Reconfigurable Functional Region (PRFR) with several Partial Reconfigurable Modules (PRMs), and a reconfiguration controller which is based on the built-in Xilinx PowerPC-405 processor.

II. MOTIVATION AND GOALS OF THE RESEARCH

Today, many techniques for self-repair of systems implemented in SRAM-based FPGAs have been developed and presented. The basic problem of these approaches is the very large overhead of the units controlling the reconfiguration process. For this purpose, embedded processors such as PowerPC or MicroBlaze can be used. The processor controlling the reconfiguration process is a critical part of the system. If the microprocessor fails, then the whole system will also fail. Therefore, it is very important to ensure that the software in the microprocessor will always work correctly. This can be assured only by formal verification of all software tools of the processor. Moreover, an SEU can impact the functionality of the processor itself. Most modern microprocessors are not built as SEU tolerant and therefore they fail if an SEU changes the contents of their registers or memories. MicroBlaze is even more susceptible to faults caused by SEUs, since its own structure can be changed as well. To build an effective partial dynamic reconfiguration controller based on a microprocessor, the microprocessor itself must be implemented as an FT component, which causes additional overhead. Fortunately, only very limited functionality is required for the PRC itself. Therefore it is possible to build the PDR controller in the FPGA fabric.

A. Previous Work

The activities which aim at defining a methodology for FT system design on FPGA platforms were presented in [12]. The main principles of PDR were described together with the FT architectures based on PRMs in [13]. Several architectures use online checkers or other Concurrent Error Detection (CED) techniques for error detection. These error detection techniques were described in [12]. If an error is detected by a checker in a unit, the Partial Reconfiguration Controller (PRC) initiates the reconfiguration process of this unit. The
simple structure of the methodology for FPGA-based FT system design with the PRC inside the FPGA can be seen in Figure 1.

[Figure 1: block diagram of the FPGA. The dynamic part holds FT architectures 1 to n, each built from PRMs; their error signals go to the Partial Reconfiguration Controller, which drives ICAP and reads the external bitstreams storage. The static part holds the other units.]
Fig. 1. The structure of FT designs in FPGA based on PRMs

This paper presents a new approach for driving the PDR process by a simple Generic Partial Dynamic Reconfiguration Controller (GPDRC) implemented in the FPGA circuit. The goal of our research can be defined in the following way: we want to implement GPDRC as a small FT unit and verify its functionality in an FT structure.

The paper is organized as follows. First, the architecture and basic features of GPDRC and its role in the FT structure, namely the detection of a faulty PRM and the initiation of the PDR process of the faulty module, are described (section III). The experiments and results with the GPDRC and its properties are presented in section IV. Conclusions and ideas for our future research are summarized in section V.

III. GENERIC PARTIAL DYNAMIC RECONFIGURATION CONTROLLER

This work assumes that the FT structure for FPGA-based FT system design consists of three parts (see Figure 1): the static part, the dynamic part and the reconfiguration controller. Components which are not designed as FT are included in the static part. The static part should not be reconfigured. The dynamic part consists of FT architectures which can be reconfigured by means of PDR and therefore must be divided into PRMs with error output signals. The last part of the FPGA contains GPDRC. The presented FT structure supports detection and correction of all faults caused by SEUs and detection of hard errors (damage to the chip that cannot be repaired) in PRMs. It is important to note that the Internal Configuration Access Port (ICAP) is used primarily, but the system can be used with other configuration interfaces. Moreover, ICAP is accessible from the FPGA logic and therefore can operate much faster than other configuration interfaces. If the full throughput of ICAP is used, the smallest PRM can be loaded into the configuration memory in 28 µs.

A. Architecture and Features of GPDRC

The interface of GPDRC can be seen in Figure 2. The input signals consist of three control signals and two data signals. Error signals from all PRMs in the FT structure are connected to GPDRC inputs. The bitstream input signal loads frames of configuration data from external memory (flash or ROM) or any other reliable source. The valid signal represents the acknowledgment of the storage controller that data sent from memory are correct.

The output signals of GPDRC indicate the occurrence of a hard error in any PRM and the index of that PRM. The sync output signal provides external synchronization of PRMs in the FT architecture after the reconfiguration process. The addr_bitstr bus carries the memory address of the required byte of the bitstream.

[Figure 2: interface of the GPDRC block. Inputs: #Err-PRMs (n bits), bitstream (32 bits), valid, rst, clk. Outputs: Hard, PRM index (log(n) bits), addr_bitstr (x bits), sync.]
Fig. 2. Interface of GPDRC for FT system implemented in SRAM-based FPGA as PRMs

Figure 3 shows the detailed architecture of GPDRC developed by our team. The architecture of GPDRC consists of five units, one FIFO memory and one FSM which drives each unit.

[Figure 3: block diagram of GPDRC. Blocks: Error Register File, Round Robin Unit, Generic Error Encoder, Generic Hard Error Detection Unit, Safety Window Unit, Address Look Up Unit with Addr Counter, ECC Unit, FIFO, FSM and ICAP interface wrapper. External signals: #Err-PRMs, Hard, PRM index, sync, addr_bitstr, valid, bitstream. The bitstream comes from the partial bitstreams storage for PRMs (FLASH, ROM).]
Fig. 3. Architecture of GPDRC for FT system implemented in SRAM-based FPGA as PRMs

If more than one SEU occurs (more error signals are active), the round robin algorithm chooses one of the PRMs which should be reconfigured. The Generic Error Encoder (GEE) encodes the binary index of this PRM and sends this identification number together with the error identification signal to the Look Up Unit (LUU) and the Hard Error Unit (HEU). The HEU detects hard errors in PRMs after the reconfiguration process. If the error still exists in the PRM, it is seen as a hard error.
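The selection step described above can be illustrated with a short software model. The following Python sketch is only an illustration of round robin arbitration over the PRM error signals, not the VHDL implementation used in GPDRC. The names `round_robin_select` and `repair_round` and the `reconfigure` callback are hypothetical, and the hard error rule is simplified here to a single failed reconfiguration.

```python
from typing import Callable, List, Optional


def round_robin_select(errors: List[bool], last: int) -> Optional[int]:
    """Return the index of the next PRM whose error signal is active.

    The search starts just after `last` (the PRM served in the previous
    round) and wraps around, so a PRM that keeps failing cannot starve
    the other PRMs.
    """
    n = len(errors)
    for offset in range(1, n + 1):
        idx = (last + offset) % n
        if errors[idx]:
            return idx
    return None  # no error signal is active


def repair_round(errors: List[bool], hard: List[bool], last: int,
                 reconfigure: Callable[[int], bool]) -> int:
    """Serve one PRM and flag a hard error if its error persists.

    `reconfigure(idx)` is a hypothetical callback that returns True when
    the error signal of PRM `idx` is still active after reconfiguration.
    """
    idx = round_robin_select(errors, last)
    if idx is None:
        return last  # nothing to repair in this round
    still_faulty = reconfigure(idx)
    errors[idx] = still_faulty
    hard[idx] = still_faulty  # simplification: one failed repair = hard error
    return idx


if __name__ == "__main__":
    errors = [False, True, False, True]
    hard = [False] * 4
    last = 0
    # Assume PRM 3 has permanent damage and PRM 1 only suffered an SEU.
    for _ in range(2):
        last = repair_round(errors, hard, last, lambda i: i == 3)
    print(errors, hard)
```

Because the search always resumes after the last served PRM, a PRM with a persistent error is visited again only after every other faulty PRM has had its turn.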
From the binary index, the LUU derives the starting and ending address of the corresponding partial bitstream of the faulty PRM and starts reading the bitstream from external memory into the FIFO. The LUU contains an address unit which is implemented as a single counter with a simple FSM. When the partial bitstream has been loaded from memory into the FIFO and the parity of every frame of the bitstream is correct (ECC unit), the FSM starts the reconfiguration process through ICAP. The Safety Window Unit (SWU) ensures that all PRMs are synchronized before a new round of the reconfiguration process can start.

After successful reconfiguration, the system continues with the next error until all of them are repaired. As long as each module is affected by at most one SEU, it is guaranteed that the whole system operates correctly. If some of the errors cannot be repaired by PDR, the round robin arbiter ensures that such errors will not block the repair process of the remaining PRMs.

B. Synchronization Problem and Safety Window

PDR is able to repair a fault that caused an error in a PRM, but the state of the module after the reconfiguration process is undefined. There are two methods for setting the internal state of the unit to the correct value. The first method copies the actual state from the other implementation of the unit in the system. This method is relatively complex but can be used in any type of system. The second method synchronizes the units before they receive work for the next packet by generating a local reset signal at the end of the current packet. This method has limitations and can be used only in systems with packet processing. The second method should be preferred whenever possible because its implementation is simple and cheap.

[Figure 4: timing diagram with the signals Data, Loc_reset, Error, Safety window, Reconf and Sync.]
Fig. 4. Timing diagram with synchronization of PRMs after reconfiguration

The synchronization of the repaired unit is not done instantly. If there is only one fault in the system, it is possible that the reconfiguration process tries to repair an already repaired unit which is waiting for synchronization. The unit will not work correctly after the second reconfiguration because it still has to wait for synchronization. The HEU, however, would evaluate the error as unrepairable since it was not repaired by two reconfigurations. To solve this problem, a safety window is introduced into the reconfiguration system (see Figure 4). The safety window is the minimal time required between reconfigurations of the same unit. The length of the safety window depends on the implemented system and on the synchronization method. There is a tradeoff between implementation complexity and the speed of reconfiguration in the implementation of the safety window. The smallest implementation is a single safety window in the GPDRC (SWU in Figure 3). In that case, the window has to be long enough for all units in the system, and therefore units with fast synchronization cannot be repaired faster than the other units. The other extreme would be to implement a safety window in every PRM to guarantee that the next repair process can be performed as soon as possible.

C. GPDRC Implemented as Fault Tolerant System

GPDRC itself can be implemented as fault tolerant. In this case, the fault tolerant parts of the GPDRC must be implemented in the dynamic part of the FPGA and must be divided into PRMs with error output signals in the same fashion as any other functional unit (FU). The FT parts of GPDRC are implemented as a TMR system or as a duplex system with comparators. If an error is detected in any PRM of GPDRC, then this unit has to be reconfigured first. Therefore the bitstreams for the GPDRC are located at the beginning of the bitstream storage, and the error signal resets the address counter and starts the reconfiguration process, which effectively reconfigures part of GPDRC without interrupting the functionality of the device.

IV. EXPERIMENTS AND RESULTS

The architecture of GPDRC described above was implemented in VHDL for an SRAM-based FPGA FT design. Xilinx ISE 11.4 was used for synthesis. ISE 11.4 supports PDR in Virtex5, which is available on the ML506 development board. The correct function of the error detecting circuitry was tested on this board in the early phase of this work. For experimenting with the methodology and GPDRC, a simple digital circuit was developed which contains several counters, decoders, multiplexers and other additional logic. These components of the circuit were implemented as FT architectures divided into PRMs by our methodology. We compared basic features and properties of the GPDRC. The following features were considered:
• the sizes of functional units of GPDRC (numbers of LUTs, FlipFlop registers) for 100 PRMs
• the size and overheads of GPDRC implemented as FT system
• the comparison of GPDRC size (in slices) with the size of MicroBlaze solution
• the parameters of GPDRC (size in slices, frequency of design) for different numbers of PRMs
• the probability that GPDRC fails if SEU occurs in the architecture

The results of GPDRC and MicroBlaze synthesis into Virtex5 XC5VSX50T and the numbers of resources can be seen in Table I. The meaning of the columns is as follows: column 1 - the name of the component in the GPDRC architecture, column 2 - the size of the component and the utilization of FPGA resources, column 3 (4) - the numbers of LUTs (FlipFlops), column 5 - the size of the FT GPDRC and the overhead.
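As an illustration of how the utilization and overhead figures in columns 2 and 5 relate to the raw slice counts, the following Python sketch recomputes them from the totals reported in Table I. The device size of 8160 slices for the XC5VSX50T is an assumption taken from the Virtex-5 family documentation, not from this paper, and the function names are hypothetical.

```python
# Slice counts copied from the totals in Table I: (plain size, TMR size).
TABLE_I_TOTALS = {
    "GPDRC total": (324, 858),
    "MicroBlaze IP core": (613, 1531),
}

# Assumption (not stated in the paper): the XC5VSX50T provides 8160 slices.
DEVICE_SLICES = 8160


def tmr_overhead(size: int, tmr_size: int) -> float:
    """Ratio between the size of the TMR version and the plain version."""
    return tmr_size / size


def utilization_percent(size: int, device_slices: int = DEVICE_SLICES) -> float:
    """Share of the device slices occupied by a component, in percent."""
    return 100.0 * size / device_slices


if __name__ == "__main__":
    for name, (size, tmr_size) in TABLE_I_TOTALS.items():
        print(f"{name}: {utilization_percent(size):.2f}% of slices, "
              f"TMR overhead {tmr_overhead(size, tmr_size):.1f}x")
```

With these inputs the sketch gives a utilization of about 3.97% for the plain GPDRC and an overhead of about 2.6x for its TMR version.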
TABLE I
NUMBERS OF FPGA RESOURCES FOR GPDRC (ML506, Virtex5, 100 PRMs)

Component              Size [slices]   # LUTs   # F/Fs   TMR [slices]
Round Robin Unit        91 (1.1%)        101      101     214 (2.4x)
Error Encoder           74 (1.0%)        110        0     165 (2.2x)
Hard Error Unit         38 (0.6%)         36      103     113 (2.9x)
Safety Window Unit      11 (0.2%)         31       25      98 (9.0x)
ECC Unit                 7 (0.1%)          8        1      16 (2.3x)
Address Unit            60 (0.8%)        138       21     108 (1.8x)
FSM                     13 (0.2%)         28       13      29 (2.2x)
GPDRC total            324 (4.0%)        509      262     858 (2.6x)
MicroBlaze IP core     613 (7.5%)       1333     1328    1531 (2.5x)

The probability that GPDRC fails if an SEU occurs in the architecture is 3.97%; for the MicroBlaze IP core it is 7.52%. In the Virtex5 XC5VSX50T FPGA, 204 PRMs can be created. Figure 5 shows how the number of slices increases with the number of PRMs. The size of GPDRC increases almost linearly with the number of PRMs, and the frequency of GPDRC stays at about 230 MHz. The frequency of GPDRC is sufficient because the speed of the ICAP interface is 100 MHz.

[Figure 5: plot against the number of PRMs (0 to 200) of the number of occupied slices (left axis, 0 to 800) and the frequency after synthesis in MHz (right axis, 0 to 800), each with its linear approximation.]
Fig. 5. Size and frequency of GPDRC based on #PRMs

V. CONCLUSIONS AND FUTURE WORK

In the paper, the basic principle of the methodology for FT system design applying PDR was presented. Our previous experiments have shown that PDR can be used for the design of FT architectures in SRAM-based FPGAs. The main role of the PDR controller in an FT system is the identification of the faulty PRM and the fast initiation of the reconfiguration process of the faulty module in FT architectures. The main structure and basic parameters of GPDRC were obtained and described together with the problems of PRM synchronization after reconfiguration of one of them. The results of synthesis demonstrate that GPDRC has lower overhead than a controller implemented as MicroBlaze.

Future research shall concentrate on more effective implementation of PRM synchronization in FT architectures and the relation between GPDRC and its dependability parameters.

Acknowledgments

This work was supported by the Grant Agency of the Czech Republic (GACR) No. 102/09/1668 - “SoC circuits reliability and availability improvement” and by GACR No. 102/09/H042 - “Mathematical and Engineering Approaches to Developing Reliable and Secure Concurrent and Distributed Computer Systems” and by Research Project No. MSM 0021630528 - “Security-Oriented Research in Information Technology” and the grant “BUT FIT-S-10-1”.

REFERENCES

[1] K. Kyriakoulakos and D. Pnevmatikatos, “A novel SRAM-based FPGA architecture for efficient TMR fault tolerance support,” in International Conference on Field Programmable Logic and Applications, 2009. FPL 2009. Washington, DC, USA: IEEE Computer Society, 2009, pp. 193–198.
[2] S.-Y. Yu and E. J. McCluskey, “On-line testing and recovery in TMR systems for real-time applications,” in ITC ’01: Proceedings of the 2001 IEEE International Test Conference. Washington, DC, USA: IEEE Computer Society, 2001, p. 240.
[3] F. G. de Lima Kastensmidt, G. Neuberger, R. F. Hentschke, L. Carro, and R. Reis, “Designing fault-tolerant techniques for SRAM-based FPGAs,” IEEE Des. Test, vol. 21, no. 6, pp. 552–562, 2004.
[4] A. Jacobs, A. George, and G. Cieslewski, “Reconfigurable fault tolerance: A framework for environmentally adaptive fault mitigation in space,” in International Conference on Field Programmable Logic and Applications, 2009. FPL 2009. Los Alamitos, CA, USA: IEEE Computer Society Press, 2009, pp. 199–204.
[5] J. Heiner, B. Sellers, M. Wirthlin, and J. Kalb, “FPGA partial reconfiguration via configuration scrubbing,” in Field Programmable Logic and Applications, 2009. FPL 2009. International Conference on, Aug. 2009, pp. 99–104.
[6] R. Oliveira, A. Jagirdar, and T. J. Chakraborty, “A TMR scheme for SEU mitigation in scan flip-flops,” in ISQED ’07: Proceedings of the 8th International Symposium on Quality Electronic Design. Washington, DC, USA: IEEE Computer Society, 2007, pp. 905–910.
[7] C. Bolchini, A. Miele, and M. D. Santambrogio, “TMR and partial dynamic reconfiguration to mitigate SEU faults in FPGAs,” in DFT ’07: Proceedings of the 22nd IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems. Washington, DC, USA: IEEE Computer Society, 2007, pp. 87–95.
[8] L. Sterpone, M. Aguirre, J. Tombs, and H. Guzmán-Miranda, “On the design of tunable fault tolerant circuits on SRAM-based FPGAs for safety critical applications,” in DATE ’08: Proceedings of the conference on Design, automation and test in Europe. New York, NY, USA: ACM, 2008, pp. 336–341.
[9] D. Fay, S. Campbell, G. Miller, and D. Connors, “Teaching fault tolerant FPGA design for aerospace applications,” International Symposium on Microelectronics Systems Education, IEEE International Conference on/Multimedia Software Engineering, vol. 0, pp. 61–62, 2007.
[10] J. Torresen, G. Senland, and K. Glette, “Partial reconfiguration applied in an on-line evolvable pattern recognition system,” in NORCHIP 2008. Washington, DC, USA: IEEE Computer Society, 2008, pp. 61–64.
[11] X. Iturbe, M. Azkarate, I. Martinez, J. Perez, and A. Astarloa, “A novel SEU, MBU and SHE handling strategy for Xilinx Virtex-4 FPGAs,” in International Conference on Field Programmable Logic and Applications, 2009. FPL 2009. Washington, DC, USA: IEEE Computer Society, 2009, pp. 569–573.
[12] M. Straka, J. Kastil, and Z. Kotasek, “Fault tolerant structure for SRAM-based FPGA via partial dynamic reconfiguration,” in 13th EUROMICRO Conference on Digital System Design DSD 2010. Washington, DC, USA: IEEE Computer Society, 2010.
[13] M. Straka, J. Kastil, and Z. Kotasek, “Modern fault tolerant architectures based on partial dynamic reconfiguration in FPGAs,” in 13th IEEE International Symposium on Design and Diagnostics of Electronic Circuits and Systems. New York, NY, USA: IEEE Computer Society, 2010, pp. 336–341.