A major challenge in computing is to leverage multi-core technology to develop energy-efficient high-performance systems. This is critical for embedded systems with a very limited energy budget as well as for supercomputers in terms of sustainability. Moreover, the efficient programming of multi-core architectures, as we move towards manycores with more than a thousand cores predicted by 2020, remains an unresolved issue. The FlexTiles project will define and develop an energy-efficient yet programmable heterogeneous manycore platform with self-adaptive capabilities.
The manycore will be associated with an innovative virtualisation layer and a dedicated tool-flow to improve programming efficiency, reduce the impact on time to market and reduce the development cost by 20 to 50%.
FlexTiles will raise the accessibility of the manycore technology to industry – from small SMEs to large companies – thanks to its programming efficiency and its ability to adapt to the targeted domain using embedded reconfigurable technologies.
Contract number 288248
Project coordinator THALES
Contact person Dr. Philippe MILLET
THALES R&T
Campus Polytechnique
1 avenue Augustin Fresnel
91767
Palaiseau Cedex, France
Tel: +33 1 69 41 60 49
Fax: +33 1 69 41 60 01
philippe.millet@thalesgroup.com
Project website www.flextiles.eu
FlexTiles - The Self-Adaptive MultiCore Project - Year 2 Report
1. FLEXTILES CONTRACT 288248
Year 2 Report - Public
2.1 Introduction
The goal of the FlexTiles FP7 project is to develop a dynamically reconfigurable heterogeneous
many-core platform and the tools to program it.
The first year of the project was dedicated to defining what the whole platform has to be:
Definition of the technology to use
Definition of the hardware and software interfaces, and of the interface between the two
Definition of the background of each partner
Definition of how to build a consistent platform that covers the application needs.
To do so, the whole consortium worked together to share the same view, defining a solution from programming down to hardware, which turned out to be harder than expected. In particular, we had to spend longer than planned on managing dynamicity in the platform.
During the second year we solved the issue of managing dynamicity and updated the deliverables according to the technical solutions we then defined.
Managing dynamicity starts with defining the dynamicity needed in the applications, and proceeds to defining how to capture and describe that dynamicity in the tools.
The hardware must also allow executable code to be moved from one computing unit to another at run-time. Considerable work went into finding a solution that allows moving a task of a dataflow from one computing unit to another. The adopted solution requires stopping the data flow, loading the new computing units with the tasks to be executed, changing the routes of the messages going through the embedded Network on Chip, and restarting the flow of data.
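The stop/load/reroute/restart sequence above can be sketched as follows. This is a minimal illustration only; all names (NocRouter, migrate_task, the dictionary-based cores and flow) are hypothetical and do not reflect the actual FlexTiles implementation.

```python
class NocRouter:
    """Toy NoC routing table: maps a logical task id to a physical core id."""
    def __init__(self):
        self.routes = {}

    def set_route(self, task_id, core_id):
        self.routes[task_id] = core_id


def migrate_task(router, task_id, src_core, dst_core, flow):
    # 1. Stop the data flow feeding the task.
    flow["running"] = False
    # 2. Load the destination computing unit with the task to be executed.
    src_core["tasks"].remove(task_id)
    dst_core["tasks"].append(task_id)
    # 3. Change the routes of the messages going through the NoC.
    router.set_route(task_id, dst_core["id"])
    # 4. Restart the flow of data.
    flow["running"] = True


router = NocRouter()
core0 = {"id": 0, "tasks": ["fir_filter"]}
core1 = {"id": 1, "tasks": []}
flow = {"running": True}
router.set_route("fir_filter", 0)

migrate_task(router, "fir_filter", core0, core1, flow)
print(router.routes["fir_filter"], core1["tasks"], flow["running"])
```

The key property the sketch captures is ordering: the flow is halted before code and routes change, so no message can reach a core that no longer hosts the task.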
This second year was mostly dedicated to the refinement of specifications and to development. Starting from the work achieved during the first year and the preliminary specifications jointly defined by the consortium, we refined these specifications in smaller working groups dedicated to the hardware, the embedded software and the tool chain.
From the hardware point of view, we went deeper into the description of the eFPGA as well as the many-core architecture. We defined the accelerator interface that allows us to attach any kind of accelerator to the many-core platform. This accelerator interface allows us, for example, to plug streaming VHDL accelerators or processors such as DSPs into the many-core and to manage them from the General Purpose Processors (GPPs) implemented in the many-core.
From the embedded software point of view, we went deeper into the definition of the architecture of the different layers. These were validated on a virtual platform, based on the Open Virtual Platform (OVP), that simulates the many-core both functionally and temporally. We also defined how to drive the accelerators from the GPPs of the many-core.
From the tool chain point of view, we started to define how we would like to capture the application
and what would be the output of the tool chain.
We also prepared the plan for the integration that will take place during the third year, taking care of interfaces and intermediate deliveries between partners so that the integration proceeds as smoothly as possible.
2.2 Key innovations
During this year, the major challenges lay in implementing the parts of the platform.
On the hardware side, we specified the Network on Chip (NoC), the Accelerator Interface, and the Network Interfaces that are used to connect the NoC with the components plugged onto it, e.g. the General Purpose Processors (GPPs), the Accelerator Interface and the memory banks.
A virtual implementation of a homogeneous many-core was developed based on the Open Virtual Platform; the next stage is to develop a heterogeneous OVP model. The General Purpose Processors of this platform are MicroBlaze processors from Xilinx and the NoC is the dAElite from TUe. This virtual implementation allowed us to develop the embedded software, i.e. the kernel and the virtualization layer that enable applications to run on the platform, delivering run-time binding services to allocate resources to tasks, monitoring services to probe the architecture, and actuators to take action according to what has been monitored. The MicroBlaze was chosen to ease migration between the OVP and a hardware emulator, the integrated FlexTiles Development Platform (FDP) demonstrator developed by Sundance during Year 2. The FDP embeds two Xilinx Virtex-6 SX475T FPGAs with multiple parallel and serial interfaces between them.
FlexTiles Development Platform
The FDP was delivered to the partners of the project, and a first implementation of the hardware architecture platform was delivered by TUe. Since this implementation is the same as the one developed on OVP, the software could be ported very smoothly. We are now able to run the same application on both platforms, either on the hardware emulator or on the simulator. This is a major contribution and breakthrough.
A tool was also developed by TUe and KIT in order to give an inside view of what is happening in the hardware, both on the virtual platform and on the FDP. This tool can be used to visualize how tasks can be moved from one GPP to another, dynamically, according to requirements.
To complete the hardware platform, two accelerators, i.e. a DSP from CSEM and the eFPGA from UR1, were developed at register-transfer level (RTL) in order to be plugged into the Accelerator Interface.
On top of the hardware and embedded software, a programming model was proposed by TRT and
ACE to capture the dynamicity of the applications.
2.3 Technical approach
In order to validate the solutions proposed and developed in FlexTiles, we propose to emulate the 3D stacked chip with the FlexTiles Development Platform (FDP), embedding two Xilinx Virtex-6 SX475 FPGAs physically linked to each other by communication links. On this physical emulator, we propose to implement a homogeneous many-core on one of the two FPGAs and accelerators on the other. The Network on Chip is intended to be extended across the two FPGAs through Aurora serial links, to emulate a single 3D FlexTiles SoC implementation.
Since the Open Virtual Platform allows us to easily simulate a homogeneous many-core based on GPPs, we had a simulation model of the FDP’s first FPGA that allowed us to develop the kernel, RTOS and virtualization layer. When the RTL model of the first FPGA was ready, we could directly reuse the software developed on the simulation platform on the design running on the first FPGA. On the simulation model, we simulated sensors in order to test the monitoring and actuator strategies.
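The monitor/actuator pattern tested with simulated sensors can be illustrated by a simple control loop. This is a hedged sketch only: the load-sensor values, the threshold, and the re-mapping policy (shift half the load of an overloaded core to the least-loaded one) are assumptions for illustration, not the FlexTiles strategies.

```python
def actuate(core_id, loads):
    """Actuator: move half of the overloaded core's load to the least-loaded core."""
    coolest = min(loads, key=loads.get)
    moved = loads[core_id] // 2
    loads[core_id] -= moved
    loads[coolest] += moved
    return coolest

def monitor_step(loads, threshold=80):
    """Monitor: probe each simulated load sensor and react when a core is overloaded."""
    actions = []
    for core_id, load in list(loads.items()):  # snapshot before actuators mutate loads
        if load > threshold:
            actions.append((core_id, actuate(core_id, loads)))
    return actions

loads = {0: 95, 1: 40, 2: 10}      # injected (simulated) sensor values
actions = monitor_step(loads)
print(actions, loads)
```

Injecting sensor values in this way lets the monitoring and actuation logic be exercised deterministically on a simulator, before real sensors exist in hardware.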
We decided not to implement the eFPGA on the FPGA, as it would require more FPGA resources than are available and would not bring more information than we will get from the simulation models.
The hardware accelerators that would have been implemented on the eFPGA will then be
implemented on the FDP’s second FPGA to emulate those accelerators’ implementation without
emulating the eFPGA underlying structure.
Independently from the work done on the FDP, and in order to be able to dynamically load the bit-stream with small delays, a new bit-stream format has been designed by UR1. In order to take as little memory space as possible to store the bit-streams, compression techniques are currently being studied at RUB.
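As an illustration of why compression pays off on configuration data, which typically contains long runs of identical bytes, a simple run-length scheme can be sketched. This is a generic textbook example, not the technique under study at RUB.

```python
def rle_compress(data: bytes) -> bytes:
    """Run-length encode as (count, value) byte pairs, with count capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

def rle_decompress(data: bytes) -> bytes:
    """Expand (count, value) pairs back to the original byte stream."""
    out = bytearray()
    for count, value in zip(data[::2], data[1::2]):
        out += bytes([value]) * count
    return bytes(out)

# A sparse configuration region (mostly zero bytes) compresses well:
bitstream = bytes([0x00] * 200 + [0xAB, 0xCD] + [0x00] * 100)
packed = rle_compress(bitstream)
assert rle_decompress(packed) == bitstream
print(len(bitstream), len(packed))
```

Here 302 bytes shrink to 8, which hints at the storage savings achievable on sparse bit-streams; production schemes use more sophisticated coding but exploit the same redundancy.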
To add dynamicity to the platform, thus allowing us to deal with dynamic applications based on data flow, we proposed dynamically replacing atomic parts of the data flow with other atomic parts, depending on a decision coming from the application itself; only one part runs at a time. One of the difficult issues we had to solve was to define, at reconfiguration time, how to stop the data flow to the current tasks and reroute that flow to the new tasks. Data flows between computing units (or tasks) through FIFOs. We decided to isolate the points where the flow can be rerouted by defining some of these FIFOs as isolation FIFOs and implementing them in shared memory.
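The isolation-FIFO idea can be sketched as follows: the flow is drained at a designated FIFO, the consumer endpoint is swapped for the replacement task, and the flow resumes. The class and names below are illustrative assumptions (a deque standing in for a shared-memory queue), not the FlexTiles implementation.

```python
from collections import deque

class IsolationFifo:
    """A FIFO at a reroutable point of the data flow (shared memory simulated by a deque)."""
    def __init__(self, consumer):
        self.queue = deque()
        self.consumer = consumer  # name of the task currently reading this FIFO

    def push(self, item):
        self.queue.append(item)

    def reroute(self, new_consumer):
        # Reconfiguration point: the consumer may only be swapped once
        # in-flight data has been drained, so no token is lost.
        assert not self.queue, "drain the FIFO before rerouting"
        self.consumer = new_consumer

fifo = IsolationFifo(consumer="filter_v1")
fifo.push(1); fifo.push(2)
drained = [fifo.queue.popleft() for _ in range(len(fifo.queue))]
fifo.reroute("filter_v2")   # one atomic part of the data flow replaces another
print(drained, fifo.consumer)
```

Placing these FIFOs in shared memory means the buffered tokens stay reachable regardless of which computing unit hosts the consumer, which is what makes the swap safe.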
2.4 Demonstration and Use
Several implementations were made this year:
A high-level simulator based on the Open Virtual Platform (OVP), integrating an existing simulator of the MicroBlaze against which we linked a NoC simulator library.
The FlexTiles Development Platform, developed inside the consortium and embedding two Xilinx Virtex-6 SX475 FPGAs. This FPGA is the biggest matrix available for Virtex-6, on which we have space to implement a heterogeneous many-core big enough to experiment with re-mapping techniques.
A first increment of the RTL model of the FlexTiles hardware architecture, emulated on the FlexTiles Development Platform. The part developed here is a homogeneous many-core based on the MicroBlaze GPP. This first model is the base onto which we are going to plug the accelerator interface to turn this homogeneous architecture into a heterogeneous one.
The software layers (kernel, RTOS, virtualisation layer) running on top of the FlexTiles
hardware architecture. Simple use cases have been developed on top of this software layer to
test and validate the principles and ideas proposed in FlexTiles for self-adaptation of the
mapping of the application at run-time.
An RTL model of the eFPGA has been produced. This allows us to test the virtual bit-stream.
The RTL model of the DSP has been adapted so that it can be plugged into the Accelerator Interface.
2.5 Work already performed and main results
The global approach remains the same as presented last year. The architecture proposed by the
consortium is a 3D stacked chip with two layers.
It has been decided, after considering what was possible to implement through TSVs (Through Silicon
Vias) between the two silicon layers, to implement (1) a heterogeneous many-core layer - with
General Purpose Processors (GPPs) and Digital Signal Processors (DSPs) - on top of which we
implement (2) an embedded FPGA (eFPGA).
This construction allows us to benefit from a bigger eFPGA matrix than if we had to share the top die with DSPs. With such an implementation, the eFPGA matrix can be more regular, which allows more flexibility when it is necessary to move IPs from one region of the eFPGA to another.
This two-layer chip is the result of implementation trade-offs we had to make. Functionally, the platform is still a heterogeneous many-core built from a homogeneous many-core of GPPs linked to several kinds of accelerators.
We then have two kinds of accelerators: DSPs and VHDL IPs. Both are slaves of the GPPs. The view
we have is that a GPP controls the general application scheduling while the accelerators are
performing heavy computing tasks. Both are connected to the architecture through an Accelerator
Interface. This interface allows the GPP to control the accelerator as a slave, sending control
information.
To run a task on the DSP, we have to provide the DSP with executable code linked against a library containing the management parts of the virtualization layer, so that the task answers messages and behaves as expected by the GPP.
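The slave behaviour expected from a DSP task can be sketched as a small message loop of the kind such a library would provide. The message types, handler, and stand-in kernel below are assumptions for illustration, not the actual virtualization-layer API.

```python
# Toy message loop mimicking the library linked into a DSP executable:
# the GPP sends control messages; the DSP answers and runs kernels as a slave.

KERNELS = {"fir": lambda data: [x * 2 for x in data]}  # stand-in compute kernel

def handle_message(msg, state):
    kind = msg["type"]
    if kind == "LOAD":            # GPP asks the DSP to prepare a kernel
        state["kernel"] = KERNELS[msg["name"]]
        return {"type": "ACK"}
    if kind == "RUN":             # GPP streams data in for heavy computation
        return {"type": "RESULT", "data": state["kernel"](msg["data"])}
    if kind == "STOP":            # GPP releases the DSP
        state["kernel"] = None
        return {"type": "ACK"}
    return {"type": "NACK"}      # unknown control message

state = {"kernel": None}
replies = [handle_message(m, state) for m in (
    {"type": "LOAD", "name": "fir"},
    {"type": "RUN", "data": [1, 2, 3]},
    {"type": "STOP"},
)]
print(replies)
```

The point of the sketch is the division of roles: the GPP drives scheduling through control messages, while the accelerator only reacts, performing the heavy computation it is handed.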
When implementing an IP on the eFPGA, one has to connect it to the Accelerator Interface so that it
can be controlled by any GPP of the architecture.
We decided to base the communication between the elements of the architecture on a Network on Chip (NoC). Since two partners of the project have a NoC available (TUe and CEA), we compared them and challenged them against the requirements; both NoCs can fulfil the project requirements. For the first integration we used dAElite, since it was already integrated with the homogeneous multicore architecture. We decided to keep both NoCs and to evaluate them at the end of the project, as swapping them will not cost much effort because they rely on the same interfaces.
The NoC acts as the spinal cord of the architecture: the Accelerator Interface uses it to exchange data with other components and to receive control information from the GPPs.
A subset of the platform, i.e. the homogeneous GPP many-core, is simulated on a high level simulator
that allowed us to implement the embedded software, drivers and management libraries of the
platform, the kernel and the virtualization layer. These could be tested and validated on a host
workstation prior to being executed on a hardware emulator.
A first implementation of the homogeneous GPP many-core has been made at the RTL level. This implementation being the same as the simulated one, we could reuse the implemented software directly on the RTL code. This allowed us to check the correctness of both the simulation model and the RTL model.
From the implemented virtualization layer, we were able to test the re-mapping process by simulating
specific values on the monitored sensors.
Some basic applications are currently running on the platform. Part of the delivered STAP radar application has already been implemented, but more work has to be done to make it run on the architecture, mostly because some buffers used in the software implementation are too large for the hardware implementation.
The tool chain is currently under development. Individual tools have been tested, and integration into
a complete tool chain is planned.
In addition to this technical work, we worked on the MIM Innovation plan, defining how to assess our innovations and trying the method on three examples:
1. The tool chain for parallel application development.
2. CoSy compiler development system.
3. Kernel & self-adaptation.
Each partner identified innovations and we looked into the interest in patenting or disseminating these
innovations.
We have issued a survey asking third parties about their interest in such a platform. So far we have received around 120 replies; the next task is to analyse the feedback in relation to the project's visibility and its technical importance to the respondents.
Fewer papers were produced this year compared to last year, as we are currently in a heavy development phase and are awaiting results before producing papers; however, lectures about FlexTiles were given at university (RUB).
We have proposed FlexTiles workshops in September 2014 at the 24th International Conference on
Field Programmable Logic and Applications (FPL-2014) and at the Adaptive Hardware and Systems
(AHS-2014) in July 2014. Our website (www.flextiles.eu) is continuously updated.
2.6 Scientific, economic and societal impacts
The proposed many-core with its self-adaptive capabilities is a technological breakthrough, well matched to new applications needing dynamic adaptation and mode switching at runtime. Smart systems and Cyber-Physical Systems need a high level of computing power, but also embedded intelligence to react to their environment; we speak of autonomy and survivability.
These results are the first steps toward defining future systems that will be able to manage dynamic adaptation nested with static data flow (intensive computing). There is a need for a programming model that can describe a mix of different Models of Computation (MoC) while keeping the possibility to optimize and parallelize the application.
The impact of the project lies mainly in many-core systems: defining an evolution from the current homogeneous many-cores and proposing programming solutions to help master these powerful heterogeneous many-cores, which are more complex to program than homogeneous ones.
The project brings new scientific ideas such as:
The dynamic data flow.
The solutions to embed dedicated accelerators inside a homogeneous many-core through our accelerator interface, and to program them, allowing industry to reduce time-to-market.
A new FPGA technology used for the eFPGA, with the virtual bit-stream.
The virtualization layer together with the kernel and resource monitoring, which provide solutions for self-adaptive capabilities at run-time.
The streaming compiler tool, which will be used to produce patents, papers and/or lectures.
Consortium & Project Leader:
Dr. Philippe MILLET
THALES R&T,
Campus Polytechnique
1 avenue Augustin Fresnel
91767 Palaiseau Cedex,
France
Contact: flextiles_all@flextiles.eu
FlexTiles Team