- Daniele Valentino De Vincenti, B.Sc. graduate in Biomedical Engineering @Politecnico di Milano
- Lorenzo Farinelli, B.Sc. graduate in Computer Science and Engineering @Politecnico di Milano
Plaster is a multi-layered, C++-based infrastructure that supports the development of multi-FPGA systems and the management of large data flows between their nodes. The goal of the project is to give the end user a set of tools (a Python library and a C++ service) to easily assign bitstreams to nodes and route data between them, in the context of a PYNQ-based cluster suited to the distributed acceleration of computation-intensive tasks. On top of this platform, an abandoned-object detection tool is implemented as a multi-FPGA distributed system that uses a hardware-accelerated version of the YOLO neural network for object detection in images.
Abandoned Object Detection Using PYNQ-Based Multi-FPGA Cluster
1. PLASTER
PYNQ-based abandoned object detection using a
map-reduce approach on a multi-FPGA cluster
Daniele Valentino De Vincenti danielevalentino.devincenti@mail.polimi.it
Lorenzo Farinelli lorenzo.farinelli@mail.polimi.it
Luca Stornaiuolo luca.stornaiuolo@polimi.it
Rolando Brondolin rolando.brondolin@polimi.it
July 19th 2020
VNGC project presentation
2. Context
Neural networks running on
embedded devices are:
▪ Computationally intensive
▪ Strongly memory bound
▪ Resource hungry
▪ Power consuming
https://www.flaticon.com/authors/freepik
3. Our solution
▪ PYNQ-based multi-FPGA cluster
for the:
○ Flexibility of the infrastructure
○ Reliability and redundancy
○ Portability and ease of setup
(e.g. events)
○ High computational power
○ Heterogeneous design
○ Embedded system
▪ Abandoned object detection
using accelerated YOLO detector
https://www.flaticon.com/authors/eucalyp
▪ C++ node manager for fast
communication
▪ End-user Python libraries for
ease of use
https://www.flaticon.com/free-icon/purchase-summary_1949624
https://github.com/dhm2013724/yolov2_xilinx_fpga
4. Multi-FPGA cluster
▪ Distributed system of accelerators
▪ Self-managed cluster of PYNQ-Z1s
▪ On-the-fly reconfiguration of FPGAs
to start new tasks and jobs
5. Cluster design
Rendering of a three-board PYNQ-Z1 configuration of the cluster, with the boards facing outwards to improve heat dissipation.
A fan will be placed on top, and a plexiglass pane will surround the sides of the cluster to force the airflow
from bottom to top. The central hole will host a network switch that connects the boards.
6. Application
▪ Video input is gathered and split
into frames
▪ Frame chunks are sent to multiple
boards for classification
▪ Results from the classification stage
are sent to a second analyzing stage
▪ Final results are aggregated and
sent back to the user
[Diagram labels: User, Cluster]
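The four steps above follow a map-reduce pattern, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's code: frames are plain integers, each board is simulated by a worker thread, `classify_chunk` is a placeholder for the accelerated YOLO detector, and the "abandoned" rule in `analyze` is deliberately naive. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical stand-ins: a "frame" is an integer index and each "board"
# is simulated by a worker thread instead of a real PYNQ-Z1.

def split_into_chunks(frames: List[int], n_boards: int) -> List[List[int]]:
    """Split the frame list into at most n_boards contiguous chunks."""
    size = max(1, -(-len(frames) // n_boards))  # ceiling division
    return [frames[i:i + size] for i in range(0, len(frames), size)]

def classify_chunk(chunk: List[int]) -> List[Dict]:
    """Map stage: placeholder for the accelerated YOLO detector.
    A fake 'bag' is reported in every frame with index >= 3."""
    return [{"frame": f, "objects": ["bag"] if f >= 3 else []} for f in chunk]

def analyze(detections: List[Dict], min_frames: int = 3) -> List[str]:
    """Second stage: flag a label as abandoned when it shows up in at
    least min_frames frames (a deliberately naive rule)."""
    counts: Dict[str, int] = {}
    for detection in detections:
        for label in detection["objects"]:
            counts[label] = counts.get(label, 0) + 1
    return sorted(label for label, n in counts.items() if n >= min_frames)

def run_pipeline(frames: List[int], n_boards: int = 3) -> List[str]:
    """Split, classify in parallel, then aggregate the final result."""
    chunks = split_into_chunks(frames, n_boards)
    with ThreadPoolExecutor(max_workers=n_boards) as pool:
        per_chunk = list(pool.map(classify_chunk, chunks))
    detections = [d for chunk in per_chunk for d in chunk]
    return analyze(detections)

result = run_pipeline(list(range(8)))
print(result)
```

In the real cluster the map stage would run on separate boards and the results would travel over the network; the thread pool here only mimics that fan-out and fan-in.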
7. Node manager
https://creativemarket.com/Becris
▪ Fast underlying communication
layer
▪ Easy reconfigurability of nodes for
different tasks
▪ Simple rearrangement of the cluster
in case of failures
▪ Ease of use through a set of Python
APIs
8. Python libraries
▪ Python APIs to build applications
running on the cluster
▪ Easy configuration and assignment
of bit-streams to the boards
▪ Dedicated functions for the
communication and file exchange
between nodes
▪ Transparent C++ management layer
9. Results / user APIs
UML class diagrams of the base class representing a task of a distributed application and specifying the
APIs provided by the Python library to interact with the cluster and manage the execution of apps.
When developing a distributed application, a user only has to import PlasterTask in the app’s Python
code and implement a concrete subclass to define a custom behavior for the task.
ExecutorWrapper exposes the methods that let tasks interact with the cluster (e.g. send/receive data).
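To make the subclassing pattern concrete, here is a minimal sketch. The slides do not show the actual method names or signatures of PlasterTask and ExecutorWrapper, so both classes below are hypothetical stand-ins defined locally: the `run` and `send` methods, the `executor` attribute, and the `"aggregator"` destination are assumptions made purely for illustration.

```python
from abc import ABC, abstractmethod

class ExecutorWrapper:
    """Hypothetical stand-in: records outgoing messages in memory
    instead of sending them to another node."""
    def __init__(self):
        self.outbox = []

    def send(self, destination, payload):
        self.outbox.append((destination, payload))

class PlasterTask(ABC):
    """Hypothetical stand-in for the library's base task class."""
    def __init__(self, executor):
        self.executor = executor

    @abstractmethod
    def run(self, data):
        """Custom task behavior, supplied by the concrete subclass."""

class CountFramesTask(PlasterTask):
    """Example concrete task: counts its input items and forwards the
    count to a (hypothetical) aggregator node."""
    def run(self, data):
        count = len(data)
        self.executor.send("aggregator", count)
        return count

executor = ExecutorWrapper()
task = CountFramesTask(executor)
n = task.run(["frame0", "frame1"])
print(n, executor.outbox)
```

With the real library, the user would import PlasterTask instead of defining it, and only the concrete subclass would live in the application code.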
10. Results / transfer times
File size | Plaster transfer time | Python transfer time
50 MB     | 194.094 ms            | 1.0 s
100 MB    | 300.680 ms            | 2.0 s
200 MB    | 600.392 ms            | 3.976 s
500 MB    | 1866.889 ms           | 9.679 s
Transfer times to move a file of the specified size from one board to another. The values refer to the time between
the first file transfer request (from recipient to owner) and the reception (and disk writing) of the last chunk of the file.
They compare the results obtained with the communication APIs of the cluster (Plaster column) against those obtained with
a file transfer function written from scratch in Python (Python column).
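The figures in the table can be turned into throughput and speedup with simple arithmetic. The short script below uses only the measured values reported above, converted to seconds, and assumes 1 MB per unit of file size as stated in the table. On these numbers the Plaster transfer is roughly 5 to 7 times faster than the plain-Python one.

```python
# (file size in MB, Plaster time in s, plain-Python time in s),
# copied from the table above with milliseconds converted to seconds.
measurements = [
    (50, 0.194094, 1.0),
    (100, 0.300680, 2.0),
    (200, 0.600392, 3.976),
    (500, 1.866889, 9.679),
]

def summarize(rows):
    """Compute throughput (MB/s) for both methods and the speedup."""
    summary = []
    for size_mb, plaster_s, python_s in rows:
        summary.append({
            "size_mb": size_mb,
            "plaster_mb_per_s": size_mb / plaster_s,
            "python_mb_per_s": size_mb / python_s,
            "speedup": python_s / plaster_s,
        })
    return summary

summary = summarize(measurements)
for row in summary:
    print(
        f"{row['size_mb']:>4} MB  "
        f"Plaster {row['plaster_mb_per_s']:7.1f} MB/s  "
        f"Python {row['python_mb_per_s']:6.1f} MB/s  "
        f"speedup {row['speedup']:.2f}x"
    )
```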
11. Thank you
Lorenzo Farinelli
MSc Computer Science & Engineering
lorenzo.farinelli@mail.polimi.it
Luca Stornaiuolo
PhD - Computer Science & Engineering
luca.stornaiuolo@polimi.it
Daniele Valentino De Vincenti
BSc Biomedical Engineering
danielevalentino.devincenti@mail.polimi.it
Project code available at: https://bitbucket.org/necst/pynq-cluster/
Rolando Brondolin
PhD - Information Technology
rolando.brondolin@polimi.it