This document describes PyCoRAM, a Python-based implementation of the CoRAM memory architecture for FPGA-based computing. PyCoRAM provides a high-level abstraction for memory management that decouples computing logic from memory access behaviors. It allows defining memory access patterns using Python control threads. PyCoRAM generates an IP core that integrates with standard IP cores on Xilinx FPGAs using the AMBA AXI4 interconnect. It supports parameterized RTL design and achieves high memory bandwidth utilization of over 84% on two FPGA boards in evaluations of an array summation application.
1. PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern FPGA-based Computing
Dec 7, 2013
CARL2013@Davis, CA
Shinya Takamaeda-Yamazaki†‡, Kenji Kise†, James C. Hoe*
†Tokyo Institute of Technology
‡JSPS Research Fellow
*Carnegie Mellon University
4. FPGA as SoC
■ Put together various components on a single FPGA
  ● CPU core
    • MicroBlaze (soft macro)
    • Cortex-A9 (hard macro)
  ● Hardware accelerator logic
    • Modeled in traditional RTL
      – Verilog HDL, VHDL
    • Modeled in new modeling tools
      – Bluespec, AutoESL, Chisel, …
  ● DDRx DRAM interface
  ● PCI-express
  ● Ethernet, …
[Figure: an FPGA containing a CPU and two hardware accelerators (HW Acc) on an interconnect, together with Ethernet, DRAM I/F, and PCI-E blocks]
Dec 7, 2013
Shinya T-Y. Tokyo Tech
5. Portability Issue of Application Design
■ How to support various FPGA platforms?
  ● Different logic sizes, memory interfaces, peripherals, and I/O properties
[Figure: example platforms: Digilent Atlys (Xilinx Spartan-6 LX45); ScalableCore System, our FPGA system (Xilinx Spartan-6 LX16 × 128 nodes); Xilinx ML605 (Xilinx Virtex-6 LX240T)]
6. IP-core Based System Development
■ To build a system, add IP-cores and connect them
  ● IP-cores are connected through a standard on-chip interconnect
  ● EDK automatically generates the on-chip interconnect and (some) device-dependent interfaces
    • No (or few) annoying steps!
[Figure: Xilinx Platform Studio (XPS) with an IP-core list and IP-core instances; the FPGA holds a CPU, two HW accelerators, interconnects, Ethernet, DRAM I/F, and PCI-E, with DRAM attached]
7. Abstract Memory System for FPGAs
■ CoRAM (Connected RAM) [FPGA'11]
  ● High-level abstraction for memory management
    • Decouples computing logic from memory access behaviors
    • Memory access patterns are described in a software model (C language)
[Figure: HW kernels (computing logic) read and write CoRAM memories (abstracted on-chip memories) and communicate through CoRAM channels (communication FIFOs/registers); control threads (memory access patterns in C) manage reads and writes between the CoRAM memories and off-chip memory]
8. What “Runs” CoRAM?
(From the CoRAM Tutorial @FPGA'13)
[Figure: layers that run CoRAM. Architecture: core logic, SRAM, and control thread programs. Microarchitecture on the FPGA: fabric, cluster DMA control logic, network-on-chip, memory translation (TLBs), memory interfaces and caches. Annotations: RTL conversion; high-level synthesis from C to RTL using LLVM; CONNECT NoC generator]
10. Motivation: CoRAM for EDK
■ Integration of the CoRAM memory architecture into the modern EDK-based development flow with standard IP-cores
[Figure: layered stack. Accelerator logic (portable application design with CoRAM) on the CoRAM abstraction, next to standard IP-cores and a CPU core (cooperation with standard IP-cores), over the standard on-chip interconnect and device-dependent interfaces]
11. PyCoRAM
■ Python-based implementation of the CoRAM memory architecture for modern FPGA EDKs
  ● CoRAM memory abstraction for the EDK development flow
■ Key features
  ● Control threads in Python
    • We developed a Python-to-Verilog HLS compiler from scratch
  ● AMBA AXI4 for the on-chip interconnect
    • For IP-core based development on Xilinx Platform Studio (XPS)
  ● Parameterized RTL design support for user logic
    • Generate statements and parameter statements are analyzed by our original Verilog analysis tool-chain (Pyverilog)
12. Comparison with Original CoRAM
Feature                                   | CoRAM                             | PyCoRAM
------------------------------------------+-----------------------------------+----------------------------------
Language for control thread               | C                                 | Python
Supported memory operations               | Blocking/non-blocking read/write  | Blocking/non-blocking read/write
On-chip interconnect                      | CONNECT NoC [FPGA'12]             | AMBA AXI4
FSM granularity in control thread         | LLVM-IR                           | Python AST node
Generate-statement support for user logic | No                                | Yes
Supported FPGAs                           | Xilinx ML605, Altera Terasic DE-4 | Any FPGA supporting the AXI bus
Lines of code                             | 11,682 lines (w/o CONNECT)        | 4,922 lines (w/o Pyverilog)
FSM: Finite State Machine
LLVM-IR: Low Level Virtual Machine Intermediate Representation
AST: Abstract Syntax Tree
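The comparison lists "Python AST Node" as the FSM granularity of a PyCoRAM control thread. The sketch below illustrates that idea with the standard-library `ast` module by giving each statement node its own state number. It is a simplified illustration, not PyCoRAM's compiler: the function name `assign_states` and the sample source are hypothetical, and a real HLS compiler would also need states for loop conditions, branches, and blocking operations.

```python
import ast

# Hypothetical control-thread fragment used only for this illustration.
SOURCE = """
addr = 0
total = 0
for i in range(4):
    total += i
    addr += 512
"""


def assign_states(source):
    """Give every statement node in `source` its own FSM state number."""
    tree = ast.parse(source)
    states = []
    # ast.walk visits nodes breadth-first; keep only statement nodes.
    for node in ast.walk(tree):
        if isinstance(node, ast.stmt):
            states.append((len(states), type(node).__name__, node.lineno))
    return states


states = assign_states(SOURCE)
for state, kind, lineno in states:
    print(f"state {state}: {kind} (line {lineno})")
```

Mapping states to AST nodes keeps the generated state machine close to the Python source, which is one plausible reason for choosing this granularity over a lower-level intermediate representation.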
13. PyCoRAM Development Flow
■ PyCoRAM generates an IP-core package from user-logic RTL and control thread scripts written in Python
  ● Each part can be replaced with the original CoRAM's component
[Figure: development flow.
  Inputs (portable application design): user logic (Verilog HDL) and control threads (Python).
  PyCoRAM tool-chain: RTL conversion of the user logic (logic hierarchy analysis, control signal insertion, control signal port addition); Python-to-Verilog HLS (Python-to-Verilog compilation); IP-core generation with AXI4 interface; IP-core packing (RTL, .mpd, and .pao).
  Vendor EDA flow: IP-core integration on EDK; top design synthesis with AXI4; synthesis; FPGA bit file]
14. FPGA Accelerator with PyCoRAM IP-core
[Figure: FPGA accelerator with a PyCoRAM IP-core. The PyCoRAM IP (application) contains HW kernels (computing logic), CoRAM memories grouped into DMA clusters, a CoRAM channel, CoRAM streams, DMACs with AXI I/Fs, and the control thread (FSM). The PyCoRAM IP and other AXI IP-cores or a CPU share the AXI4 interconnect, which leads to the DRAM controller and off-chip DRAM]
16. PyCoRAM Microarchitecture (Logical View)
[Figure: logical view showing user I/O (GPIO), user logic, CoRAM register, CoRAM channel, CoRAM memory with DMAC, CoRAM stream with DMAC, and the control thread (FSM)]
17. PyCoRAM Microarchitecture (Logical View)
[Figure: logical view with annotations. The user logic (with user I/O via GPIO) is modeled in RTL (Verilog HDL); the control thread (FSM) holds the memory access pattern written in Python. Other components: CoRAM register, CoRAM channel, CoRAM memory with DMAC, CoRAM stream with DMAC]
18. PyCoRAM Microarchitecture (Physical View)
[Figure: physical view. Inside the FPGA, the PyCoRAM IP contains the user logic (with user I/O via GPIO), CoRAM register, CoRAM channel, CoRAM memory, CoRAM stream, the control thread (FSM), and DMACs with AXI I/Fs; the AXI I/Fs attach to the AXI4 interconnect, which leads to the DRAM controller]
19. PyCoRAM Microarchitecture (Physical View)
[Figure: physical view with annotations. The control thread (FSM) is written in Python; the user logic has parameterized RTL design support; the DMACs' AXI I/Fs form the AXI4 master interface to the AXI4 interconnect and DRAM controller. Other components in the PyCoRAM IP: user I/O (GPIO), CoRAM register, CoRAM channel, CoRAM memory, CoRAM stream]
20. Control Thread in Python
■ Operations for CoRAM objects
  ● To/from CoRAM memory
    • Data movement pattern with DMA operations between on-chip CoRAM memory and DRAM
  ● To/from CoRAM channel
    • Token communication between user logic and the control thread
[Figure: user logic (with user I/O), CoRAM channel, CoRAM memory with DMAC, and the control thread (FSM)]
    def calc_sum(times):
        ram = CoramMemory(idx=0, datawidth=32, size=1024)
        channel = CoramChannel(idx=0, datawidth=32)
        addr = 0
        sum = 0
        for i in range(times):
            ram.write(0, addr, 128)  # Transfer (off-chip DRAM to BRAM)
            channel.write(addr)      # Notification to user logic
            sum += channel.read()    # Wait for notification from user logic
            addr += 128 * (32/8)     # Advance by 128 words of 32 bits (4 bytes each)
        print('sum=', sum)           # $display Verilog system task

    calc_sum(8)
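To trace the protocol of the control thread above without an FPGA, the following is a minimal plain-Python sketch. Everything in it is an assumption made for illustration and is not PyCoRAM's implementation: the stand-in classes, the `DRAM` and `BRAMS` globals, the `user_logic` argument, and the `user_logic_sum` function (which assumes the accelerator sums the 128 words just transferred). The argument order of `ram.write` is read here as (on-chip address, DRAM byte address, word count), based on the slide's comments. Each iteration advances the DRAM address by 128 words × 4 bytes = 512 bytes, so 8 iterations cover 1024 words.

```python
# Plain-Python stand-ins (hypothetical, not PyCoRAM's classes) for tracing
# the control thread's protocol on a workstation.

WORD_BYTES = 32 // 8          # bytes per 32-bit word
DRAM = list(range(1024))      # fake off-chip DRAM: word i holds the value i
BRAMS = {}                    # fake on-chip memories, keyed by CoRAM ID


class CoramMemory:
    def __init__(self, idx, datawidth, size):
        self.words = BRAMS.setdefault(idx, [0] * size)
        self.word_bytes = datawidth // 8

    def write(self, ram_addr, ext_addr, size):
        """Copy `size` words from DRAM byte address `ext_addr` into the BRAM."""
        first = ext_addr // self.word_bytes
        self.words[ram_addr:ram_addr + size] = DRAM[first:first + size]


class CoramChannel:
    def __init__(self, idx, datawidth, user_logic):
        self.idx = idx
        self.datawidth = datawidth
        self.user_logic = user_logic   # callable standing in for the RTL
        self.reply = None

    def write(self, token):
        """Notify the user logic; this model runs it immediately."""
        self.reply = self.user_logic(token)

    def read(self):
        """Return the user logic's reply (a blocking wait in hardware)."""
        return self.reply


def user_logic_sum(_token):
    """Assumed accelerator behavior: sum the 128 words loaded into BRAM 0."""
    return sum(BRAMS[0][:128])


def calc_sum(times):
    ram = CoramMemory(idx=0, datawidth=32, size=1024)
    channel = CoramChannel(idx=0, datawidth=32, user_logic=user_logic_sum)
    addr = 0
    total = 0
    for _ in range(times):
        ram.write(0, addr, 128)      # DRAM -> BRAM transfer
        channel.write(addr)          # notify user logic
        total += channel.read()      # collect the partial sum
        addr += 128 * WORD_BYTES     # next 512-byte block
    return total


result = calc_sum(8)
print('sum=', result)
```

The model runs the user logic synchronously inside `channel.write`, whereas in hardware the control thread and user logic run concurrently and `channel.read` blocks until a token arrives.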
21. CoRAM objects in User Logic
■ CoRAM objects appear as standard BRAMs or FIFOs
  ● Very similar interface to the standard memory components
  ● User logic can access their contents in the same way
■ Essential parameters define each object's characteristics
  ● Thread name, ID, data width, address length, …
(a) CoRAM Memory

    CoramMemory1P
    #(
      .CORAM_THREAD_NAME("thread_name"),
      .CORAM_ID(0),
      .CORAM_ADDR_LEN(ADDR_LEN),
      .CORAM_DATA_WIDTH(DATA_WIDTH)
    )
    inst_memory
    (
      .CLK(CLK),
      .ADDR(mem_addr),
      .D(mem_d),
      .WE(mem_we),
      .Q(mem_q)
    );

(b) CoRAM Channel

    CoramChannel
    #(
      .CORAM_THREAD_NAME("thread_name"),
      .CORAM_ID(0),
      .CORAM_ADDR_LEN(CHANNEL_ADDR_LEN),
      .CORAM_DATA_WIDTH(CHANNEL_DATA_WIDTH)
    )
    inst_channel
    (
      .CLK(CLK),
      .RST(RST),
      .D(comm_d),
      .ENQ(comm_enq),
      .FULL(comm_full),
      .Q(comm_q),
      .DEQ(comm_deq),
      .EMPTY(comm_empty)
    );
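The channel ports (D/ENQ/FULL on the write side, Q/DEQ/EMPTY on the read side) follow the usual FIFO handshake. The sketch below is a behavioral Python model of that handshake, intended only to show how the flags gate enqueue and dequeue. The class name `ChannelModel` and its `depth` parameter are hypothetical, and cycle-level timing (for example, when Q becomes valid relative to DEQ) is not modeled.

```python
from collections import deque


class ChannelModel:
    """Behavioral FIFO using the port names shown for CoramChannel."""

    def __init__(self, depth):
        self.depth = depth        # hypothetical capacity in entries
        self.fifo = deque()

    @property
    def full(self):
        """FULL: the writer must not assert ENQ while this is true."""
        return len(self.fifo) >= self.depth

    @property
    def empty(self):
        """EMPTY: the reader must not assert DEQ while this is true."""
        return not self.fifo

    def enq(self, d):
        """ENQ with data on D."""
        if self.full:
            raise RuntimeError("ENQ while FULL")
        self.fifo.append(d)

    def deq(self):
        """DEQ, returning the word presented on Q."""
        if self.empty:
            raise RuntimeError("DEQ while EMPTY")
        return self.fifo.popleft()


chan = ChannelModel(depth=2)
chan.enq(10)
chan.enq(20)
was_full = chan.full      # two entries in a depth-2 FIFO
first = chan.deq()
second = chan.deq()
print(was_full, first, second, chan.empty)
```

In user logic the same discipline applies: check `FULL` before asserting `ENQ` and `EMPTY` before asserting `DEQ`.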
23. For Parameterized RTL design support
■ Generate-statement support through an advanced RTL analyzer
  ● Not supported by the original CoRAM compiler
■ Pyverilog: Python-based tool-chain for Verilog HDL design
  ● Parser
  ● Dataflow analysis
  ● Optimization
  ● RTL code generation
  ● Control-flow analysis
  ● Graphical output
[Figure: example graphical outputs: a dataflow graph and a state machine]
29. Conclusion
■ PyCoRAM: Python-based implementation of the CoRAM memory architecture for modern FPGA EDKs
■ Future work
  ● Further evaluation on more realistic applications
  ● AXI4 slave feature for control threads
  ● Tutorial slides
[Figure: layered stack. Accelerator logic (portable application design with CoRAM) on the CoRAM abstraction, next to standard IP-cores and a CPU core (cooperation with standard IP-cores), over the standard on-chip interconnect and device-dependent interfaces, which are automatically managed by EDK]
30. PyCoRAM and Pyverilog are publicly available!
■ PyCoRAM (0.7.0-public)
  ● https://github.com/shtaxxx/PyCoRAM
■ Pyverilog (0.6.0-public)
  ● https://github.com/shtaxxx/Pyverilog
Thanks!