Presenter: Markku-Juhani O. Saarinen
Talk: Design and implementation of the WhirlBob and Keyak/WhirlBob embedded FPGA System-on-Chip co-processor for the second round of the CAESAR competition
Conference: TrustED 2014 - Arizona, USA, 03 November 2014
http://th.informatik.uni-mannheim.de/trusted-workshop/2014/
Simple AEAD Hardware Interface (SÆHI) in a SoC: Implementing an On-Chip Keyak/WhirlBob Coprocessor
1. 2014
1. “Beyond Modes: Building a Secure Record Protocol from a Cryptographic Sponge
Permutation.” CT-RSA 2014. LNCS 8366, pp. 270–285, Springer (2014)
2. “CBEAM: Efficient Authenticated Encryption from Feebly One-Way ϕ Functions.”
CT-RSA 2014. LNCS 8366, pp. 251–269, Springer (2014)
3. “STRIBOB: Authenticated Encryption from GOST R 34.11-2012 LPS
Permutation.” CTCrypt ’14. To appear in Mathematical Aspects of Cryptography,
Steklov Mathematical Institute of RAS (2014)
4. “Simple AEAD Hardware Interface (SÆHI) in a SoC: Implementing an On-Chip
Keyak/WhirlBob Coprocessor.” TrustED 2014, ACM CCS 2014 Workshops, 03
November 2014, Scottsdale AZ US. To appear. ACM (2014)
5. “Lighter, Faster, and Constant-Time: WHIRLBOB, the Whirlpool variant of
STRIBOB.” With Billy Bob Brumley. IACR ePrint 2014/501. Submitted (2014)
6. “BRUTUS: Identifying Cryptanalytic Weaknesses in CAESAR First Round
Candidates.” IACR ePrint 2014/850. Submitted (2014)
+ Invited Talks.
2. Simple AEAD Hardware Interface (SÆHI) in a SoC:
Implementing an On-Chip Keyak/WhirlBob Coprocessor
Dr. Markku-Juhani O. Saarinen
mjos@item.ntnu.no
NORWEGIAN UNIVERSITY OF SCIENCE AND TECHNOLOGY
TrustED ’14 – 03 November 2014 – Scottsdale AZ
3. Authenticated Encryption with Associated Data
An Authenticated Encryption with Associated Data (AEAD) primitive provides:
▶ Encryption or confidentiality / privacy protection, and
▶ Authentication or integrity protection for encrypted and associated data.
Preferably in a single pass over the data.
Security protocols such as IPSec and SSL/TLS usually required two processing steps
for each packet in the 1990s and 2000s.
▶ Authentication was handled with an HMAC (Hash-based Message Authentication Code).
▶ Encryption was provided either with a block cipher such as 3DES-CBC or
AES-CBC or with a stream cipher such as RC4.
Hardware implementation of such a twin set-up is cumbersome.
The transition to AE has been swift in recent years because of AES-GCM’s status in
Suite B (Classified COTS) and many attacks such as CRIME, LUCKY13, and POODLE.
4. Background: CAESAR project for new AEAD algorithms 2014-2017
NIST-sponsored international Competition for Authenticated
Encryption: Security, Applicability, and Robustness.
http://competitions.cr.yp.to/caesar-call.html
▶ Jan 2013 Announced by Dan Bernstein (secretary)
▶ Mar 2014 Deadline for first-round submissions (57)
▶ May 2014 Deadline for first-round software
▶ Aug 2014 DIAC ’14 Workshop, UCSB
▶ Jan 2015 Second round candidates announced
▶ Feb 2015 Second round tweaks (fixes)
▶ Feb 2015 Second round Verilog / VHDL (this talk)
▶ Dec 2015 Third round candidates
▶ Dec 2016 Final round candidates
▶ Dec 2017 Final CAESAR portfolio announcement
5. Hardware API for Authenticated Encryption
CAESAR candidates came in many shapes and sizes. Here’s a rough breakdown:
8 are clearly based on a SHA3-style Sponge construction.
9 are (somehow) constructed from AES components.
19 are AES modes of operation.
21 are based on other design paradigms or are entirely ad hoc.
We want consistent testing across second round candidates.
Signalling. How to communicate with the hardware? Can a consistent, high-level
“hardware API” be constructed?
Memory access. Some prominent proposals (AEZ and SIV) require two passes over the
data, so APIs in the style of hash functions don’t really work.
What to test. Realistic test profiles via operating system and application integration.
6. System-on-Chip (SoC) Designs
[Chart: Total global shipments 2014 (million units). Categories: Android, Other mobile, PC total. Values shown: 1241.664, 314.065, 853.829.]
The majority of Internet and communication devices are
Android (Linux-based) tablets or smartphones.
System-on-Chip (SoC) designs integrate all the necessary
components of a computing application on a single chip.
Mobile electronics such as (smart) phones and tablets are
built on SoCs. SoCs are also found in Internet-of-Things (IoT)
appliances, modems, routers, home media, cars, etc.
Security of transmitted and stored data is even more relevant
to mobile devices than to traditional PC systems.
Limited CPU performance.
Energy efficiency critical.
Coprocessors: Audio and video codecs, RF processing, 3D
display rendering, M7/CCP motion, natural language, etc.
→ Our evaluation target.
7. Zynq-7000 FPGA Artix 7 / ARM Cortex A9 SoC
[Block diagram: Zynq-7000 SoC. Processing System: two ARM Cortex-A9 MPCore cores, each with a NEON DSP/FPU engine and 32/32 KB I/D caches; snoop control unit; 512 Kbyte L2 cache; 256 Kbyte on-chip memory; general interrupt controller, watchdog timer, timers, DMA, configuration; ARM CoreSight multi-core debug and trace; multiport DRAM controller (DDR3, DDR3L, DDR2); flash controller (NOR, NAND, SRAM, Quad SPI); processor I/O mux with 2x SPI, 2x I2C, 2x CAN, 2x UART, GPIO, and 2x SDIO, 2x USB, 2x GigE with DMA; AMBA interconnect. Programmable Logic (system gates, DSP, RAM): EMIO, general-purpose and high-performance AXI ports, ACP, XADC (2x ADC, mux, thermal sensor), PCIe Gen2 (1-8 lanes), security (AES, SHA, RSA), multi-standard I/Os (3.3V and high-speed 1.8V), multi-gigabit transceivers.]
On a single chip:
▶ Dual-core ARM Cortex A9 CPU @ 650 MHz.
▶ Artix-7 or Kintex-7 type FPGA logic fabric.
▶ Can run Linux and Android.
▶ Realistic target for SoC implementations.
▶ Full devkit under $200.
We study:
▶ Hardware-assisted vs. software vs.
hardware implementations.
▶ FPGA and software footprint, speed, power.
▶ Integration in applications e.g. via OpenSSL.
9. Implementation 2: WhirlBob / Whirlpool Core
WhirlBob Core.
StriBob, WhirlBob, and Whirlpool:
▶ StriBob is a CAESAR proposal by Markku-Juhani O. Saarinen.
▶ WhirlBob is a 2nd round tweak proposed by M.-J.O. Saarinen and
Billy Bob Brumley (submitted to INSCRYPT ’14).
▶ WhirlBob is based on the permutation of the (ISO) Standardized
Whirlpool 3.0 hash by Paulo Barreto and Vincent Rijmen.
We implemented:
▶ The 512-bit, 1 cycle per round core permutation in Verilog for the
Artix-7 FPGA core of Zynq 7000.
▶ The module can be utilized for both Whirlpool hashing and
WhirlBob authenticated encryption.
10. Hardware Performance
With one “extra” reloading cycle per block, the maximum theoretical throughput of
these implementations is:
Parameter         WhirlBob   Keyak
Rounds                  12      12
Cycles                  13      13
Rate (bits)            256    1344
Speed (bit/clk)       19.7   103.4
Processing speeds are significantly slower when the Keccak core is used in the
24-round SHA3 hashing mode. Speed ranges from 23.0 (SHA3-512) to 47.5
(SHA3-224) bits/clock. Whirlpool, in comparison, is slightly faster than WhirlBob.
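The speed row in the table is the rate divided by the cycles per block (rounds plus the reloading cycle). A minimal sketch in C of that arithmetic; the function name `bits_per_clock` and its parameters are illustrative and do not come from the talk:

```c
#include <assert.h>
#include <stdint.h>

/* Ideal throughput in bits per clock for a core that computes one round
 * per cycle and spends `reload_cycles` extra cycles on each block.
 * Name and signature are illustrative assumptions. */
double bits_per_clock(uint32_t rate_bits, uint32_t rounds,
                      uint32_t reload_cycles)
{
    uint32_t cycles = rounds + reload_cycles;   /* e.g. 12 + 1 = 13 */
    assert(cycles > 0);                         /* avoid division by zero */
    return (double)rate_bits / (double)cycles;
}
```

With the table's parameters, `bits_per_clock(256, 12, 1)` is about 19.7 for WhirlBob and `bits_per_clock(1344, 12, 1)` is about 103.4 for Keyak. SHA3-512 has a 576-bit rate, so `bits_per_clock(576, 24, 1)` gives about 23.0, in line with the figure quoted above.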
11. CAESAR Software API vs. Hardware API
A simple C API was specified by the CAESAR secretariat for reference software
implementations of the first round candidates.
#include <stdint.h>
int crypto_aead_encrypt(
    uint8_t *c, uint64_t *clen,              // Ciphertext (length is an output)
    const uint8_t *m, uint64_t mlen,         // Message
    const uint8_t *ad, uint64_t adlen,       // Associated Data
    const uint8_t *nsec,                     // (Secret IV)
    const uint8_t *npub,                     // Nonce
    const uint8_t *k);                       // Secret Key
Decryption and integrity verification can be performed with crypto_aead_decrypt(),
which has an equivalent interface.
SÆHI utilizes the same software API and a simple memory-mapped hardware API. The
software side is essentially a driver suitable for bare metal implementation.
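To illustrate the calling convention only, here is a hedged sketch in C of a caller that sizes its output buffer before invoking an encrypt function with the same parameter order. Everything named here is hypothetical: `stub_aead_encrypt` is a placeholder that copies the message and appends a zero tag (it provides no security and is not any CAESAR cipher), `STUB_TAG_BYTES` is an assumed tag length, and `seal_record` is an invented wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STUB_TAG_BYTES 16   /* assumed tag length, for illustration only */

/* Placeholder with a CAESAR-style parameter list. NOT a cipher: it copies
 * the message and appends an all-zero "tag" so the example can compile. */
int stub_aead_encrypt(uint8_t *c, uint64_t *clen,
                      const uint8_t *m, uint64_t mlen,
                      const uint8_t *ad, uint64_t adlen,
                      const uint8_t *nsec, const uint8_t *npub,
                      const uint8_t *k)
{
    (void)ad; (void)adlen; (void)nsec; (void)npub; (void)k;
    memcpy(c, m, (size_t)mlen);
    memset(c + mlen, 0, STUB_TAG_BYTES);
    *clen = mlen + STUB_TAG_BYTES;   /* ciphertext = message + tag */
    return 0;
}

/* Hypothetical caller: checks buffer capacity, then encrypts a record
 * with its header passed as associated data. Returns 0 or -1. */
int seal_record(uint8_t *out, uint64_t *outlen, size_t out_cap,
                const uint8_t *msg, uint64_t msglen,
                const uint8_t *hdr, uint64_t hdrlen,
                const uint8_t *nonce, const uint8_t *key)
{
    if (out_cap < msglen + STUB_TAG_BYTES)
        return -1;                   /* output buffer too small */
    return stub_aead_encrypt(out, outlen, msg, msglen, hdr, hdrlen,
                             NULL, nonce, key);
}
```

The point of the sketch is that the ciphertext length is returned through a pointer, so the caller must reserve room for the message plus the tag before the call.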
12. Proposed Baseline Hardware API
Our cryptographic coprocessor has a simple, almost universal memory-mapped
interface. The module (hardware pin) interface is the same as for a generic single-port
RAM (with an optional interrupt request line).
Signal  Dir  Purpose
ADDR    In   Address
DI      In   Data Write
WE      In   Write enable
EN      In   Enable/Select
CLK     In   Clock
DO      Out  Data Read
IRQ     Out  Interrupt Req.
[Diagram: AEAD Core block with inputs ADDR, DI, WE, EN, CLK and outputs DO, IRQ.]
The signaling between software component and this API is defined by the driver.
Faster (DMA, AXI) alternatives can be used – this is just the baseline interface.
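Because the core looks like single-port RAM, a bare-metal driver can be little more than loads and stores through a volatile pointer. The following C sketch assumes a register layout that the slides do not specify: the word offsets, the `SAEHI_` names, the GO command bit and the BUSY status bit are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word offsets inside the memory-mapped window. */
enum {
    SAEHI_REG_CTRL   = 0,   /* write a command here            */
    SAEHI_REG_STATUS = 1,   /* read status flags here          */
    SAEHI_REG_DATA   = 2    /* first word of the data buffer   */
};
#define SAEHI_CMD_GO    0x1u   /* assumed "start processing" command */
#define SAEHI_STAT_BUSY 0x1u   /* assumed "core is busy" flag        */

/* Copy input words into the core's data window. */
void saehi_write_block(volatile uint32_t *base, const uint32_t *src,
                       size_t words)
{
    for (size_t i = 0; i < words; i++)
        base[SAEHI_REG_DATA + i] = src[i];
}

/* Copy result words back out of the data window. */
void saehi_read_block(volatile uint32_t *base, uint32_t *dst, size_t words)
{
    for (size_t i = 0; i < words; i++)
        dst[i] = base[SAEHI_REG_DATA + i];
}

/* Start the core and poll until it is idle.
 * Returns 0 on completion, -1 if still busy after max_polls reads. */
int saehi_run(volatile uint32_t *base, unsigned max_polls)
{
    base[SAEHI_REG_CTRL] = SAEHI_CMD_GO;
    while (max_polls--) {
        if ((base[SAEHI_REG_STATUS] & SAEHI_STAT_BUSY) == 0)
            return 0;
    }
    return -1;
}
```

Polling keeps the sketch free of platform code; a real driver could instead wait on the optional IRQ line. On hardware, `base` would be the mapped address of the coprocessor, while for testing it can point at an ordinary array.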
13. Comparing Implementations
Code lines in our WhirlBob (StriBob) and
Keyak reference implementations:
Component           WhirlBob   Keyak
Interface Verilog         99     114
Round Verilog            228     129
Driver C                  60      60
API Interface C          261     250
Total code               639     553
Post-synthesis and post-route utilization within the
Artix-7 FPGA fabric of the Xilinx Zynq 7010:
Logic          WhirlBob   Keyak
LUTs              3,795   4,574
Flip-Flops        1,060   3,237
MUXs                 90     159
Other                 1       2
Total logic       4,946   7,972
14. Implementations
We first developed the implementations with a homemade VGA module (not utilizing
the CPU at all). The implementations were then integrated into Xillinux and made
accessible to user space daemons.
15. What to test?
We hope to measure for each candidate:
A Area. FPGA Slices or ASIC Gate Equivalents.
W Power. Power consumption (Watts = J/s).
R Speed. Ideal throughput (Bytes/Second).
One key goal is to maximize the ratio of speed to power:
e = R / W
Note that doubling A for factor-2 parallelism will approximately double both R and W,
so e will remain constant.
The same is true for doubling the clock frequency since power consumption is almost
linearly dependent on clock frequency for most (CMOS) circuits.
Hence Bytes/Joule is perhaps the most relevant metric for mobile devices.
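The invariance argument can be checked with a few lines of C. This is a sketch only: the function name `bytes_per_joule` is illustrative, and the throughput and power values in the usage note are made-up numbers rather than measurements from the talk.

```c
#include <assert.h>

/* Energy efficiency e = R / W, where R is throughput in bytes/second
 * and W is power in watts (J/s). The result is in bytes per joule. */
double bytes_per_joule(double bytes_per_second, double watts)
{
    assert(watts > 0.0);   /* power must be positive */
    return bytes_per_second / watts;
}
```

For a hypothetical core running at 100e6 bytes/s and drawing 0.5 W, `bytes_per_joule(100e6, 0.5)` is 2e8 bytes/J. Doubling both throughput and power, as with two parallel cores or twice the clock frequency, gives `bytes_per_joule(200e6, 1.0)`, which is the same 2e8 bytes/J.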
16. Integration path for Linux/Android Testing
[Diagram: System-on-Chip integration stack. Labels: CPU core, kernel, user space processes; Software SÆHI; SÆHI daemon (not available); AEAD plugin engine (libmyaead.so); OpenSSL Crypto API (libcrypto.so); TLS API (libssl.so); SSH API (libssh.so); browser application, SSH application, utilities and cmd tools; layers marked ciphers, protocols, apps; interprocess communication; cipher daemons SÆHI AEAD 1 and SÆHI AEAD 2.]
▶ The dominant underlying API for Linux is based on
OpenSSL: libcrypto, libssl. Supported by browsers etc.
▶ OpenSSL supports configurable plugin “engines”.
▶ After recent bugs (Heartbleed), new forks:
▶ Google: BoringSSL.
▶ OpenBSD group: LibreSSL (upcoming ressl API).
▶ Since the hardware accelerator is a shared resource,
implement it as a user space daemon.
▶ Utilize experimental ciphersuite identifiers in
applications and TLS, SSH, IPSec. Plug in CAESAR
ciphers to replace AES-GCM.
▶ Measure utilization, power, time, throughput with
realistic usage profiles.
17. Conclusions
▶ CAESAR is a project to find next-generation Authenticated Encryption algorithms.
▶ Proposed SÆHI, a simple memory-mapped hardware API for CAESAR ciphers.
▶ Realistic hardware target: System-on-Chip with FPGA logic and ARM Cortex A9.
▶ FPGA implementations of Keyak and WhirlBob algorithms.
▶ Integration path for applications on Android.
Next: a little demo!