A Peek into Google's Edge TPU
1. A Peek into Google's Edge TPU
Koan-Sin Tan
freedom@computer.org
April 18th, 2019
Hsinchu Coding Serfs Meeting
2. Who Am I?
• An old programmer; learned to use "open source" stuff on a VAX-11/780 running 4.3BSD before the term "open source" was coined
• TensorFlow contributor
  • Search "Koan-Sin" at https://github.com/tensorflow/tensorflow/releases
  • PRs: https://github.com/tensorflow/tensorflow/pulls?utf8=%E2%9C%93&q=is%3Apr+author%3Afreedomtan+
  • Contributing to TensorFlow is quite easy. There are many typos :-)
• Interested in using NNs on edge devices, so I learned TFLite
  • label_image for TFLite
4. Google Edge TPU
• Announced at Google Next 2018 (July 2018)
• Available to general developers right before TensorFlow Dev Summit 2019 (March 2019)
  • USB: Coral USB Accelerator
  • Dev Board: Coral Dev Board
  • More are coming, e.g., PCI-E Accelerator and SOM
• Supported framework: TFLite
https://coral.withgoogle.com/products/
5. • Updates released on April 11th, 2019
• Compiler: removed the restriction to specific architectures
• New TensorFlow Lite C++ API
• Updated Python API, mainly for multiple Edge TPUs
• Updated Mendel OS and Mendel Development Tool (MDT)
• Environmental Sensor Board, https://coral.withgoogle.com/products/environmental/
https://developers.googleblog.com/2019/04/updates-from-coral-new-compiler-and.html
https://coral.withgoogle.com/news/updates-04-2019/
7. Coral USB Accelerator
• USB 3.1 (gen 1) port and cable (SuperSpeed, 5 Gb/s transfer speed)
• MobileNet V1 1.0 224 quantized: ~4.3 MiB, so transferring the whole model takes about 4.3 × 10⁶ × 8 / (5 × 10⁹) ≈ 6.9 ms
• Recommended operating conditions:
  Operating frequency | Max ambient temperature
  Default             | 35°C
  Maximum             | 25°C
• Software environment
  • Linux computer with a USB port
  • Debian 6.0 or higher, or any derivative thereof (such as Ubuntu 10.0+)
  • System architecture of either x86_64 or ARM64 with ARMv8 instruction set
• Some caveats
  • USB 2.0 hurts
  • With newer Ubuntu, you have to modify the installation script
  • Actually, ARMv7 also works
• https://coral.withgoogle.com/tutorials/accelerator-datasheet/
• https://coral.withgoogle.com/tutorials/accelerator/
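The transfer-time estimate above is simple arithmetic; a quick sketch in Python (raw signaling rate only, ignoring USB protocol overhead):

```python
# Back-of-the-envelope: time to move a ~4.3 MB quantized MobileNet V1
# over a 5 Gb/s USB 3.1 gen 1 (SuperSpeed) link, ignoring protocol overhead.
model_bytes = 4.3e6        # ~4.3 MB model file
link_bps = 5e9             # 5 Gb/s raw SuperSpeed rate

transfer_s = model_bytes * 8 / link_bps
print(f"{transfer_s * 1e3:.2f} ms")  # 6.88 ms
```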
10. Coral Dev Board
• Edge TPU Module (SOM)
  ◦ NXP i.MX 8M SoC (quad-core Cortex-A53, plus Cortex-M4F)
  ◦ Google Edge TPU ML accelerator coprocessor
  ◦ Cryptographic coprocessor
  ◦ Wi-Fi 2x2 MIMO (802.11b/g/n/ac 2.4/5GHz)
  ◦ Bluetooth 4.1
  ◦ 8GB eMMC
  ◦ 1GB LPDDR4
• USB connections
  ◦ USB Type-C power port (5V DC)
  ◦ USB 3.0 Type-C OTG port
  ◦ USB 3.0 Type-A host port
  ◦ USB 2.0 Micro-B serial console port
• Audio connections
  ◦ 3.5mm audio jack (CTIA compliant)
  ◦ Digital PDM microphone (x2)
  ◦ 2.54mm 4-pin terminal for stereo speakers
• Video connections
  ◦ HDMI 2.0a (full size)
  ◦ 39-pin FFC connector for MIPI DSI display (4-lane)
  ◦ 24-pin FFC connector for MIPI CSI-2 camera (4-lane)
• MicroSD card slot
• Gigabit Ethernet port
• 40-pin GPIO expansion header
• Supports Mendel Linux (a Debian derivative)
https://coral.withgoogle.com/tutorials/devboard-datasheet/
https://www.blog.google/products/google-cloud/bringing-intelligence-to-the-edge-with-cloud-iot/
11. Mendel Linux?
• https://pypi.org/project/mendel-development-tool/
• https://coral.googlesource.com/mdt.git
  • 404 several weeks ago; now it's there
• Actually, there is a lot more information at https://coral.googlesource.com/; let's look at it later
12. Mendel Linux
• It's a Debian-based distribution; apt tools can tell us many things
• And take a look at /etc/apt/sources.list. Yup, it's there:
  • https://packages.cloud.google.com/apt/dists/mendel-bsp-enterprise-beaker/main
  • https://packages.cloud.google.com/apt/dists/mendel-beaker/main
15. Let's start from the first demo
• USB getting started guide: https://coral.withgoogle.com/tutorials/accelerator/
• BasicEngine -> {ClassificationEngine, DetectionEngine}, ImprintingEngine
• Importing BasicEngine is a single line:
  • from edgetpu.swig.edgetpu_cpp_wrapper import BasicEngine
• SWIG: yes, the > 20-year-old SWIG
  • _edgetpu_cpp_wrapper.so
[class diagram: ClassificationEngine and DetectionEngine derive from BasicEngine; ImprintingEngine stands alone]
16. What's in the Engines
ClassificationEngine:
  ClassifyWithImage(img, threshold=0.1, top_k=3, resample=Image.NEAREST)
  ClassifyWithInputTensor(input_tensor, threshold=0.0, top_k=3)
  __dict__, …
BasicEngine:
  RunInference(input)
  get_input_tensor_shape()
  get_all_output_tensors_sizes()
  get_num_of_output_tensors()
  get_output_tensor_size()
  required_input_array_size()
  total_output_array_size()
  model_path()
  get_raw_output()
  get_inference_time()
  device_path()
  __dict__, …
• BasicEngine: input- and output-related methods
• ClassificationEngine: still I/O related, plus classification-specific bits: resizing the input image and deciding what to output
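The classification-specific post-processing described above (drop scores below a threshold, then return the top_k survivors) can be sketched in plain Python; this is a hypothetical stand-in for what ClassifyWithImage does after RunInference, not the edgetpu API itself:

```python
def top_k_above_threshold(scores, threshold=0.1, top_k=3):
    """Keep (class_id, score) pairs whose score clears the threshold,
    then return at most top_k of them, highest score first."""
    candidates = [(i, s) for i, s in enumerate(scores) if s >= threshold]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_k]

# Toy scores for 4 classes:
print(top_k_above_threshold([0.05, 0.6, 0.2, 0.9]))  # [(3, 0.9), (1, 0.6), (2, 0.2)]
```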
17. Performance!
• No existing way to reproduce those numbers
• classify_image.py uses ClassificationEngine.ClassifyWithImage()
  • ClassifyWithImage() —> ClassifyWithInputTensor() —> RunInference()
  • pre-processing: image resize time
  • post-processing: top_k and finding labels/classes
• BasicEngine.get_inference_time() returns something I cannot understand
• Modified label_image.py (and object_detection) for TFLite: quite close
https://github.com/freedomtan/edge_tpu_python_scripts
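Since get_inference_time() is opaque, one alternative is timing each stage with a wall-clock timer. A minimal sketch; the three stage functions below are hypothetical stand-ins, not the edgetpu API:

```python
import time

def timed(fn, *args):
    """Run fn(*args); return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Stand-ins for the three stages of ClassifyWithImage():
preprocess = lambda img: img               # e.g. resize to 224x224
run_inference = lambda tensor: [0.1, 0.9]  # e.g. BasicEngine.RunInference()
postprocess = lambda scores: max(scores)   # e.g. threshold + top_k

tensor, t_pre = timed(preprocess, "raw-image")
scores, t_inf = timed(run_inference, tensor)
best, t_post = timed(postprocess, scores)
print(f"pre {t_pre*1e3:.3f} ms, infer {t_inf*1e3:.3f} ms, post {t_post*1e3:.3f} ms")
```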
22. Edge TPU's canned models
• What do you mean by a single custom op?
  • The compiler creates a single custom op for all Edge TPU compatible ops; anything else stays the same
https://coral.withgoogle.com/docs/edgetpu/models-intro/
[graph: MobileNet V1 — input 1×224×224×3 → edgetpu-custom-op → Softmax → 1×1001]
[graph: SSD MobileNet V1 — normalized_input_image_tensor 1×300×300×3 → edgetpu-custom-op → 1×1917×91 scores and 1917×4 box encodings → TFLite_Detection_PostProcess → outputs 1×10×4, 1×10, 1×10, 1]
23. Beyond Python
• _edgetpu_cpp_wrapper.so
  • TensorFlow Lite runtime and others
  • let's take a look at _wrap_new_BasicEngine: aiy::BasicEngine::BasicEngine()
  • aiy::BasicEngine::RunInference() —> aiy::BasicEngine::RunInferenceHelper() —> tflite::Interpreter::Invoke()
  • unresolved symbol: edgetpu::EdgeTpuManager::GetSingleton()
• libedgetpu.so
  • OpenSSL, Edge TPU context, communicating with the Edge TPU via USB or PCIe
  • edgetpu::EdgeTpuManager::GetSingleton() —> platforms::darwinn::tflite::EdgeTpuManagerDirect::GetSingleton()
24. Edge TPU C++ API
• Released on April 11th, 2019
  • binaries for x86_64, aarch64, and armeabi-v7a
  • a simple header file
  • two simple examples
  • some docs at https://coral.withgoogle.com/docs/edgetpu/api-cpp/
• Native build on the Dev Board
  • the Dev Board is a quad Cortex-A53 board, so surely we can build code on it
  • a small aarch64 patch: https://github.com/tensorflow/tensorflow/commit/5520a9d82e5, https://github.com/tensorflow/tensorflow/pull/16175
  • https://github.com/freedomtan/edgetpu-native, label_image for TFLite ported
25. Edge TPU C++ API
• class EdgeTpuManager
  • static EdgeTpuManager* GetSingleton();
  • 3 different std::unique_ptr<EdgeTpuContext> NewEdgeTpuContext() overloads
  • std::vector<DeviceEnumerationRecord> EnumerateEdgeTpu()
  • TfLiteStatus SetVerbosity(int verbosity)
  • std::string Version()
• Let's take a look at '-v' logs: https://drive.google.com/drive/folders/1-MhGIgWHuhbKM6XrhPqyuLJDzoLD1t2g?usp=sharing
  • in short, the USB ones seem to have more overhead
https://github.com/freedomtan/edgetpu-native/blob/label_image/libedgetpu/edgetpu.h#L110-L158
26. Imprinting Engine
• Yes, let's check what it is
• The Imprinting Engine implements a low-shot learning technique called 'Imprinted Weights' [1][2]
• Can be used to retrain classifiers on-device (on either the USB Accelerator or the Dev Board); no back-propagation gradients involved
• Why?
  • Transfer learning happens on-device, at near-realtime speed
  • You don't need to recompile the model
• Limitations
  • Training data size is limited to a max of 200 images per class
  • It is most suitable only for datasets with small intra-class variation
  • The last fully-connected layer runs on the CPU, not the Edge TPU, so it will be slightly less efficient than running a fully pre-compiled model on the Edge TPU
• If you are interested in it, check the paper and aiy::learn::imprinting::ImprintingEngine::Train(unsigned char const*, int, int)
[graph: base model — input 1×224×224×3 → edgetpu-custom-op → AvgPool → 1×1×1×1024 embedding]
[graph: retrained model — input 1×224×224×3 → edgetpu-custom-op → 1×1×1×1024 → L2Normalization → Conv2D (weights 5×1×1×1024, bias 5) → Reshape → Softmax → 1×5 output]
[1] https://coral.withgoogle.com/docs/edgetpu/retrain-classification-ondevice/
[2] https://arxiv.org/abs/1712.07136
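The core of 'Imprinted Weights' can be sketched in a few lines: L2-normalize each embedding, average the embeddings of a class, and use the re-normalized mean directly as that class's weight row in the final layer. A hypothetical pure-Python illustration of the paper's idea, not the ImprintingEngine implementation:

```python
import math

def l2_normalize(v):
    """Scale v to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def imprint_weights(embeddings):
    """Average the L2-normalized embeddings of one class and re-normalize;
    the result is that class's weight vector. No gradients involved:
    a single pass over the examples."""
    normalized = [l2_normalize(e) for e in embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)

# Two toy 4-d embeddings of the same class:
w = imprint_weights([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
print(w)  # ≈ [0.7071, 0.7071, 0.0, 0.0]
```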
27. PCIe device?
• It's Linux
  • `uname -a`: Linux hopeful-nexus 4.9.51-imx #1 SMP PREEMPT Thu Jan 31 01:58:26 UTC 2019 aarch64 GNU/Linux
• There is /proc/config.gz
  • $ zcat /proc/config.gz | grep -i edge
    CONFIG_SND_GOOGLE_EDGETPU_CARD=y
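The /proc/config.gz check can also be done from Python with the stdlib gzip module. In this sketch the config is faked with an in-memory blob so it runs anywhere; on the Dev Board you would open "/proc/config.gz" directly:

```python
import gzip
import io

# Fake a tiny kernel config; on the board, use gzip.open("/proc/config.gz", "rt").
blob = gzip.compress(b"CONFIG_ARM64=y\nCONFIG_SND_GOOGLE_EDGETPU_CARD=y\n")

with gzip.open(io.BytesIO(blob), "rt") as f:
    matches = [line.strip() for line in f if "EDGE" in line]

print(matches)  # ['CONFIG_SND_GOOGLE_EDGETPU_CARD=y']
```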
28. PCIe Device
• The apex driver is in gasket
  • https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/gasket
• It was already upstreamed last year
  • https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/drivers/staging/gasket/apex_driver.c
29. Global Unichip Corp
USB Vendor id 0x1a6e = “Global Unichip Corp”
PCI Vendor id 0x1ac1 = “Global Unichip Corp”
31. MCU on USB Accelerator
https://www.seeedstudio.com/Coral-USB-Accelerator-p-2899.html
32. Power Consumption of the USB Accelerator
• 4.94 V × 0.18 A ≈ 0.9 W
• running MobileNet-SSD
https://twitter.com/exsiva/status/1108692847719407616
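The power figure above is just voltage times current; checking the arithmetic:

```python
volts, amps = 4.94, 0.18   # measured while running MobileNet-SSD
watts = volts * amps
print(f"{watts:.2f} W")  # 0.89 W
```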
33. Architecture of the Edge TPU?
• Nope, I didn't read it. Just FYR
• https://patents.google.com/patent/US20190050717A1/
34. Concluding Remarks
• The Edge TPU is quite good for small models that you can convert to canned ones
  • quantized UINT8
• Not so good for some common larger models, e.g., Inception V3 and ResNet-50
  • your USB and CPU could be bottlenecks
• On-device re-training looks promising
• NCS 2 supports many more models for now
• How about the NVIDIA Jetson Nano? Dunno, let's wait and see. I don't believe GPUs will win in the long run.