8. AutoML mobile models: mnasnet_0.5_224.tflite
Edge TPU Compiler version 1.0.249710469
INFO: Initialized TensorFlow Lite runtime.
Invalid model: mnasnet_0.5_224.tflite
Model not quantized
An unquantized model is no good!
=> quantization-aware training
Quantization and Training of Neural Networks for Efficient
Integer-Arithmetic-Only Inference
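One way to check for this condition on the host (a minimal sketch of mine using the TFLite C++ API, not part of the Edge TPU compiler) is to look at the tensor types of a built interpreter:

#include "tensorflow/lite/interpreter.h"

// Sketch: a fully integer-quantized model exposes uint8 (or, in newer
// toolchains, int8) tensors instead of float32.
bool IsQuantized(const tflite::Interpreter& interpreter) {
  const TfLiteTensor* input = interpreter.tensor(interpreter.inputs()[0]);
  return input->type == kTfLiteUInt8 || input->type == kTfLiteInt8;
}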
11. Parameter data caching
Quote:
The Edge TPU has roughly 8 MB of SRAM that
can cache the model's parameter data.
However, a small amount of the RAM is first reserved for the
model's inference executable, so the parameter data uses
whatever space remains after that.
14. Continued
Naturally, saving the parameter data on the Edge
TPU RAM enables faster inferencing speed
compared to fetching the parameter data from
external memory.
=> "external memory" is probably the host-side system memory
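A back-of-the-envelope check (my own sketch; the 8 MB constant comes from the quote above, and the file name is a placeholder) of whether a compiled model's parameter data might fit in the on-chip cache:

#include <cstdint>
#include <cstdio>
#include <filesystem>

int main() {
  // ~8 MB of Edge TPU SRAM, per the quote above; part of it is taken
  // by the inference executable before parameter data is cached.
  constexpr std::uintmax_t kSramBytes = 8 * 1024 * 1024;
  // The file size is an upper bound on the parameter data size.
  const std::uintmax_t size =
      std::filesystem::file_size("model_edgetpu.tflite");
  std::printf("%ju of %ju bytes -> %s\n", size, kSramBytes,
              size <= kSramBytes ? "may fit on-chip"
                                 : "will spill to host memory");
}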
25. InterpreterBuilder::operator()
for (int subgraph_index = 0; subgraph_index < subgraphs->Length();
     ++subgraph_index) {
  const tflite::SubGraph* subgraph = (*subgraphs)[subgraph_index];
  tflite::Subgraph* modified_subgraph =
      (*interpreter)->subgraph(subgraph_index);
  auto operators = subgraph->operators();
  auto tensors = subgraph->tensors();
  // ... (omitted) ...
  // Finally setup nodes and tensors
  if (ParseNodes(operators, modified_subgraph) != kTfLiteOk)
    return cleanup_and_error();
  if (ParseTensors(buffers, tensors, modified_subgraph) != kTfLiteOk)
    return cleanup_and_error();
Splits the graph into SubGraphs
Parses the nodes within each SubGraph
26. InterpreterBuilder::ParseNodes
TfLiteStatus InterpreterBuilder::ParseNodes(
    const flatbuffers::Vector<flatbuffers::Offset<Operator>>* operators,
    Subgraph* subgraph) {
  // ... (omitted) ...
  for (int i = 0; i < operators->Length(); ++i) {
    const auto* op = operators->Get(i);
    // ... (omitted) ...
    if (op->custom_options()) {
      subgraph->AddNodeWithParameters(
          FlatBufferIntArrayToVector(op->inputs()),
          FlatBufferIntArrayToVector(op->outputs()),
          reinterpret_cast<const char*>(op->custom_options()->data()),
          op->custom_options()->size(), nullptr, registration);
    } else {
      // ... (rest omitted) ...
This is the path taken when the op is a custom op.
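To see the result from the outside, one can walk the built interpreter's nodes and print the registration each node received (a sketch of mine using the public Interpreter API; an already-built interpreter is assumed):

#include <cstdio>
#include "tensorflow/lite/interpreter.h"

// Sketch: custom ops (such as the Edge TPU one) report a custom_name
// in their registration; builtin ops do not.
void DumpNodes(const tflite::Interpreter& interpreter) {
  for (int i = 0; i < static_cast<int>(interpreter.nodes_size()); ++i) {
    const auto* node_and_reg = interpreter.node_and_registration(i);
    const TfLiteRegistration& reg = node_and_reg->second;
    std::printf("node %d: %s\n", i,
                reg.custom_name ? reg.custom_name : "(builtin)");
  }
}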
27. tensorflow/lite/schema/schema_v0.fbs
// An operator takes tensors as inputs and outputs. The type of operation being
// performed is determined by an index into the list of valid OperatorCodes,
// while the specifics of each operations is configured using builtin_options
// or custom_options.
table Operator {
  // Index into the operator_codes array. Using an integer here avoids
  // complicate map lookups.
  opcode_index:int;
  inputs:[int];
  outputs:[int];
  builtin_options:BuiltinOptions;
  custom_options:[ubyte];
}
When the op is a custom op, custom_options is set!
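As an illustration (my own sketch, using the FlatBuffers API generated from this schema), the custom_options bytes can be read straight out of a model buffer:

#include <cstdio>
#include "tensorflow/lite/schema/schema_generated.h"

// Sketch: walk every operator and report which ones carry
// custom_options (i.e. which ones are custom ops).
void DumpCustomOptions(const void* model_data) {
  const tflite::Model* model = tflite::GetModel(model_data);
  for (const tflite::SubGraph* sg : *model->subgraphs()) {
    for (const tflite::Operator* op : *sg->operators()) {
      if (op->custom_options()) {
        std::printf("custom op with %u option bytes\n",
                    op->custom_options()->size());
      }
    }
  }
}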
30. Subgraph::OpInit
void* OpInit(const TfLiteRegistration& op_reg,
             const char* buffer, size_t length) {
  if (op_reg.init == nullptr) return nullptr;
  return op_reg.init(context_, buffer, length);
}
edgetpu.h
// Returns pointer to an instance of TfLiteRegistration to handle
// EdgeTPU custom ops, to be used with
// tflite::ops::builtin::BuiltinOpResolver::AddCustom
TfLiteRegistration* RegisterCustomOp();
For the Edge TPU, the op is the TfLiteRegistration obtained via RegisterCustomOp()
31. TfLiteRegistration (part 1)
// Initializes the op from serialized data.
// If a built-in op:
// `buffer` is the op's params data (TfLiteLSTMParams*).
// `length` is zero.
// If custom op:
// `buffer` is the op's `custom_options`.
// `length` is the size of the buffer.
//
// Returns a type-punned (i.e. void*) opaque data (e.g. a primitive pointer
// or an instance of a struct).
// The returned pointer will be stored with the node in the `user_data` field,
// accessible within prepare and invoke functions below.
// NOTE: if the data is already in the desired format, simply implement this
// function to return `nullptr` and implement the free function to be a no-op.
void* (*init)(TfLiteContext* context, const char* buffer, size_t length);
32. TfLiteRegistration (part 2)
// The pointer `buffer` is the data previously returned by an init invocation.
void (*free)(TfLiteContext* context, void* buffer);
// prepare is called when the inputs this node depends on have been resized.
// context->ResizeTensor() can be called to request output tensors to be
// resized.
//
// Returns kTfLiteOk on success.
TfLiteStatus (*prepare)(TfLiteContext* context, TfLiteNode* node);
// Execute the node (should read node->inputs and output to node->outputs).
// Returns kTfLiteOk on success.
TfLiteStatus (*invoke)(TfLiteContext* context, TfLiteNode* node);
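Putting the four callbacks together, a minimal custom op could look like the following skeleton (my own do-nothing sketch; the real Edge TPU op naturally does far more in invoke):

#include "tensorflow/lite/c/c_api_internal.h"  // TfLiteRegistration
                                               // (common.h in newer trees)
namespace {
void* MyInit(TfLiteContext* context, const char* buffer, size_t length) {
  // For a custom op, `buffer`/`length` carry the custom_options bytes.
  return nullptr;  // no per-node state in this sketch
}
void MyFree(TfLiteContext* context, void* buffer) {}
TfLiteStatus MyPrepare(TfLiteContext* context, TfLiteNode* node) {
  return kTfLiteOk;  // a real op would resize output tensors here
}
TfLiteStatus MyInvoke(TfLiteContext* context, TfLiteNode* node) {
  return kTfLiteOk;  // a real op reads node->inputs, writes node->outputs
}
}  // namespace

TfLiteRegistration* RegisterMyCustomOp() {
  static TfLiteRegistration reg = {MyInit, MyFree, MyPrepare, MyInvoke};
  return &reg;
}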
33. std::unique_ptr<tflite::Interpreter> BuildEdgeTpuInterpreter(
    const tflite::FlatBufferModel& model,
    edgetpu::EdgeTpuContext* edgetpu_context) {
  tflite::ops::builtin::BuiltinOpResolver resolver;
  resolver.AddCustom(edgetpu::kCustomOp, edgetpu::RegisterCustomOp());
  std::unique_ptr<tflite::Interpreter> interpreter;
  if (tflite::InterpreterBuilder(model, resolver)(&interpreter) != kTfLiteOk) {
    std::cerr << "Failed to build interpreter." << std::endl;
  }
  // Bind given context with interpreter.
  interpreter->SetExternalContext(kTfLiteEdgeTpuContext, edgetpu_context);
  interpreter->SetNumThreads(1);
  if (interpreter->AllocateTensors() != kTfLiteOk) {
    std::cerr << "Failed to allocate tensors." << std::endl;
  }
  return interpreter;
}
https://coral.googlesource.com/edgetpu-native/+/refs/heads/release-chef/edgetpu/cpp/examples/utils.cc#181
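For context, a usage sketch for this helper (my own, following the Coral example code; the model file name is a placeholder and error handling is omitted):

#include <memory>
#include "edgetpu.h"
#include "tensorflow/lite/model.h"

void RunOnce() {
  auto model =
      tflite::FlatBufferModel::BuildFromFile("model_edgetpu.tflite");
  // Open the Edge TPU device and bind it to the interpreter.
  std::shared_ptr<edgetpu::EdgeTpuContext> context =
      edgetpu::EdgeTpuManager::GetSingleton()->OpenDevice();
  auto interpreter = BuildEdgeTpuInterpreter(*model, context.get());
  // Fill interpreter->typed_input_tensor<uint8_t>(0) with input, then:
  interpreter->Invoke();
}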
37. Model successfully compiled but not all operations are
supported by the Edge TPU. A percentage of the model will
instead run on the CPU, which is slower. If possible, consider
updating your model to use only operations supported by the
Edge TPU.
For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 63
Number of operations that will run on CPU: 1
38. detect_edgetpu.log
DEPTHWISE_CONV_2D  13  Mapped to Edge TPU
RESHAPE            13  Mapped to Edge TPU
LOGISTIC            1  Mapped to Edge TPU
CUSTOM              1  Operation is working on an unsupported data type
CONCATENATION       2  Mapped to Edge TPU
CONV_2D            34  Mapped to Edge TPU
Currently, the Edge TPU compiler cannot partition the model more than once, so
as soon as an unsupported operation occurs, that operation and everything after it
executes on the CPU, even if supported operations occur later.
46. Summary
The Google Edge TPU:
・uses TensorFlow Lite's custom op mechanism
・the custom op is edgetpu_custom_op
・the data passed to edgetpu_custom_op appears to be
in FlatBuffers format