In this tutorial, you will learn how to build a convolutional neural network model that detects object keypoints in images. We have chosen license plates as the object to detect, but the tutorial can also serve as a guide for detecting other objects.
This tutorial is an addition to the publication "Convolutional Neural Networks for Object Detection" on our website: http://rnd.azoft.com/convolutional-neural-networks-for-object-detection/
Overview
1. Choosing images for neural network training
2. Labeling the keypoints
3. Data augmentation
4. Packing a dataset in HDF5
5. Training a convolutional neural network
6. Conclusion
7. Related links
rnd.azoft.com
1. Choosing images for neural network training
The training dataset should contain anywhere from a few hundred to a few thousand original (not augmented)
images. The more, the better.
2. Labeling the keypoints
If the keypoints in the original images are not labeled yet, you need to label them. This means saving the
keypoint coordinates in a *.txt or *.csv file. Each keypoint has two values: one for the horizontal axis and
one for the vertical axis.
Remark: If you decide to label several keypoints, always label them in the same order. For example, when labeling
a license plate, you might mark the upper-left corner of the plate with the first dot, the upper-right corner with
the second dot, the lower-left corner with the third dot, and the lower-right corner with the fourth dot.
Keep this order for every image you label afterwards.
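A minimal Python sketch of saving labels in a fixed order, assuming a CSV layout with one row per image. The column names, the `CORNER_ORDER` tuple, and the file name `plate_001.jpg` are hypothetical and only illustrate the idea of a consistent keypoint sequence.

```python
import csv
import io

# Hypothetical corner order; any order works as long as it never changes.
CORNER_ORDER = ("top_left", "top_right", "bottom_left", "bottom_right")


def keypoints_to_row(image_name, corners):
    """Flatten a dict of corner -> (x, y) into one CSV row in CORNER_ORDER."""
    row = [image_name]
    for name in CORNER_ORDER:
        x, y = corners[name]
        row.extend([x, y])
    return row


def write_labels(stream, labeled_images):
    """Write a header plus one row per image to a text stream."""
    writer = csv.writer(stream)
    header = ["image"]
    for name in CORNER_ORDER:
        header.extend([name + "_x", name + "_y"])
    writer.writerow(header)
    for image_name, corners in labeled_images:
        writer.writerow(keypoints_to_row(image_name, corners))


# Example with made-up coordinates for a single image.
example = [("plate_001.jpg", {
    "bottom_right": (210, 95), "top_left": (40, 60),
    "top_right": (212, 58), "bottom_left": (38, 97),
})]
buffer = io.StringIO()
write_labels(buffer, example)
print(buffer.getvalue())
```

Because the row is always built from `CORNER_ORDER`, the order in which the corners were clicked or stored does not affect the saved sequence.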
3. Data augmentation
For effective training, you need a dataset of several thousand to tens of thousands of images.
If the initial dataset is too small, you should augment the images.
Remark: Before starting augmentation, split your dataset into training and control (validation) parts. This
guarantees that images produced by augmenting one picture never end up in both the training part and the control part.
If you skip this step, you will hardly be able to track overfitting of the model.
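The split-before-augmentation rule can be sketched in Python as follows. This is only an illustration: the 20% control fraction, the fixed seed, and the `_augN` naming scheme are assumptions, not values from the original experiment.

```python
import random


def split_originals(image_names, control_fraction=0.2, seed=0):
    """Split original image names into training and control lists."""
    names = sorted(image_names)
    random.Random(seed).shuffle(names)
    n_control = max(1, int(len(names) * control_fraction))
    return names[n_control:], names[:n_control]


def augmented_names(original_name, copies):
    """Hypothetical naming scheme for augmented copies of one original."""
    stem, dot, ext = original_name.rpartition(".")
    return [f"{stem}_aug{i}{dot}{ext}" for i in range(copies)]


originals = [f"plate_{i:03d}.jpg" for i in range(10)]
train, control = split_originals(originals)
# Augment each part separately so copies never cross the boundary.
train_set = [name for o in train for name in augmented_names(o, 3)]
control_set = [name for o in control for name in augmented_names(o, 3)]
```

Since the split is done on the originals, every augmented copy inherits the part of the image it was derived from.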
Here are the transformations that can be used for augmentation:
● Rotations relative to the center
● Perspective distortion
● Resize
● Shifts
● Salt-and-pepper noise
● Blurring and sharpening
● Erosion and dilation
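Geometric transformations from this list (rotations, shifts, resize, perspective distortion) move the object, so the labeled keypoints have to be transformed together with the image. The sketch below shows only the coordinate update for a rotation about the center and for a shift. The image itself would be transformed with an image library of your choice. The function names are hypothetical, and taking the center as `(width / 2, height / 2)` is an assumption that should match the convention of that library.

```python
import math


def rotate_keypoints(points, width, height, angle_deg):
    """Rotate (x, y) pixel points about the image center.

    With the y axis pointing down, a positive angle turns points
    clockwise on screen; flip the sign if your image library differs.
    """
    cx, cy = width / 2.0, height / 2.0
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + cos_a * dx - sin_a * dy,
                        cy + sin_a * dx + cos_a * dy))
    return rotated


def shift_keypoints(points, shift_x, shift_y):
    """Apply the same translation that was applied to the image."""
    return [(x + shift_x, y + shift_y) for x, y in points]
```

Pixel-level transformations such as noise, blurring, sharpening, erosion, and dilation leave the keypoint coordinates unchanged.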
4. Packing a dataset in HDF5
To use the dataset with the Caffe framework, you need to pack it into the HDF5 file format.
You should normalize pixel values to the range from 0 to 1 and coordinate values to the range from -1 to 1.
Remark:
● If you applied augmentation to the initial images, the images in the HDF5 file have to be stored in random order.
● After packing the dataset in HDF5, you should check the resulting file with the HDF5 Viewer utility. Pixel values
have to be from 0 to 1, coordinates have to be from -1 to 1, and images must not be distorted.
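The packing step might look like the following Python sketch, assuming NumPy and h5py are installed. It assumes 8-bit grayscale images of equal size stacked as an `(N, H, W)` array and keypoints in pixels laid out as `x1, y1, x2, y2, ...`. The linear mapping that puts the image center at `(0, 0)` is one possible way to reach the range from -1 to 1, not necessarily the one used in the original experiment. The dataset names `data` and `label` match the `top` names of the HDF5Data layer.

```python
import numpy as np


def normalize_pixels(images_uint8):
    """Scale 8-bit pixel values to the range [0, 1] as float32."""
    return images_uint8.astype(np.float32) / 255.0


def normalize_keypoints(points_px, width, height):
    """Map pixel coordinates to [-1, 1]; the image center becomes (0, 0).

    points_px has shape (N, 2 * K) laid out as x1, y1, x2, y2, ...
    """
    points = points_px.astype(np.float32)
    points[:, 0::2] = points[:, 0::2] / (width / 2.0) - 1.0
    points[:, 1::2] = points[:, 1::2] / (height / 2.0) - 1.0
    return points


def pack_hdf5(path, images_uint8, points_px, seed=0):
    """Shuffle, normalize, and write 'data' and 'label' datasets."""
    import h5py  # imported here so the helpers above work without h5py

    n, height, width = images_uint8.shape
    # One permutation for both arrays keeps images and labels aligned.
    order = np.random.default_rng(seed).permutation(n)
    data = normalize_pixels(images_uint8[order])[:, np.newaxis, :, :]
    label = normalize_keypoints(points_px[order], width, height)
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=data)    # shape (N, 1, H, W)
        f.create_dataset("label", data=label)  # shape (N, 2 * K)
```

Applying the same permutation to images and labels gives the random order required after augmentation without breaking the pairing between an image and its keypoints.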
5. Training the convolutional neural network
We recommend starting the training of the neural network with the Adam optimization method.
The input layer should look like this:
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "/home/user/caffe/examples/regression/regression_train.txt"
    batch_size: 256
  }
}
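Note that the `source` of an HDF5Data layer is a text file that lists the paths of the HDF5 files, one per line, not an HDF5 file itself. A small Python sketch of writing such a list is shown below. The file name `train_0.h5` and the temporary directory are hypothetical.

```python
import os
import tempfile


def write_hdf5_list(list_path, hdf5_paths):
    """Write one HDF5 file path per line, as the source list expects."""
    with open(list_path, "w") as f:
        for path in hdf5_paths:
            f.write(path + "\n")


# Hypothetical file names in a temporary directory.
workdir = tempfile.mkdtemp()
list_path = os.path.join(workdir, "regression_train.txt")
write_hdf5_list(list_path, [os.path.join(workdir, "train_0.h5")])
```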
The number of outputs of the output layer has to be equal to the number of coordinate values
(for example, 8 for four keypoints with two values each). We recommend using
EuclideanLoss as the loss layer.
layer {
  name: "ipout"
  type: "InnerProduct"
  bottom: "ip01"
  top: "ipout"
  inner_product_param {
    num_output: 8
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "ipout"
  bottom: "label"
  top: "loss"
}
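For reference, the EuclideanLoss layer computes the sum of squared differences between predictions and labels over the batch, divided by twice the batch size. The plain-Python sketch below reproduces that formula for illustration only. The sample values are made up and are not results from the experiment.

```python
def euclidean_loss(predictions, labels):
    """Sum of squared differences over the batch, divided by 2 * N."""
    n = len(predictions)
    total = 0.0
    for pred_row, label_row in zip(predictions, labels):
        total += sum((p - t) ** 2 for p, t in zip(pred_row, label_row))
    return total / (2.0 * n)


# One sample with four keypoints, i.e. 8 normalized coordinate values.
labels = [[-0.5, -0.2, 0.5, -0.2, -0.5, 0.2, 0.5, 0.2]]
predictions = [[-0.4, -0.2, 0.5, -0.2, -0.5, 0.2, 0.5, 0.3]]
loss = euclidean_loss(predictions, labels)
```

Each row holds 8 values, which matches `num_output: 8` in the output layer.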
Conclusion
It is quite difficult to train an accurate and fast neural network in just a few attempts. We made about 20 trials
before we got an appropriate result. If you have any questions regarding the idea, the implementation of the
experiment, or the code, we'll be glad to answer them in the comments below.
Related links
We used these works as the basis of the experiment:
1. Using convolutional neural nets to detect facial keypoints
2. Caffe-regression examples, Kaggle face keypoint detection
Read the detailed "Convolutional Neural Networks for Object Detection" project overview