ATLANTA | AUSTIN | PHILADELPHIA | BENTONVILLE | ROMANIA | INDIA | AUSTRALIA | BRAZIL | NEPAL | CANADA www.softvision.com
Machine Learning
Speaker:
ANCA CIURTE - AI Team Lead at Softvision
Outline
● Why machine learning on Android?
● Mostly:
○ Some insights about Object Detection algorithms
○ Practical example in TensorFlow
○ Data gathering and labeling
○ Model training
● Hopefully:
○ It will inspire you to dig deeper
○ It won’t confuse you too much :)
Machine learning
Why machine learning on Android?
● Object detection
○ Is a very common Computer Vision problem
○ Identifies the objects in the image and provides their precise location
● Why is it useful?
○ StreetView,
○ Self-driving cars etc.
[Figure: Street View - face blurring]
[Figure: Self-driving cars - pedestrian detection]
● Object detection: impact of deep learning
○ Deep convnets significantly improved accuracy and reduced processing time
● Why on Android?
○ We live in an era in which mobile has taken over
○ Running on mobile makes it possible to deliver interactive, real-time applications
○ Recently released phones have substantial computing power
Machine learning
Some insights about Object Detection
Intuition about the convolution
[Figure: Input image * Convolution kernel = Feature map. The kernel holds the weights, i.e. the network's parameters; the feature map is the output of the convolution layer.]
[Figure: another way to understand the convolution operation]
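The kernel sliding over the input image can be made concrete with a small sketch. It is not taken from the slides: the function name `conv2d`, the toy image and the edge kernel are illustrative assumptions. Like most deep learning frameworks, it computes a cross-correlation (the kernel is not flipped) with no padding.

```python
import numpy as np


def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel.
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map


# Toy 4x4 image: dark left half, bright right half.
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
# Hypothetical 1x2 kernel that responds to a dark-to-bright vertical edge.
kernel = np.array([[-1.0, 1.0]])

feature_map = conv2d(image, kernel)
print(feature_map)
```

In a trained network the kernel values are not hand-picked as here; they are the weights learned during training.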
Image classification with convnets
● Dataset
○ e.g. CIFAR-10 dataset:
■ consists of 60000 32x32 colour images in 10 classes, with 6000 images per class
■ there are 50000 training images and 10000 test images
● Training phase
○ e.g. VGG16 network
○ input: labeled images (x, y)
○ forward propagation: given the weights w_l, compute the predictions
○ loss function: measures the error between the predictions and the labels y
○ backward propagation: compute w_(l+1) by minimizing the loss
○ repeat until convergence => w*
● Testing phase
○ use the trained model to classify new instances
○ output: predicted class
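The training loop (forward propagation, loss, backward propagation, repeat) can be sketched on a toy problem. This is only an illustration under stated assumptions: it trains a single softmax layer with plain gradient descent on synthetic 2-D points rather than a VGG16 on CIFAR-10, and all names (`train`, `softmax`, `cross_entropy`) are made up for the example.

```python
import numpy as np


def softmax(logits):
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(probs, y):
    # Mean negative log-probability assigned to the true class.
    return -np.log(probs[np.arange(len(y)), y] + 1e-12).mean()


def train(x, y, num_classes, lr=0.1, steps=200):
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=(x.shape[1], num_classes))
    losses = []
    for _ in range(steps):
        probs = softmax(x @ w)                   # forward propagation
        losses.append(cross_entropy(probs, y))   # loss function
        grad_logits = probs.copy()
        grad_logits[np.arange(len(y)), y] -= 1.0
        grad_w = x.T @ grad_logits / len(y)      # backward propagation
        w = w - lr * grad_w                      # w_(l+1) from w_l
    return w, losses


# Synthetic two-class data: two well-separated point clouds.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

w_star, losses = train(x, y, num_classes=2)

# Testing phase: use the trained weights to predict a class.
predicted = softmax(x @ w_star).argmax(axis=1)
accuracy = (predicted == y).mean()
print(losses[0], losses[-1], accuracy)
```

A real convnet replaces the single matrix product with many layers, but the loop structure stays the same.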
Relation between classification and object detection
● We have an accurate way of classifying images
○ e.g.: does this image contain a pedestrian?
● But how can we say WHERE this pedestrian is?
Solution:
● Sliding window
○ strategy:
■ split the image into fragments and classify each fragment independently
[Figure: all fragments, and the fragments classified as pedestrian]
○ challenges :
■ how to deal with varying object sizes, varying aspect ratios, overlapping objects and multiple responses for the same object
○ problem: the CNN must be applied at a huge number of locations and scales, which is very computationally expensive!
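The sliding-window strategy, and why it gets expensive, can be sketched as follows. The classifier here is a hypothetical stand-in (`toy_classifier` scores a patch by its mean brightness); in the setting of the slides it would be a CNN. Window sizes, stride and threshold are arbitrary assumptions.

```python
import numpy as np


def sliding_windows(img_w, img_h, sizes, stride):
    """Yield (x, y, w, h) for every window size at every stride position."""
    for w, h in sizes:
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)


def detect(image, classify, sizes, stride, threshold):
    """Classify every fragment independently and keep those above the threshold."""
    hits = []
    for x, y, w, h in sliding_windows(image.shape[1], image.shape[0], sizes, stride):
        score = classify(image[y:y + h, x:x + w])
        if score >= threshold:
            hits.append(((x, y, w, h), score))
    return hits


def toy_classifier(patch):
    # Hypothetical stand-in for a CNN: mean brightness as the "pedestrian" score.
    return float(patch.mean())


# 64x64 toy image with one bright 16x32 "object".
image = np.zeros((64, 64))
image[16:48, 24:40] = 1.0

sizes = [(16, 32), (32, 32)]  # (width, height) of the windows to try
windows = list(sliding_windows(64, 64, sizes, stride=8))
hits = detect(image, toy_classifier, sizes, stride=8, threshold=0.99)
print(len(windows), hits)
```

Even this tiny image with two window sizes and a coarse stride needs 60 classifier calls; realistic resolutions, finer strides and more scales and aspect ratios multiply that quickly.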
R-CNN (Region-based convolutional neural network)
Two steps:
● Select object proposals: Selective Search Algorithm
○ its precision is too low for it to be used as an object detector on its own, but it works fine as a first step in the detection pipeline
● Apply a strong CNN classifier to the selected proposals
Girshick et al, “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR 2014
It outperforms all the previous object detection algorithms
Limitations:
● Depends on an external algorithm for the region hypotheses
● Needs to rescale object proposals to a fixed resolution
● Redundant computation: all features are computed independently, even for overlapping proposal regions
Fast R-CNN
From R-CNN to Fast R-CNN:
● input: image + region proposals
● RoI pooling on the “conv5” feature map for feature extraction
● softmax classifier instead of SVM classifier
● End-to-end multi-task training:
○ the last FC layer branches into two sibling output layers:
■ one that produces softmax probability estimates over K object classes plus a background class
■ another that outputs the bounding box coordinates for each object
Advantages:
● Higher detection quality (mAP) than R-CNN
● Training is single-stage
● Training can update all network layers at once
● No disk storage is required for feature caching
Girshick, “Fast R-CNN”, ICCV 2015
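The region pooling step is what lets proposals of different sizes share one feature map and still produce a fixed-size input for the FC layers. A minimal sketch, under simplifying assumptions that are not from the slides: a single-channel feature map, RoIs given directly in integer feature-map cells as (x0, y0, x1, y1) with exclusive upper bounds, and RoIs at least as large as the output grid. The function name `roi_max_pool` is illustrative.

```python
import numpy as np


def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool the RoI of a 2-D feature map into a fixed output_size grid."""
    x0, y0, x1, y1 = roi
    region = feature_map[y0:y1, x0:x1]
    out_h, out_w = output_size
    # Bin edges that split the region into out_h x out_w roughly equal cells.
    ys = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    xs = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    pooled = np.empty(output_size, dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            pooled[i, j] = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return pooled


feature_map = np.arange(36).reshape(6, 6)  # stand-in for one conv5 channel

small = roi_max_pool(feature_map, (1, 1, 5, 4))  # 4 wide, 3 tall proposal
large = roi_max_pool(feature_map, (0, 0, 6, 6))  # whole feature map
print(small)
print(large)
```

Both proposals yield a 2x2 output although their sizes differ, and the convolutional features are computed once for the whole image instead of once per proposal.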
Faster R-CNN
Faster R-CNN = Fast R-CNN + RPN (Region Proposal Network)
● RPN
○ removes the dependency on an external RoI hypothesis generation method
○ is a convolutional network trained end-to-end
○ generates a list of high-quality region proposals (bbox coordinates + objectness scores)
● RPN and Fast R-CNN are then merged into a single network by sharing their convolutional features
○ predicts the class of the objects + a refined bbox position
○ the shared convolutional features enable nearly cost-free region proposals
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, “Faster R-CNN: Towards
Real-Time Object Detection with Region Proposal Networks”, NIPS 2015
SSD (Single shot detector)
● Extra feature layers
○ additional convolutional feature layers of different sizes are placed at the end of the base net
○ each added feature layer produces a set of detection predictions, allowing predictions at multiple scales
○ this design leads to simple end-to-end training
● RoI proposals
○ the output space of region proposals contains a fixed set of default boxes over different aspect ratios and scales per feature map location
○ for each default bounding box, predict:
■ the shape offsets Δ(cx, cy, w, h) and
■ the confidences for all object categories (c1, …, cp)
● Non-maximum suppression (keeps only the highest-scoring box among heavily overlapping detections)
[Figure: default boxes on the 8x8 and 4x4 feature maps]
Wei Liu et al., SSD: Single Shot MultiBox Detector, ECCV 2016
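Because many default boxes fire on the same object, the final step removes duplicates. The following is a minimal sketch of greedy non-maximum suppression, not the SSD authors' implementation: boxes are assumed to be (x_min, y_min, x_max, y_max), and the 0.5 IoU threshold and the function names are illustrative choices.

```python
import numpy as np


def iou(box, boxes):
    """Intersection over union between one box and an (N, 4) array of boxes."""
    x0 = np.maximum(box[0], boxes[:, 0])
    y0 = np.maximum(box[1], boxes[:, 1])
    x1 = np.minimum(box[2], boxes[:, 2])
    y1 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes to keep, highest score first."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Drop every remaining box that overlaps the kept one too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep


boxes = np.array([
    [10, 10, 50, 50],      # detection A
    [12, 12, 52, 52],      # near-duplicate of A
    [100, 100, 140, 140],  # separate object
], dtype=float)
scores = np.array([0.9, 0.8, 0.7])

kept = non_max_suppression(boxes, scores)
print(kept)
```

In a multi-class detector this is usually run per class, so that overlapping boxes of different categories do not suppress each other.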
Compare modern convolutional object detectors
Lots of variables to set up ...
● base net:
○ VGG16
○ ResNet101
○ InceptionV2
○ InceptionV3
○ ResNet
○ MobileNet
● Object detection architecture:
○ R-CNN
○ Fast R-CNN
○ Faster R-CNN
○ SSD
● Input image resolution
● Number of region proposals
● Frozen weights - for fine tuning
[Figure: speed/accuracy trade-offs]
Jonathan Huang et al., Speed/accuracy trade-offs for modern convolutional object detectors, CVPR 2017
Takeaways:
● Faster R-CNN is slower but more accurate
● SSD is much faster but not as accurate, which makes it a good choice for mobile apps
Coding time
Problem to solve:
- a mobile app for real time clothes detection
- class categories: Top, Pants, Shorts, Skirt and Dress
Frameworks:
● TensorFlow Object Detection API
- made by Google
- an open source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models
- input: images + labels
- output: inference graph (.pb format)
● LabelImg
- an open source graphical image annotation tool
- annotations are saved as XML files in PASCAL VOC format, the format used by the ImageNet dataset
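To make the annotation format concrete, here is a sketch that reads one PASCAL VOC style XML annotation and flattens it into CSV rows, one row per bounding box. The sample XML, the file name `img_001.jpg` and the column order are assumptions made for illustration; the real files written by LabelImg contain additional fields.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down annotation in PASCAL VOC layout.
SAMPLE_XML = """<annotation>
  <filename>img_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>Dress</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>300</xmax><ymax>400</ymax></bndbox>
  </object>
  <object>
    <name>Top</name>
    <bndbox><xmin>350</xmin><ymin>60</ymin><xmax>500</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>"""

COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def parse_voc(xml_text):
    """Return one (filename, width, height, class, xmin, ymin, xmax, ymax) tuple per object."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append((
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ))
    return rows


rows = parse_voc(SAMPLE_XML)

# Write the rows as CSV text (to a string here; a file works the same way).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(COLUMNS)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```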
Coding time: step by step
● Create the dataset and split it into train (70%) and test (30%) folders
● Label the images with the LabelImg tool (output: one .xml file per image in the dataset)
● Convert .xml to .csv (use the dataset/xml_to_csv.py script; output: train_labels.csv, test_labels.csv)
● Convert to TFRecord format
○ set paths (from ../models/research):
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/object_detection
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
○ edit the generate_tfrecord.py file and change the label map + path to the train/test folder
○ finally, execute the generate_tfrecord.py script in a terminal:
python generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record
python generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record
○ output: train.record, test.record
● Training
○ create a label map: label_map.pbtxt
○ optional, but recommended :): choose a pretrained model from the TensorFlow detection model zoo
○ prepare the .config file, e.g. starting from .../models/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config
○ run the training script (from ../models/research/object_detection), pointing --pipeline_config_path at the .config file you prepared:
python legacy/train.py --logtostderr --train_dir=training/ --pipeline_config_path=ssd_mobilenet_v1_pets.config
● Export the inference graph:
python export_inference_graph.py --input_type image_tensor --pipeline_config_path pipeline.config \
--trained_checkpoint_prefix=training/model.ckpt-10750 --output_directory=inference_graph
○ output: the model in .pb format
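Two of the steps above are easy to script: the 70/30 split and the label map. The sketch below is an illustration, not the script used in the talk: the helper names, the fixed random seed and the file names are assumptions. The label map text follows the `item { id: ... name: ... }` layout used by the TensorFlow Object Detection API, with ids starting at 1 because 0 is reserved for the background.

```python
import random

# The five classes of the clothes detection problem.
CLASSES = ["Top", "Pants", "Shorts", "Skirt", "Dress"]


def split_dataset(filenames, train_fraction=0.7, seed=42):
    """Shuffle reproducibly and split into (train, test) file lists."""
    files = sorted(filenames)
    random.Random(seed).shuffle(files)
    cut = int(round(len(files) * train_fraction))
    return files[:cut], files[cut:]


def label_map_text(classes):
    """Build the contents of a label_map.pbtxt file."""
    items = []
    for idx, name in enumerate(classes, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (idx, name))
    return "\n".join(items)


# Hypothetical image file names.
files = ["img_%03d.jpg" % i for i in range(10)]
train_files, test_files = split_dataset(files)
pbtxt = label_map_text(CLASSES)

print(len(train_files), len(test_files))
print(pbtxt)
```

The class names in the label map must match the names used when labeling the images, and the ids must match those set in generate_tfrecord.py.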
e-mail: anca.ciurte@softvision.ro
Q&A
Integrating with Android
Speaker:
MIHALY NAGY - Android Community Influencer at Softvision
Android + TensorFlow
● Model File
● [Labels File]
● tensorflow-android dependency
● Boilerplate
● Integrate TF to process each frame
[Figure: each frame → Bitmap → Recognition]
Follow Along:
http://goo.gl/SYHSb7
https://github.com/code-twister/tf_example
Coding time
Thank You!
DroidCon Cluj 2018 - Hands on machine learning on android

Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 

DroidCon Cluj 2018 - Hands-on machine learning on Android

  • 1.
  • 2.
  • 3. ATLANTA | AUSTIN | PHILADELPHIA | BENTONVILLE | ROMANIA | INDIA | AUSTRALIA | BRAZIL | NEPAL | CANADA www.softvision.com Machine Learning Speaker: ANCA CIURTE - AI Team Lead at Softvision-
  • 4. Outline
      ● Why machine learning on Android?
      ● Mostly:
          ○ Some insights about object detection algorithms
          ○ Practical example in TensorFlow
          ○ Data gathering and labeling
          ○ Model training
      ● Hopefully:
          ○ It will inspire you to dig deeper
          ○ It won’t confuse you too much :)
  • 5. Machine learning: Why machine learning on Android?
  • 7. Why machine learning on Android?
      ● Object detection
          ○ Is a very common computer vision problem
          ○ Identifies the objects in the image and provides their precise location
      ● Why is it useful?
          ○ Street View
          ○ Self-driving cars, etc.
      E.g.: Street View, face blurring
      E.g.: Self-driving cars, pedestrian detection
  • 9. Why machine learning on Android?
      ● Object detection
          ○ Is a very common computer vision problem
          ○ Identifies the objects in the image and provides their precise location
      ● Why is it useful?
          ○ Street View
          ○ Self-driving cars, etc.
      ● Object detection: impact of deep learning
          ○ Deep convnets significantly improved both accuracy and processing speed
      ● Why on Android?
          ○ We are living in the era when mobile took over
          ○ Running on mobile makes it possible to deliver interactive, real-time applications
          ○ The latest phones have great computing power
  • 10. Machine learning: Some insights about object detection
  • 12. Intuition about the convolution
      [Figure: input image * convolution kernel (weights) = output. Alternative names shown in the figure: convolution layer, feature map, network’s parameters.]
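The figure on slide 12 did not survive extraction, so here is a minimal sketch of the operation it illustrates: a small kernel of weights slides over the input image and produces a feature map. The function name `conv2d_valid`, the toy image and the edge-detecting kernel are all illustrative assumptions, not taken from the talk. Like most deep learning frameworks, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel.
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 kernel that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1.0, 1.0],
    [-1.0, 1.0],
]
feature_map = conv2d_valid(image, vertical_edge)
```

The feature map is strongest in the middle column, where the edge sits. In a trained convnet the kernel values are not hand-picked like this; they are the parameters learned during training.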
  • 13. Image classification with convnets
      ● Dataset
          ○ e.g. CIFAR-10 dataset:
              ■ consists of 60000 32x32 colour images in 10 classes, with 6000 images per class
              ■ there are 50000 training images and 10000 test images
      ● Training phase
          ○ e.g. VGG 16 network
          ○ input: labeled images (x, y)
          ○ Forward propagation (given w_l, compute the predictions)
          ○ Loss function: [formula shown on the slide]
          ○ Backward propagation (compute w_(l+1) by minimizing the loss)
          ○ Repeat until convergence => w*
      ● Testing phase
          ○ Use the trained model to classify new instances
          ○ Detection output: predicted class
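The training loop on slide 13 (forward propagation, loss, backward propagation, repeat) can be sketched on a toy problem. This is not the VGG 16 setup from the talk: the model below is a one-weight logistic classifier, the loss is assumed to be cross-entropy (the slide's formula was lost), and the data, learning rate and fixed step count are hypothetical stand-ins for "repeat until convergence".

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, learning_rate=0.5, steps=200):
    """Gradient descent on a 1-D logistic classifier; returns (w, b, loss history)."""
    w, b = 0.0, 0.0
    n = len(samples)
    losses = []
    for _ in range(steps):
        # Forward propagation: given the current weights, compute predictions.
        preds = [sigmoid(w * x + b) for x in samples]
        # Loss function: mean cross-entropy (an assumption, see the lead-in).
        loss = -sum(
            y * math.log(p) + (1 - y) * math.log(1 - p)
            for p, y in zip(preds, labels)
        ) / n
        losses.append(loss)
        # Backward propagation: gradients of the loss w.r.t. w and b.
        grad_w = sum((p - y) * x for p, y, x in zip(preds, labels, samples)) / n
        grad_b = sum(p - y for p, y in zip(preds, labels)) / n
        # Update step: the next weights are chosen to reduce the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, losses


# Hypothetical labeled data (x, y): negative inputs are class 0, positive are class 1.
samples = [-2.0, -1.0, 1.0, 2.0]
labels = [0, 0, 1, 1]
w, b, losses = train(samples, labels)
```

The loss history should decrease from step to step; a real convnet follows the same cycle, only with millions of weights and gradients computed layer by layer.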
  • 16. Relation between classification and object detection
      ● We have an accurate way of classifying images
          ○ e.g.: does this image contain a pedestrian?
      ● But how can we say WHERE this pedestrian is?
      Solution:
      ● Sliding window
          ○ strategy:
              ■ split the image into fragments and classify them independently
      [Figure: all fragments, and the fragments classified as pedestrian]
  • 18. Relation between classification and object detection
      ● We have an accurate way of classifying images
          ○ e.g.: does this image contain a pedestrian?
      ● But how can we say WHERE this pedestrian is?
      Solution:
      ● Sliding window
          ○ strategy:
              ■ split the image into fragments and classify them independently
          ○ challenges:
              ■ how to deal with various object sizes, various aspect ratios, object overlap and multiple responses
          ○ problem: the CNN has to be applied to a huge number of locations and scales, which is computationally very expensive
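To make the cost argument on slide 18 concrete, the sketch below enumerates sliding-window positions for a few window shapes and counts how many classifier calls one image would need. The image size, window sizes and stride are hypothetical values chosen for illustration, not figures from the talk.

```python
def sliding_windows(image_w, image_h, window_w, window_h, stride):
    """Yield (x, y, w, h) for every window that fits fully inside the image."""
    for y in range(0, image_h - window_h + 1, stride):
        for x in range(0, image_w - window_w + 1, stride):
            yield (x, y, window_w, window_h)


# Hypothetical settings: a 640x480 frame, three window shapes, stride of 16 px.
image_w, image_h = 640, 480
window_sizes = [(64, 128), (128, 256), (128, 128)]
stride = 16

# Each window would be cropped and passed through the CNN classifier once.
total_windows = sum(
    len(list(sliding_windows(image_w, image_h, w, h, stride)))
    for w, h in window_sizes
)
```

Even this coarse grid needs about two thousand classifier evaluations per frame, and covering more scales or a finer stride multiplies that number, which is the motivation for the proposal-based detectors that follow.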
  • 19. R-CNN (Region-based convolutional neural network)
      Two steps:
      ● Select object proposals: Selective Search algorithm
          ○ its precision is too low for it to be used as an object detector on its own, but it works fine as a first step in the detection pipeline
      ● Apply a strong CNN classifier to the selected proposals
      Girshick et al., “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR 2014
  • 21. R-CNN (Region-based convolutional neural network)
      Two steps:
      ● Select object proposals: Selective Search algorithm
          ○ its precision is too low for it to be used as an object detector on its own, but it works fine as a first step in the detection pipeline
      ● Apply a strong CNN classifier to the selected proposals
      It outperforms all the previous object detection algorithms
      Limitations:
      ● Depends on an external algorithm for the region hypotheses
      ● Needs to rescale object proposals to a fixed resolution
      ● Redundant computation: all features are computed independently, even for overlapping proposal regions
  • 23. Fast R-CNN
      From R-CNN to Fast R-CNN:
      ● input: image + region proposals
      ● region pooling on the “conv5” feature map for feature extraction
      ● softmax classifier instead of SVM classifier
      ● End-to-end multi-task training:
          ○ the last FC layer branches into two sibling output layers:
              ■ one that produces softmax probability estimates over K object classes
              ■ another that outputs the bounding box coordinates for each object
      Advantages:
      ● Higher detection quality (mAP) than R-CNN
      ● Training is single-stage
      ● Training can update all network layers at once
      ● No disk storage is required for feature caching
      Girshick, “Fast R-CNN”, ICCV 2015
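The region pooling step on slide 23 is what lets proposals of different sizes feed the same fully connected layers: each region of the shared feature map is reduced to a fixed-size grid by max pooling. The sketch below is a simplified illustration under assumed conventions (integer region coordinates in feature-map cells, exclusive upper bounds, a toy single-channel map); it is not the exact quantization used in the Fast R-CNN paper.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region roi = (x0, y0, x1, y1) into an out_h x out_w grid.

    Coordinates are feature-map cells; x1 and y1 are exclusive.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Bin i covers rows [floor(i*h/out_h), ceil((i+1)*h/out_h)) of the region.
        row_start = y0 + (i * roi_h) // out_h
        row_end = y0 + -(-((i + 1) * roi_h) // out_h)  # ceiling division
        row = []
        for j in range(out_w):
            col_start = x0 + (j * roi_w) // out_w
            col_end = x0 + -(-((j + 1) * roi_w) // out_w)
            row.append(max(
                feature_map[r][c]
                for r in range(row_start, row_end)
                for c in range(col_start, col_end)
            ))
        pooled.append(row)
    return pooled


# Hypothetical 4x6 single-channel feature map.
feature_map = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [13, 14, 15, 16, 17, 18],
    [19, 20, 21, 22, 23, 24],
]
# Two proposals of different sizes are pooled to the same 2x2 output shape.
pooled_a = roi_max_pool(feature_map, (0, 0, 4, 4), 2, 2)
pooled_b = roi_max_pool(feature_map, (1, 1, 6, 4), 2, 2)
```

Because the convolutional features are computed once for the whole image and only this cheap pooling runs per proposal, the redundant computation listed among the R-CNN limitations goes away.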
  • 24. Faster R-CNN
      Faster R-CNN = Fast R-CNN + RPN (Region Proposal Network)
      ● RPN
          ○ removes the dependency on an external RoI hypothesis generation method
          ○ is a convolutional network trained end-to-end
          ○ generates a list of high-quality region proposals (bbox coordinates + objectness scores)
      ● Then RPN + Fast R-CNN are merged into a single network by sharing their convolutional features
          ○ predicts the class of the objects + a refined bbox position
          ○ the shared convolutional features enable nearly cost-free region proposals
      Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, NIPS 2015
  • 26. SSD (Single shot detector)
      ● Extra feature layers
          ○ additional convolutional feature layers of different sizes are placed at the end of the base net
          ○ each added feature layer produces a set of detection predictions, allowing predictions at multiple scales
          ○ this design leads to simple end-to-end training
      ● RoI proposals
          ○ the output space of region proposals contains a fixed set of default boxes over different aspect ratios and scales per feature map location
          ○ for each default bounding box, predict:
              ■ the shape offsets Δ(cx, cy, w, h)
              ■ the confidence for all object categories (c1, …, cp)
      ● Non-maximum suppression
      [Figure: default boxes on an 8x8 feature map and on a 4x4 feature map]
      Wei Liu et al., “SSD: Single Shot MultiBox Detector”, ECCV 2016
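The last bullet on slide 26, non-maximum suppression, is the step that collapses many overlapping default-box predictions into one detection per object. A minimal greedy sketch follows; the box format `(x_min, y_min, x_max, y_max)`, the 0.5 IoU threshold and the sample boxes are assumptions made for illustration, and production detectors usually run this per class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: boxes 0 and 1 cover the same object, box 2 another one.
boxes = [(10, 10, 110, 210), (12, 14, 112, 214), (200, 50, 300, 250)]
scores = [0.90, 0.75, 0.80]
kept = non_max_suppression(boxes, scores)
```

Here the lower-scoring duplicate (index 1) is dropped and the two distinct objects survive.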
  • 29. Compare modern convolutional object detectors
      Lots of variables to set up ...
      ● Base net:
          ○ VGG16
          ○ ResNet101
          ○ InceptionV2
          ○ InceptionV3
          ○ ResNet
          ○ MobileNet
      ● Object detection architecture:
          ○ R-CNN
          ○ Fast R-CNN
          ○ Faster R-CNN
          ○ SSD
      ● Input image resolution
      ● Number of region proposals
      ● Frozen weights, for fine-tuning
      Speed/accuracy trade-offs, takeaways:
      ● Faster R-CNN is slower but more accurate
      ● SSD is much faster but not as accurate (therefore it is a good choice for mobile apps)
      Jonathan Huang et al., “Speed/accuracy trade-offs for modern convolutional object detectors”, CVPR 2017
  • 30. Coding time
  • 31. Coding time
      Problem to solve:
          - a mobile app for real-time clothes detection
          - class categories: Top, Pants, Shorts, Skirt and Dress
      Frameworks:
      ● TensorFlow Object Detection API
          - made by Google
          - an open source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models
          - input: images + labels
          - output: inference graph (.pb format)
      ● LabelImg
          - an open source graphical image annotation tool
          - annotations are saved as XML files in PASCAL VOC format, the format used by the ImageNet dataset
  • 32. Coding time: step by step
      ● Create the dataset and split it into train (70%) and test (30%) folders
      ● Label the images with the LabelImg tool (output: one .xml file for each image in the dataset)
      ● Convert .xml to .csv (use the dataset/xml_to_csv.py script; output: train.csv, test.csv)
      ● Convert to TFRecord format
          ○ set paths (from ../models/research):
              export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/object_detection
              export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
          ○ edit the generate_tfrecord.py file and change the label map + path to the train/test folder
          ○ finally, execute the generate_tfrecord.py script in a terminal:
              python generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record
              python generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record
          ○ output: train.record, test.record
      ● Training
          ○ create a label map: label_map.pbtxt
          ○ optional, but recommended :): choose a pretrained model from here
          ○ prepare the .config file: .../models/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config
          ○ run the training script (from ../models/research/object_detection):
              python legacy/train.py --logtostderr --train_dir=training/ --pipeline_config_path=Ssd_mobilenet_v1_pets.config
      ● Export the inference graph:
              python export_inference_graph.py --input_type image_tensor --pipeline_config_path pipeline.config --trained_checkpoint_prefix=training/model.ckpt-10750 --output_directory=inference_graph
          ○ output: the model in .pb format
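The ".xml to .csv" step on slide 32 flattens each PASCAL VOC annotation written by LabelImg into one CSV row per bounding box. The talk uses its own dataset/xml_to_csv.py script, which is not shown; the sketch below is an independent illustration using only the Python standard library. The sample annotation, the file name `img_001.jpg` and the column order are assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical LabelImg output in PASCAL VOC format, using two of the talk's classes.
SAMPLE_ANNOTATION = """<annotation>
  <filename>img_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>Dress</name>
    <bndbox><xmin>120</xmin><ymin>60</ymin><xmax>360</xmax><ymax>450</ymax></bndbox>
  </object>
  <object>
    <name>Top</name>
    <bndbox><xmin>400</xmin><ymin>80</ymin><xmax>600</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>"""

# Assumed column layout; adjust it to whatever the TFRecord script expects.
COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def voc_xml_to_rows(xml_text):
    """Return one tuple per labeled object in a PASCAL VOC annotation."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append((
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ))
    return rows


def rows_to_csv(rows):
    """Serialize the rows, with a header line, to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


rows = voc_xml_to_rows(SAMPLE_ANNOTATION)
csv_text = rows_to_csv(rows)
```

In a real project the same function would be applied to every .xml file in the train and test folders, and the two resulting CSV files would be passed to the TFRecord conversion step.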
  • 33. Q&A. E-mail: anca.ciurte@softvision.ro
  • 34. Integrating with Android. Speaker: MIHALY NAGY - Android Community Influencer at Softvision
  • 35. Android + TensorFlow
  • 36. Android + TensorFlow
      ● Model file
      ● [Labels file]
      ● tensorflow-android dependency
      ● Boilerplate
      ● Integrate TF to process each frame
  • 37. Android + TensorFlow
  • 38. Android + TensorFlow [diagram labels: Bitmap, Recognition, each frame]
  • 39. Android + TensorFlow. Follow along: http://goo.gl/SYHSb7 https://github.com/code-twister/tf_example
  • 40. Coding time
  • 41. Thank You!

Editor's notes

  1. Running on mobile makes it possible to deliver interactive, real-time applications in a way that is not possible when depending on an internet connection.
  2. https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
  3. Multiple scales and aspect ratios are handled by search windows of different sizes and aspect ratios, or by image scaling.
  4. From R-CNN to Fast R-CNN: region pooling on the “conv5” feature map for feature extraction; softmax classifier instead of SVM classifier; multi-task training. First, a CNN is applied to the whole original image, with several convolutional (conv) and max pooling layers, to produce a conv feature map. Then, for each object proposal, a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map, which is fed into a sequence of fully connected (fc) layers. The fc layers finally branch into two sibling output layers: one that produces softmax probability estimates over K object classes, and another that outputs the bounding box coordinates for each object.
  6. A Region Proposal Network (RPN) takes an image (of any size) as input and outputs a set of rectangular object proposals, each with an objectness score.
  7. SSD approach: the network produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes, followed by a non-maximum suppression step to produce the final detections. The network generates scores for each default box. Wei Liu et al., “SSD: Single Shot MultiBox Detector”, ECCV 2016
  8. SSD discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. Wei Liu et al., “SSD: Single Shot MultiBox Detector”, ECCV 2016
  9. There are several object detection algorithms. The question is: how do they compare with each other? We define several meta-parameters that influence detector performance. Critical points can be identified on the speed/accuracy curve. mAP = mean average precision. Huang et al. measured the influence of these meta-parameters on accuracy and speed. Jonathan Huang et al., “Speed/accuracy trade-offs for modern convolutional object detectors”, CVPR 2017
  12. “Recognition” refers to the objects detected, not to the process.