1. The speaker will demonstrate object detection on Android using TensorFlow and the SSD model.
2. SSD is well-suited for mobile as it is faster than other models like Faster R-CNN while maintaining reasonable accuracy.
3. The example will involve gathering image data, labeling objects, training an SSD model in TensorFlow, and integrating it into an Android app for real-time clothes detection on mobile.
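SSD-style detectors emit many overlapping candidate boxes per object, so mobile pipelines like the one described end with a post-processing step such as non-maximum suppression (NMS) that keeps only the strongest detection per object. A minimal NumPy sketch of greedy NMS (the [x1, y1, x2, y2] box format, the toy boxes, and the 0.5 threshold are illustrative assumptions, not taken from the talk):

```python
import numpy as np

def iou(box, boxes):
    # Intersection-over-union between one box and an array of boxes,
    # all in [x1, y1, x2, y2] corner format.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    # Greedily keep the highest-scoring box, then drop remaining boxes
    # that overlap it more than the threshold.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2] -- the near-duplicate second box is suppressed
```

On-device frameworks ship their own NMS, but this is the logic being applied to the raw SSD outputs.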
The document summarizes Assaf Mushinsky's presentation at CVPR 2017. Some key points:
- He discussed state-of-the-art research in object detection, segmentation, pose estimation, and network architectures from papers presented at CVPR 2017.
- Papers presented efficient object detection methods that improved speed and accuracy trade-offs like YOLO9000 and Feature Pyramid Networks. Mask R-CNN was discussed for instance segmentation and pose estimation.
- New network architectures like Densely Connected Networks, Xception, and ResNeXt were covered that improved accuracy and efficiency over ResNet and Inception.
- The presentation highlighted recent advances in computer vision from the CVPR conference but did not cover older work.
This is an intensive meetup at Samsung Next IL covering the most interesting papers presented at CVPR 2017 last month. It is a good opportunity to get an overview of recent advancements in the field of Deep Learning with applications to Computer Vision.
The following topics are covered:
• Object detection
• Pose estimation
• Efficient networks
1) The document presents DAVE, a unified framework using two CNNs (FVPN and ALN) for fast vehicle detection and annotation of attributes like pose, color, and type.
2) The FVPN is a shallow fully convolutional network that efficiently generates vehicle proposals. The ALN is based on GoogLeNet and extended with additional layers for multi-task learning of vehicle attributes.
3) The two networks are jointly trained using a large vehicle dataset, with the FVPN providing proposals to the ALN for attribute annotation. Experiments show DAVE outperforms other methods on vehicle detection and annotation tasks.
Automatic Image Cropping - A journey from a Master Thesis to Production (Alexey Grigorev)
The document discusses developing a neural network model for automatic image cropping. It proposes that properly cropped images can improve listing performance on online classifieds sites. The model uses a Deeply Supervised Salient Network (DSS) which improves on fully convolutional networks with deep supervision and short connections. Experiments were conducted on an online classifieds site by manually evaluating cropped images in different categories. The best performing category of engagement rings was selected for initial deployment. The system architecture includes components for image enhancement, cropping, hosting, and a frontend interface.
This document discusses deep learning techniques for object detection and recognition. It provides an overview of computer vision tasks like image classification and object detection. It then discusses how crowdsourcing large datasets from the internet and advances in machine learning, specifically deep convolutional neural networks (CNNs), have led to major breakthroughs in object detection. Several state-of-the-art CNN models for object detection are described, including R-CNN, Fast R-CNN, Faster R-CNN, SSD, and YOLO. The document also provides examples of applying these techniques to tasks like face detection and detecting manta rays from aerial videos.
Deep reinforcement learning framework for autonomous driving (GopikaGopinath5)
Motivated by Google DeepMind's successful demonstrations of learning Atari games and Go, this work proposes a framework for autonomous driving using deep reinforcement learning.
It incorporates Recurrent Neural Networks for information integration, enabling the car to handle partially observable scenarios.
Presented by Mr. Dinesh KS
Software Developer, Livares Technologies
Introduction
Object detection is a computer technology related to computer vision and image processing that
deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or
cars) in digital images and videos.
Face detection is a computer technology being used in a variety of applications that identifies
human faces in digital images.
The document discusses implementing deep learning algorithms for object detection and scene perception in self-driving cars. It compares the YOLO and Faster R-CNN models, finding that Faster R-CNN has higher accuracy (mAP of 41.8) but lower speed (17.1 FPS), while YOLO has lower accuracy (mAP of 18.6) but higher speed (212.4 FPS). The authors conclude that achieving both high accuracy and high speed remains a goal for future work, which could explore using newer versions of YOLO or other models.
Content-based image retrieval (CBIR) uses computer vision techniques to search for and retrieve images from large databases based on visual similarities. CBIR systems typically extract features from images and measure similarities to return images matching a query image. Popular applications include Google Images, eBay, and Pinterest. Evaluation of CBIR systems focuses on precision and recall metrics, as precision alone is insufficient without also considering recall. Training siamese networks for CBIR requires loss functions that pull similar images closer together and push dissimilar images farther apart.
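The loss functions described for siamese CBIR training can be illustrated with the classic contrastive loss, which penalizes distance between similar pairs and penalizes dissimilar pairs only when they fall inside a margin. A minimal NumPy sketch (the margin of 1.0 and the toy embeddings are illustrative assumptions):

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, is_similar, margin=1.0):
    # Euclidean distance between the two embeddings.
    d = np.linalg.norm(emb_a - emb_b)
    if is_similar:
        # Similar pair: any distance is penalized (pull closer together).
        return 0.5 * d ** 2
    # Dissimilar pair: penalized only if closer than the margin (push apart).
    return 0.5 * max(margin - d, 0.0) ** 2

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])  # distance 5.0
print(contrastive_loss(a, b, is_similar=True))   # 12.5 -- similar pair far apart: large loss
print(contrastive_loss(a, b, is_similar=False))  # 0.0  -- dissimilar pair beyond the margin: no loss
```

Triplet loss is the other common choice; both share the pull-together/push-apart structure the summary mentions.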
YouTube: https://youtu.be/XSoau_q0kz8
** Data Science Certification using R: https://www.edureka.co/data-science **
This Edureka PPT on "KNN algorithm using R" will help you learn about the KNN algorithm in depth; you'll also see how KNN is used to solve real-world problems. Below are the topics covered in this module:
Introduction to Machine Learning
What is KNN Algorithm?
KNN Use Case
KNN Algorithm step by step
Hands-On
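The "KNN Algorithm step by step" portion boils down to three steps: compute the distance from the query to every training point, take the k nearest, and let them vote. A minimal pure-Python sketch (the toy dataset and k=3 are illustrative; the Edureka module itself works in R):

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    # 1. Distance from the query to every training point.
    dists = [(math.dist(query, x), label) for x, label in zip(train_X, train_y)]
    # 2. Sort by distance and keep the k nearest neighbours.
    nearest = sorted(dists)[:k]
    # 3. Majority vote among their labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (2, 2)))  # A -- near the first cluster
print(knn_predict(train_X, train_y, (8, 7)))  # B -- near the second cluster
```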
Blog Series: http://bit.ly/data-science-blogs
Data Science Training Playlist: http://bit.ly/data-science-playlist
Follow us to never miss an update in the future.
YouTube: https://www.youtube.com/user/edurekaIN
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Pelee: a real time object detection system on mobile devices - Paper Review (LEE HOSEONG)
This document summarizes the Pelee object detection system which uses the PeleeNet efficient feature extraction network for real-time object detection on mobile devices. PeleeNet improves on DenseNet with two-way dense layers, a stem block, dynamic bottleneck layers, and transition layers without compression. Pelee uses SSD with PeleeNet, selecting fewer feature maps and adding residual prediction blocks for faster, more accurate detection compared to SSD and YOLO. The document concludes that PeleeNet and Pelee achieve real-time classification and detection on devices, outperforming existing models in speed, cost and accuracy with simple code.
Artificial intelligence use cases for International Dating Apps. iDate 2018. ... (Lluis Carreras)
As Andrew Ng says, AI is the new electricity and it will transform many industries; dating, too, is going to be transformed by the use of AI.
Drawing on AI learned at university several years ago and kept up to date since 2016, plus 10 years of working experience in the dating industry, this presentation shows the evolution of AI over the last few years and gives some examples of AI already used in dating services.
It then shows where AI can be applied to dating services, what is needed, which models can be used, and walks through the building process and how it can be done.
Automatism System Using Faster R-CNN and SVM (IRJET Journal)
The document describes a proposed system to automatically manage vacant parking spaces using computer vision techniques. The system would use existing surveillance cameras installed in parking lots. It detects vehicles in images using a Faster R-CNN object detection model. This model uses a Region Proposal Network to quickly detect objects. An SVM classifier is then used to classify detected objects as free or occupied parking spaces. The goal is to assist drivers in finding available spaces more efficiently.
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor... (Artur Filipowicz)
This presentation explores the interaction between virtual reality simulation and Deep Learning which may develop computer vision that rivals human vision. The specific problem considered is detection and localization of a stop object, the stop sign, based on an image. A video game, Grand Theft Auto 5, is used to collect over half a million images and corresponding ground truth labels with and without stop signs in various lighting and weather conditions. A deep convolutional neural network trained on this data and fine tuned on real world data achieves accuracy in stop sign detection of over 95% within 20 meters of the stop sign and has a false positive rate of 4% on test data from the real world. Additionally, the physical constraints on this problem are analysed, and a framework for the use of simulators is developed.
The document provides an introduction to computer vision concepts including neural network structures, activation functions, convolution operators, pooling layers, and batch normalization. It then discusses image classification, including popular datasets, classification networks from LeNet to DLA, and experiments on car brand classification. Finally, it covers object detection, comparing region-based methods like R-CNN, Fast R-CNN, Faster R-CNN, and R-FCN to region-free methods like YOLO.
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn... (gdgsurrey)
Dive into the essentials of ML model development, processes, and techniques to combat underfitting and overfitting, explore distributed training approaches, and understand model explainability. Enhance your skills with practical insights from a seasoned expert.
Avihu Efrat's Viola and Jones face detection slides (wolf)
The document summarizes the Viola-Jones object detection framework. It uses a cascade of classifiers with increasingly more complex features trained with AdaBoost to rapidly detect objects. Integral images allow for very fast feature evaluations. The framework was applied to face detection, achieving very fast average detection speeds of 270 microseconds per sub-window while maintaining low false positive rates.
Supervised learning involves using a training dataset to learn a target function that can be used to predict class labels or attribute values. The document discusses supervised learning and classification, including types of supervised learning problems like classification and regression. It provides examples of classification algorithms like K-nearest neighbors, decision trees, naive Bayes, and support vector machines. It also gives examples of how to implement classification algorithms using scikit-learn and discusses evaluating classification model performance based on accuracy.
Wapid and wobust active online machine leawning with Vowpal Wabbit (Antti Haapala)
Vowpal Wabbit is a machine learning library that provides fast, scalable, and online learning algorithms. It can handle large datasets with millions of features efficiently using hashing and sparse representations. Unlike other libraries, Vowpal Wabbit is designed for online and active learning, allowing the model to be updated continuously as new data is processed. It performs linear learning rapidly using stochastic gradient descent and has been shown to scale to billions of examples and trillions of features.
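The hashing mentioned above is the "hashing trick": rather than building a feature dictionary, each feature name is hashed directly into an index of a fixed-size weight vector, so memory stays bounded no matter how many distinct features the stream contains. A toy sketch of the idea (Vowpal Wabbit itself uses MurmurHash with 2^18 buckets by default; this illustrative version uses Python's hashlib and 16 buckets):

```python
import hashlib

def hashed_features(tokens, num_buckets=16):
    # Map each token to a bucket via a stable hash; occasional collisions
    # are accepted as the trade-off for constant, pre-allocated memory.
    vec = [0] * num_buckets
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % num_buckets] += 1
    return vec

v = hashed_features("the quick brown fox jumps over the lazy dog".split())
print(sum(v))  # 9 -- all nine tokens counted, wherever they land
print(len(v))  # 16 -- vector size is fixed regardless of vocabulary
```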
This document discusses using fully convolutional neural networks for defect inspection. It begins with an agenda that outlines image segmentation using FCNs and defect inspection. It then provides details on data preparation including labeling guidelines, data augmentation, and model setup using techniques like deconvolution layers and the U-Net architecture. Metrics for evaluating the model like Dice score and IoU are also covered. The document concludes with best practices for successful deep learning projects focusing on aspects like having a large reusable dataset, feasibility of the problem, potential payoff, and fault tolerance.
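The Dice score and IoU named as evaluation metrics are both overlap ratios between the predicted and ground-truth masks, and for binary masks they reduce to a few array operations. A minimal NumPy sketch (the toy 2x3 masks are illustrative):

```python
import numpy as np

def dice(pred, target):
    # Dice = 2|A intersect B| / (|A| + |B|)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum())

def iou(pred, target):
    # IoU = |A intersect B| / |A union B|
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union

pred   = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
target = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
print(dice(pred, target))  # 2*2/(3+3) = 0.666...
print(iou(pred, target))   # 2/4 = 0.5
```

Dice is always at least as large as IoU for the same masks; the two rank models similarly but are not interchangeable as thresholds.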
VIBE: Video Inference for Human Body Pose and Shape Estimation (Arithmer Inc.)
The document describes the VIBE approach for 3D human pose and shape estimation from video. VIBE uses an adversarial learning framework with a temporal encoder network that incorporates self-attention. It regresses pose and shape parameters from video frames. A motion discriminator is trained to distinguish real from generated poses, enforcing kinematically plausible poses without 3D ground truth labels. Results show VIBE generates accurate 3D poses and shapes from in-the-wild videos.
Object Detection for Autonomous Cars using AI/ML (IRJET Journal)
The document discusses using machine learning and computer vision techniques for object detection in autonomous vehicles. Specifically, it proposes using the Single Shot Detector (SSD) algorithm to identify and classify objects around a self-driving car from camera images. The SSD model was trained on a dataset to detect common objects like cars, people, buses etc. and estimate bounding boxes around detected objects. The methodology uses OpenCV and TensorFlow to implement SSD on images from a webcam in real-time. While bounding boxes were sometimes inconsistent in dense traffic, detection was more accurate for objects closer to the camera or in less crowded scenarios. The goal is to demonstrate how computer vision allows autonomous vehicles to perceive their surroundings.
The document summarizes Md Abul Hayat's research on image segmentation using deep neural networks. It discusses using various CNN architectures like autoencoders, fully convolutional networks, U-Net, ResNet, and DenseNet for segmenting OCT images of skin. It presents experimental results comparing the DCU-Net and U-Net models on fingertip and palm image datasets, finding that DCU-Net achieved better performance for segmentation and potential for transfer learning across datasets. Future work could include training on larger datasets, accounting for temporal variations, generalizing to other body parts, using 3D models, and collecting more annotations.
Rapid object detection using boosted cascade of simple features (Hirantha Pradeep)
1. The document presents the seminal work of Viola and Jones on rapid object detection using boosted cascades of simple features.
2. It introduces integral images for fast feature evaluation and uses AdaBoost for feature selection and classifier training in a cascade structure.
3. The cascade approach combines classifiers such that earlier ones rapidly reject negatives while later ones focus on positives, achieving real-time detection rates.
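The integral image behind that fast feature evaluation stores, at each position, the sum of all pixels above and to the left; the sum of any rectangle then costs just four lookups, regardless of its size. A NumPy sketch (the toy 4x4 image is illustrative):

```python
import numpy as np

img = np.arange(16).reshape(4, 4).astype(float)

# Integral image: cumulative sums along both axes, padded with a zero
# row/column so rectangle sums need no boundary special-casing.
ii = np.zeros((5, 5))
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1, c0:c1] recovered from four integral-image lookups.
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

print(rect_sum(ii, 1, 1, 3, 3))  # 30.0 -- equals 5 + 6 + 9 + 10
print(img[1:3, 1:3].sum())       # 30.0 -- same value by direct summation
```

Haar-like features are differences of such rectangle sums, which is why the cascade can evaluate them in constant time per feature.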
Jaroslaw Szymczak presented an approach for automatic image moderation in classified listings. The approach uses machine learning techniques including convolutional neural networks (CNNs) to extract image features and eXtreme Gradient Boosting (XGBoost) to combine image and listing features. To address class imbalance between acceptable and unacceptable images, the training data was undersampled from a 99:1 ratio to a 9:1 ratio. Key evaluation metrics for the imbalanced data include ROC AUC, PR AUC, and precision at fixed recall (or recall at fixed precision). The trained models are deployed into a live service using Flask, containerized with Docker, and monitored for performance using Grafana.
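Undersampling a 99:1 ratio down to 9:1 amounts to keeping every positive (the rare, unacceptable images) and a random subset of negatives sized at nine per positive. A toy sketch of the idea (the counts, filenames, and seed are illustrative assumptions):

```python
import random

random.seed(0)
# 9900 acceptable images (label 0) vs 100 unacceptable (label 1): a 99:1 imbalance.
data = [(f"img_{i}", 0) for i in range(9900)] + \
       [(f"img_{i}", 1) for i in range(9900, 10000)]

positives = [d for d in data if d[1] == 1]
negatives = [d for d in data if d[1] == 0]

# Keep all positives and sample 9 negatives per positive -> 9:1 ratio.
sampled_negatives = random.sample(negatives, 9 * len(positives))
train = sampled_negatives + positives

print(len(positives), len(sampled_negatives))  # 100 900
```

The discarded negatives are typically still used for evaluation, since metrics like PR AUC should reflect the true class ratio.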
The document discusses automatic image moderation in classified ads. It outlines an approach using machine learning to classify images as appropriate or inappropriate. Key aspects include using convolutional neural networks to extract image features, combining image and listing metadata, dealing with class imbalance, developing batch processing pipelines, and monitoring a live classification system. The overall goal is to automatically moderate millions of images uploaded daily to classified ad platforms.
Taking AI to the Next Level in Manufacturing.pdf (ssuserfac0301)
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Generating privacy-protected synthetic data using Secludy and Milvus (Zilliz)
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Más contenido relacionado
Similar a DroidCon Cluj 2018 - Hands on machine learning on android
Content-based image retrieval (CBIR) uses computer vision techniques to search for and retrieve images from large databases based on visual similarities. CBIR systems typically extract features from images and measure similarities to return images matching a query image. Popular applications include Google Images, eBay, and Pinterest. Evaluation of CBIR systems focuses on precision and recall metrics, as precision alone is insufficient without also considering recall. Training siamese networks for CBIR requires loss functions that pull similar images closer together and push dissimilar images farther apart.
YouTube: https://youtu.be/XSoau_q0kz8
** Data Science Certification using R: https://www.edureka.co/data-science **
This Edureka PPT on "KNN algorithm using R", will help you learn about the KNN algorithm in depth, you'll also see how KNN is used to solve real-world problems. Below are the topics covered in this module:
Introduction to Machine Learning
What is KNN Algorithm?
KNN Use Case
KNN Algorithm step by step
Hands - On
Introduction to Machine Learning
What is KNN Algorithm?
KNN Use Case
KNN Algorithm step by step
Hands - On
Blog Series: http://bit.ly/data-science-blogs
Data Science Training Playlist: http://bit.ly/data-science-playlist
Follow us to never miss an update in the future.
YouTube: https://www.youtube.com/user/edurekaIN
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Pelee: a real time object detection system on mobile devices Paper ReviewLEE HOSEONG
This document summarizes the Pelee object detection system which uses the PeleeNet efficient feature extraction network for real-time object detection on mobile devices. PeleeNet improves on DenseNet with two-way dense layers, a stem block, dynamic bottleneck layers, and transition layers without compression. Pelee uses SSD with PeleeNet, selecting fewer feature maps and adding residual prediction blocks for faster, more accurate detection compared to SSD and YOLO. The document concludes that PeleeNet and Pelee achieve real-time classification and detection on devices, outperforming existing models in speed, cost and accuracy with simple code.
Artificial intelligence use cases for International Dating Apps. iDate 2018. ...Lluis Carreras
As Andrew Ng says, AI is the new electricity and it will transform many industries, therefore dating is going to be transform by the use of AI.
With the experience of having learned AI during the university several years ago, and having updated it since 2016, plus 10 years working experience in the dating industry, this presentation shows the evolution of AI during these last years, and shows some AI examples already used in Dating services.
Then shows where AI can be applied to dating services, what is needed, which models can be used, shows the building process, and how can be done.
Automatism System Using Faster R-CNN and SVMIRJET Journal
The document describes a proposed system to automatically manage vacant parking spaces using computer vision techniques. The system would use existing surveillance cameras installed in parking lots. It detects vehicles in images using a Faster R-CNN object detection model. This model uses a Region Proposal Network to quickly detect objects. An SVM classifier is then used to classify detected objects as free or occupied parking spaces. The goal is to assist drivers in finding available spaces more efficiently.
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...Artur Filipowicz
This presentation explores the interaction between virtual reality simulation and Deep Learning which may develop computer vision that rivals human vision. The specific problem considered is detection and localization of a stop object, the stop sign, based on an image. A video game, Grand Theft Auto 5, is used to collect over half a million images and corresponding ground truth labels with and without stop signs in various lighting and weather conditions. A deep convolutional neural network trained on this data and fine tuned on real world data achieves accuracy in stop sign detection of over 95% within 20 meters of the stop sign and has a false positive rate of 4% on test data from the real world. Additionally, the physical constraints on this problem are analysed, and a framework for the use of simulators is developed.
The document provides an introduction to computer vision concepts including neural network structures, activation functions, convolution operators, pooling layers, and batch normalization. It then discusses image classification, including popular datasets, classification networks from LeNet to DLA, and experiments on car brand classification. Finally, it covers object detection, comparing region-based methods like R-CNN, Fast R-CNN, Faster R-CNN, and R-FCN to region-free methods like YOLO.
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...gdgsurrey
Dive into the essentials of ML model development, processes, and techniques to combat underfitting and overfitting, explore distributed training approaches, and understand model explainability. Enhance your skills with practical insights from a seasoned expert.
Avihu Efrat's Viola and Jones face detection slideswolf
The document summarizes the Viola-Jones object detection framework. It uses a cascade of classifiers with increasingly more complex features trained with AdaBoost to rapidly detect objects. Integral images allow for very fast feature evaluations. The framework was applied to face detection, achieving very fast average detection speeds of 270 microseconds per sub-window while maintaining low false positive rates.
Supervised learning involves using a training dataset to learn a target function that can be used to predict class labels or attribute values. The document discusses supervised learning and classification, including types of supervised learning problems like classification and regression. It provides examples of classification algorithms like K-nearest neighbors, decision trees, naive Bayes, and support vector machines. It also gives examples of how to implement classification algorithms using scikit-learn and discusses evaluating classification model performance based on accuracy.
Wapid and wobust active online machine leawning with Vowpal Wabbit Antti Haapala
Vowpal Wabbit is a machine learning library that provides fast, scalable, and online learning algorithms. It can handle large datasets with millions of features efficiently using hashing and sparse representations. Unlike other libraries, Vowpal Wabbit is designed for online and active learning, allowing the model to be updated continuously as new data is processed. It performs linear learning rapidly using stochastic gradient descent and has been shown to scale to billions of examples and trillions of features.
This document discusses using fully convolutional neural networks for defect inspection. It begins with an agenda that outlines image segmentation using FCNs and defect inspection. It then provides details on data preparation including labeling guidelines, data augmentation, and model setup using techniques like deconvolution layers and the U-Net architecture. Metrics for evaluating the model like Dice score and IoU are also covered. The document concludes with best practices for successful deep learning projects focusing on aspects like having a large reusable dataset, feasibility of the problem, potential payoff, and fault tolerance.
VIBE: Video Inference for Human Body Pose and Shape EstimationArithmer Inc.
DroidCon Cluj 2018 - Hands on machine learning on android
3. ATLANTA | AUSTIN | PHILADELPHIA | BENTONVILLE | ROMANIA | INDIA | AUSTRALIA | BRAZIL | NEPAL | CANADA www.softvision.com
Machine Learning
Speaker:
ANCA CIURTE - AI Team Lead at Softvision
Outline
● Why machine learning on Android?
● Mostly:
○ Some insights about Object Detection algorithms
○ Practical example in Tensorflow
○ Data gathering and labeling
○ Model training
● Hopefully:
○ It will inspire you to dig deeper
○ It won’t confuse you too much :)
Machine learning
Why machine learning on Android?
Why machine learning on Android?
● Object detection
○ is a very common Computer Vision problem
○ identifies the objects in an image and provides their precise locations
● Why is it useful?
○ Street View (e.g. face blurring),
○ self-driving cars (e.g. pedestrian detection), etc.
● Object detection: impact of deep learning
○ deep convnets significantly improved both accuracy and processing time
● Why on Android?
○ We are living in the era when mobile took over
○ Running on mobile makes it possible to deliver interactive, real-time applications
○ The latest phones have great computing power
Machine learning
Some insights about Object Detection
Intuition about the convolution
(Figure: input image * convolution kernel = feature map. The kernel holds the weights, i.e. the network's parameters; applying it over the image is a convolution layer, and its output is a feature map.)
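The convolution operation sketched in the figure above can be illustrated with a minimal NumPy example (not from the slides; the vertical-edge kernel and the toy image below are purely illustrative):

```python
# A minimal sketch of the 2D convolution used in convnet layers
# (technically cross-correlation), for a single-channel image.
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image and sum elementwise products
    at each position (valid padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge kernel: responds where intensity drops left to right.
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)
image = np.zeros((5, 5))
image[:, :2] = 1.0  # bright left half, dark right half
feature_map = conv2d(image, kernel)  # the kernel weights are what training learns
```

The same sliding-and-summing operation, with learned kernel weights, is what each convolution layer computes.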
Image classification with convnets
● Dataset
○ e.g. the CIFAR-10 dataset:
■ consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class
■ 50,000 training images and 10,000 test images
● Training phase
○ e.g. the VGG-16 network
○ input: labeled images (x, y)
○ repeat until convergence => w*:
■ forward propagation (given wl, compute predictions)
■ loss function (compare predictions against the labels)
■ backward propagation (compute wl+1 by minimizing the loss)
● Testing phase
○ use the trained model to classify new instances
○ output: the predicted class
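The training loop above (forward propagation, loss, backward propagation, repeat until convergence) can be illustrated with a toy example. As a hedged sketch, logistic regression trained by gradient descent on synthetic 2D data stands in for the convnet; the data, learning rate, and step count are all illustrative:

```python
# Toy training loop: forward pass -> loss -> backward pass -> update.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # linearly separable labels

w = np.zeros(2)
b = 0.0
lr = 0.5
for step in range(200):
    # Forward propagation: given w_l, compute predictions.
    logits = X @ w + b
    p = 1.0 / (1.0 + np.exp(-logits))
    # Loss function: cross-entropy between predictions and labels.
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    # Backward propagation: gradient of the loss gives w_{l+1}.
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean((p > 0.5) == (y > 0.5))
```

A real convnet replaces the single linear layer with many convolution and fully connected layers, but the loop structure is the same.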
Relation between classification and object detection
● We have an accurate way of classifying images
○ e.g.: does this image contain a pedestrian?
● But how can we say WHERE the pedestrian is?
Solution:
● Sliding window
○ strategy:
■ split the image into fragments and classify each of them independently
○ challenges:
■ how to deal with various object sizes, various aspect ratios, object overlap, and multiple responses
○ problem: the CNN must be applied to a huge number of locations and scales, which is very computationally expensive!
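The sliding-window strategy above can be sketched as follows; `classify` is a hypothetical stand-in for the CNN classifier, and the window sizes and stride are illustrative. Counting the windows also makes the computational cost concrete:

```python
# Sliding-window detection sketch: enumerate fragments at several
# scales and classify each one independently.
import numpy as np

def sliding_windows(image, window_sizes, stride):
    """Yield (x, y, w, h) fragments covering the image at each scale."""
    H, W = image.shape[:2]
    for (wh, ww) in window_sizes:
        for y in range(0, H - wh + 1, stride):
            for x in range(0, W - ww + 1, stride):
                yield (x, y, ww, wh)

def detect(image, classify, window_sizes=((64, 64), (128, 128)), stride=32):
    """Run the classifier on every fragment and keep positive windows.
    This is the expensive part: the CNN runs once per window per scale."""
    boxes = []
    for (x, y, w, h) in sliding_windows(image, window_sizes, stride):
        if classify(image[y:y + h, x:x + w]):
            boxes.append((x, y, w, h))
    return boxes

image = np.zeros((128, 128))
n_windows = sum(1 for _ in sliding_windows(image, ((64, 64), (128, 128)), 32))
boxes = detect(image, lambda frag: frag.mean() > 0)  # dummy classifier
```

Even on this tiny 128x128 image there are already 10 windows at two scales; on a full-resolution photo with many scales and a small stride, the count explodes, which is exactly the problem R-CNN and its successors address.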
R-CNN (Region-based convolutional neural network)
Two steps:
● Select object proposals with the Selective Search algorithm
○ its precision is too low to serve as an object detector on its own, but it works fine as the first step of the detection pipeline
● Apply a strong CNN classifier to each selected proposal
Girshick et al., "Rich feature hierarchies for accurate object detection and semantic segmentation", CVPR 2014
It outperforms all previous object detection algorithms.
Limitations:
● depends on an external hypothesis-generation algorithm
● object proposals need to be rescaled to a fixed resolution
● redundant computation: all features are computed independently, even for overlapping proposal regions
Fast R-CNN
From R-CNN to Fast R-CNN:
● input: image + region proposals
● region pooling on the "conv5" feature map for feature extraction
● softmax classifier instead of an SVM classifier
● end-to-end multi-task training:
○ the last FC layer branches into two sibling output layers:
■ one produces softmax probability estimates over the K object classes
■ the other outputs the bounding-box coordinates for each object
Advantages:
● Higher detection quality (mAP) than R-CNN
● Training is single-stage
● Training can update all network layers at once
● No disk storage is required for feature caching
Girshick, “Fast R-CNN”, ICCV 2015
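The region (RoI) pooling step mentioned above can be sketched in a few lines. This toy single-channel version is only meant to show why every region, whatever its size, yields the same fixed-size output; the grid size and feature map below are illustrative:

```python
# RoI pooling sketch: max-pool an arbitrary-size region of the
# feature map into a fixed output_size grid.
import numpy as np

def roi_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool the RoI (x1, y1, x2, y2, in feature-map coordinates)
    into an output_size grid, so every region yields the same shape."""
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    oh, ow = output_size
    # Split the region into an oh x ow grid of (roughly equal) cells.
    h_edges = np.linspace(0, region.shape[0], oh + 1).astype(int)
    w_edges = np.linspace(0, region.shape[1], ow + 1).astype(int)
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            cell = region[h_edges[i]:h_edges[i + 1], w_edges[j]:w_edges[j + 1]]
            out[i, j] = cell.max()
    return out

fm = np.arange(36, dtype=float).reshape(6, 6)   # a toy 6x6 feature map
pooled = roi_pool(fm, (0, 0, 4, 6))             # a 4-wide, 6-tall region -> 2x2
```

Because every RoI comes out as the same fixed-size grid, the fully connected layers that follow can process proposals of any shape.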
Faster R-CNN
Faster R-CNN = Fast R-CNN + RPN (Region Proposal Network)
● RPN
○ removes the dependency on an external ROI-generation method
○ is a convolutional network trained end-to-end
○ generates a list of high-quality region proposals (bbox coordinates + objectness scores)
● The RPN and Fast R-CNN are then merged into a single network by sharing their convolutional features
○ predicts the class of each object + a refined bbox position
○ sharing convolutional features makes region proposals nearly cost-free
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", NIPS 2015
SSD (Single Shot Detector)
● Extra feature layers
○ additional convolutional feature layers of different sizes are placed at the end of the base network
○ each added feature layer produces its own set of detection predictions, allowing predictions at multiple scales
○ this design leads to simple end-to-end training
● ROI proposals
○ the output space of region proposals is a fixed set of default boxes over different aspect ratios and scales per feature-map location
○ for each default box, the network predicts
■ the shape offsets Δ(cx, cy, w, h) and
■ the confidences for all object categories (c1, …, cp)
● Non-maximum suppression
(Illustration: default boxes on 8x8 and 4x4 feature maps)
Wei Liu et al., SSD: Single Shot MultiBox Detector, ECCV 2016
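The non-maximum suppression step named above can be sketched as a greedy loop over score-sorted boxes (a simplified single-class version; the boxes, scores, and IoU threshold below are illustrative):

```python
# Greedy non-maximum suppression: keep the best-scoring box, drop
# overlapping ones, repeat on the remainder.
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop remaining boxes that overlap the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is dropped
```

In SSD this runs over the per-class confidences of the default boxes, so each object ends up with a single final detection instead of many overlapping responses.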
Compare modern convolutional object detectors: speed/accuracy trade-offs
Lots of variables to set up ...
● base net:
○ VGG16
○ ResNet101
○ InceptionV2
○ InceptionV3
○ MobileNet
● Object detection architecture:
○ R-CNN
○ Fast R-CNN
○ Faster R-CNN
○ SSD
● Input image resolution
● Number of region proposals
● Frozen weights (for fine-tuning)
Takeaways:
● Faster R-CNN is slower but more accurate
● SSD is much faster but not as accurate (making it a good choice for mobile apps)
Jonathan Huang et al., Speed/accuracy trade-offs for modern convolutional object detectors, CVPR 2017
Coding time
Problem to solve:
- a mobile app for real time clothes detection
- class categories: Top, Pants, Shorts, Skirt and Dress
Frameworks:
● TensorFlow Object Detection API
- made by Google
- an open-source framework built on top of TensorFlow that makes it easy to construct, train, and deploy object detection models
- input: images + labels
- output: inference graph (.pb format)
● LabelImg
- an open-source graphical image annotation tool
- annotations are saved as XML files in PASCAL VOC format, the format used by the ImageNet dataset
Coding time: step by step
● Create dataset and split it into: train (70%) and test (30%) folders
● Label images with LabelImg tool (output: .xml files for each image in dataset)
● Convert .xml to .csv (use dataset/xml_to_csv.py script; output: train.csv, test.csv)
● Convert to TFRecord format
○ set paths (from ../models/research):
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/object_detection
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
○ edit generate_tfrecord.py file and change the label map + path to the train/test folder:
○ finally execute the generate_tfrecord.py script in Terminal:
python generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record
python generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record
○ output: train.record, test.record
● Training
○ create a label map: label_map.pbtxt
○ optional, but recommended :), choose a pretrained model from here
○ prepare the .config file: .../models/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config
○ run training script (from ../models/research/object_detection):
python legacy/train.py --logtostderr --train_dir=training/ --pipeline_config_path=ssd_mobilenet_v1_pets.config
● Export inference graph:
python export_inference_graph.py --input_type image_tensor --pipeline_config_path pipeline.config
--trained_checkpoint_prefix=training/model.ckpt-10750 --output_directory=inference_graph
output: the model in .pb format
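For reference, a minimal label_map.pbtxt for the five clothing classes named earlier might look like the sketch below (IDs start at 1, since 0 is reserved for the background class; the ordering here is an assumption):

```
item {
  id: 1
  name: 'Top'
}
item {
  id: 2
  name: 'Pants'
}
item {
  id: 3
  name: 'Shorts'
}
item {
  id: 4
  name: 'Skirt'
}
item {
  id: 5
  name: 'Dress'
}
```

The same names must match the label map used in generate_tfrecord.py, or the records and the training config will disagree.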
e-mail: anca.ciurte@softvision.ro
Q&A
Integrating with Android
Speaker:
MIHALY NAGY - Android Community Influencer at Softvision
Android + TensorFlow
Android + TensorFlow
● Model File
● [Labels File]
● tensorflow-android dependency
● Boilerplate
● Integrate TF to process each frame
Android + TensorFlow
(Diagram: each camera Frame is converted to a Bitmap and passed to TensorFlow, which returns a Recognition result per detected object.)
Android + TensorFlow
Follow Along:
http://goo.gl/SYHSb7
https://github.com/code-twister/tf_example
Coding time
Thank You!
Editor's notes
Running on mobile makes it possible to deliver interactive and real time applications in a way that’s not possible when depending on the internet connection
Multiple scales and aspect ratios are handled by search windows of different sizes and aspect ratios, or by image scaling.
From R-CNN to Fast R-CNN:
region pooling on the "conv5" feature map for feature extraction
softmax classifier instead of an SVM classifier
Multi-task training:
the last fc layer branches into two sibling output layers:
one that produces softmax probability estimates over K object classes
another layer that outputs the bounding box coordinates for each object.
First, a CNN is applied to the whole original image, with several convolutional (conv) and max pooling layers, to produce a conv feature map.
Then, for each object proposal, a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map, which is fed into a sequence of fully connected (fc) layers.
The fc layers finally branch into two sibling output layers:
one that produces softmax probability estimates over K object classes
another layer that outputs the bounding box coordinates for each object.
A Region Proposal Network (RPN) takes an image
(of any size) as input and outputs a set of rectangular
object proposals, each with an objectness score.
SSD approach:
produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes
followed by a non-maximum suppression step to produce the final detections.
Network generates scores for each default box
Wei Liu et al., SSD: Single Shot MultiBox Detector, ECCV 2016
SSD discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location
Wei Liu et al., SSD: Single Shot MultiBox Detector, ECCV 2016
There are several algorithms for object detection.
The question is: how well do they compare with each other?
We define several meta-parameters that influence detector performance.
Critical points on the curve can be identified.
mAP = mean average precision
[Huang et al.] measured the influence of these meta-parameters on accuracy and speed.
Jonathan Huang et al., Speed/accuracy trade-offs for modern convolutional object detectors, CVPR 2017
Recognition refers to the objects detected, not the process.