Andrii Belas "Overview of object detection approaches: cases, algorithms and software"
2. Andrii Belas, Data Scientist, SMART business
Machine learning expert, public speaker.
Founder and mentor of the SMART Data Science Academy; responsible
for the technical growth of the data science team and for the
architecture of all SMART business data science projects.
Microsoft Certified Professional in:
Big Data and Advanced Analytics
Cloud Data Science with Azure Machine Learning
Developing SQL Data Models.
Experience:
Deep Learning
Computer Vision
AI in Forecasting
AI in Marketing
Risk management
Business Intelligence
11. Business process
1. Tagging the current assortment of Roshen/competitor SKUs (500 SKUs)
2. Tagging each new Roshen/competitor SKU
Training the recognition neural network (4-5 hours)
Delivering the model to the merchandisers' devices
• Planogram compliance control
• Missing-display (out-of-shelf) control
• Audit of competitor prices, promos and planograms
Real-time reports for management
Estimating and forecasting the impact of ROSHEN and competitor
planograms on sales
12. Example metrics for management
Share-of-shelf estimation
Appearance of a new product, its price tags and promos
across all points of sale
Share-of-shelf compliance and own display stands
at points of sale
Compliance control for planograms,
price tags and promos
Correlation and forecast of the impact of planograms,
promos and competitors on per-store sales
(advanced predictive analytics)
Ranking of points of sale by these metrics
Ranking of merchandising teams by these metrics
17. Where to begin
• Data
• Detection algorithm
• Evaluation approach
• Deployment
18. Tips
Train on data similar to what you’ll see in production
Label your data well (don’t miss anything)
Avoid detecting very tiny objects in the image
https://github.com/Microsoft/VoTT
20. Evaluation
• Compute average precision (AP) separately for each class, then average over classes.
• A detection is a true positive if its IoU (Intersection over Union) with a ground-truth box is
greater than some threshold (usually 0.5); this metric is written AP@0.5.
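The IoU test above can be sketched in a few lines; this is a minimal version for axis-aligned boxes given as `(x1, y1, x2, y2)` corner coordinates (the corner convention is an assumption, not something the slide specifies):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping region (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection shifted by a quarter of its width against the ground truth:
print(iou((0, 0, 100, 100), (25, 0, 125, 100)))  # 0.6 — a true positive at the 0.5 threshold
```

At IoU 0.6 this detection counts as a true positive under AP@0.5; shift the box by half its width and it would fall below the threshold.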
24. The first efficient face detector (Viola-Jones algorithm, 2001)
• Their demo showed faces being detected in real time on a webcam feed; it was the most
stunning demonstration of computer vision and its potential at the time.
• Soon it was implemented in OpenCV, and face detection became synonymous with the
Viola-Jones algorithm.
• Hand-coded features (eyes, nose, their locations and interactions)
• Poor results for non-frontal, non-ideal faces
25. A much more efficient detection technique (Histograms of Oriented
Gradients, 2005)
• Navneet Dalal and Bill Triggs invented "HOG" for pedestrian detection
• Their feature descriptor, Histograms of Oriented Gradients (HOG), significantly
outperformed existing algorithms on this task
• Hand-coded features, just like before
• For every single pixel, we look at the pixels that directly surround it
26. A much more efficient detection technique (Histograms of Oriented
Gradients, 2005)
• The goal: how dark is the current pixel compared to the pixels surrounding it?
• We then draw an arrow showing the direction in which the image is getting darker
• We repeat that process for every single pixel in the image
• Every pixel is replaced by an arrow; these arrows are called gradients
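The per-pixel "arrows" above can be sketched with NumPy; this is a simplified illustration using central differences (the actual HOG descriptor then bins these orientations into cell histograms, which is omitted here):

```python
import numpy as np

def pixel_gradients(image):
    """Per-pixel gradient magnitude and direction: the 'arrows' of HOG.

    `image` is a 2-D grayscale array; central differences approximate
    how brightness changes horizontally (gx) and vertically (gy).
    """
    img = image.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # change along x
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # change along y
    magnitude = np.hypot(gx, gy)             # length of the arrow
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned direction
    return magnitude, orientation

# A tiny image that brightens from left to right: every interior
# arrow should point along the x axis (orientation 0 degrees).
img = np.tile(np.arange(8, dtype=float), (8, 1))
mag, ori = pixel_gradients(img)
print(mag[4, 4], ori[4, 4])  # 2.0 0.0
```

Orientations are folded into [0, 180) because HOG treats a dark-to-light edge and a light-to-dark edge as the same direction.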
29. Brute-force approach
• We can take a classifier like VGGNet or Inception and turn it into an object detector by sliding a
small window across the image
• At each step you run the classifier to get a prediction of what sort of object is inside the current
window.
• Using a sliding window gives several hundred or thousand predictions for that image, but you only
keep the ones the classifier is the most certain about.
• This approach works but it’s obviously going to be very slow, since you need to run the classifier
many times.
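The sliding-window loop described above can be sketched as follows; the window size, stride, and the `classify(crop)` call it would feed are hypothetical placeholders (any classifier, e.g. VGGNet or Inception, would slot in there):

```python
import numpy as np

def sliding_windows(image, window=64, stride=32):
    """Yield (x, y, crop) for every window position at a single scale.

    In the brute-force detector, each crop would be passed to a
    classifier; only the most confident predictions are kept.
    """
    h, w = image.shape[:2]
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            yield x, y, image[y:y + window, x:x + window]

# Count how many classifier calls a single 640x480 image needs,
# at just ONE scale — multi-scale search multiplies this further.
img = np.zeros((480, 640))
n_calls = sum(1 for _ in sliding_windows(img))
print(n_calls)  # 266
```

Even with these generous settings the classifier runs hundreds of times per image, which is exactly why this approach is slow.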
30. A better approach: R-CNN (2015)
• R-CNN creates bounding boxes, or region proposals, using a process called Selective Search
• At a high level, Selective Search looks at the image through windows of different sizes, and for each
size tries to group together adjacent pixels by texture, color, or intensity to identify objects.
34. YOLO (2016)
• YOLO takes a completely different approach.
• It’s not a traditional classifier that is repurposed to be an object detector.
• YOLO actually looks at the image just once (hence its name: You Only Look Once) but in a clever way.
• YOLO divides up the image into a grid of 13 by 13 cells
35. YOLO (2016)
• Each of these cells is responsible for predicting 5 bounding boxes.
• A bounding box describes the rectangle that encloses an object.
• YOLO also outputs a confidence score that tells us how certain it is that the predicted bounding box
actually encloses some object.
• This score doesn’t say anything about what kind of object is in the box, just if the shape of the box is
any good.
36. YOLO (2016)
• For each bounding box, the cell also predicts a class.
• The confidence score for the bounding box and the class prediction are combined into one final score
that tells us the probability that this bounding box contains a specific type of object.
• For example, the big fat yellow box on the left is 85% sure it contains the object “dog”:
37. YOLO (2016)
• Since there are 13×13 = 169 grid cells and each cell predicts 5 bounding boxes, we end up with 845
bounding boxes in total.
• It turns out that most of these boxes will have very low confidence scores, so we only keep the boxes
whose final score is 30% or more (you can change this threshold depending on how accurate you want
the detector to be).
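The scoring and filtering described on these slides can be sketched with random stand-in numbers (the network outputs are fabricated here; only the shapes and the arithmetic follow the slides, assuming the 20 PASCAL VOC classes YOLO was trained on):

```python
import numpy as np

rng = np.random.default_rng(0)
n_boxes = 13 * 13 * 5                    # 169 cells x 5 boxes = 845 boxes
confidence = rng.random(n_boxes)         # "is there an object in this box?"
class_probs = rng.random((n_boxes, 20))  # "which object?" (20 classes)

# Final score = box confidence x class probability, per box and class.
final_scores = confidence[:, None] * class_probs
best_class = final_scores.argmax(axis=1)
best_score = final_scores.max(axis=1)

# Keep only boxes whose final score is 30% or more.
keep = best_score >= 0.30
print(n_boxes, int(keep.sum()))          # most of the 845 boxes are dropped
```

In a real detector, the surviving boxes would additionally go through non-maximum suppression to remove duplicates of the same object, a step the slides skip.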
39. YOLO (2016)
• You Only Look Once
• So we end up with 125 channels for every grid cell: 5 boxes × 25 values, where each box carries:
• x, y, width, height for the bounding box’s rectangle
• the confidence score
• the probability distribution over the classes
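The arithmetic behind the 125 channels, and how one cell's output splits into its 5 boxes, can be sketched as follows (the 20-class count assumes the PASCAL VOC classes; the `cell` vector is a stand-in for real network output):

```python
import numpy as np

# 5 boxes per cell, each with 4 coordinates (x, y, width, height),
# 1 confidence score, and a 20-class probability distribution.
boxes_per_cell = 5
n_classes = 20
channels = boxes_per_cell * (4 + 1 + n_classes)
print(channels)  # 125

# Decoding one grid cell's raw 125-value output into its 5 boxes:
cell = np.arange(channels, dtype=float)   # stand-in for network output
per_box = cell.reshape(boxes_per_cell, 4 + 1 + n_classes)
x, y, w, h = per_box[0, :4]               # bounding-box rectangle
conf = per_box[0, 4]                      # confidence score
class_dist = per_box[0, 5:]               # class probabilities
print(per_box.shape, class_dist.shape)    # (5, 25) (20,)
```

The full network output is therefore a 13×13×125 tensor: one such 125-value vector per grid cell.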