Image search at Facebook: making sense of one of the largest image databases in the world
1. Image Search at Facebook: Making sense of one of the largest image databases in the world
Fedor Borisyuk, engineering leader at Facebook
2. A bit about me
• Fedor Borisyuk
• At Facebook since April 2017
• Lead ML teams in the domains
• Computer vision
3. Agenda
1. Photo Search product
2. Photo Search at FB
3. Deep dive: Large scale image classification
4. Deep dive: Optical character recognition
5. Q & A
5. Photo Search at Facebook
• Social photos: posted by friends
• Public photos: posted by people to be publicly visible
• Over a billion images uploaded every day
6. What people are searching for
• Friends' photos
• Celebrities
• Products
• Memes
• Recipes
• Music/Movies
• Places
• Sports events
• News
Image credits:
https://unsplash.com/photos/eIvu9C94UfY
https://unsplash.com/photos/c9H7UzXK7uk
https://unsplash.com/photos/yihlaRCCvd4
https://unsplash.com/photos/UWw9OD3pIMo
https://unsplash.com/photos/4V07cUP8Sxc
https://unsplash.com/photos/FBXuXp57eM0
https://unsplash.com/photos/PGnqT0rXWLs
https://unsplash.com/photos/EzH46XCDQRY
https://www.nps.gov/locations/alaska/news.htm
7. What people are searching for
• Query: running dog meme (image: https://unsplash.com/photos/yihlaRCCvd4)
• Query: child pink skirt (image: https://unsplash.com/photos/DIZBFTl7c-A)
• Query: strelitzia (image: https://en.wikipedia.org/wiki/Strelitzia#/media/File:Strelitzia_larger.jpg)
11. Overview of ML Technologies
• CNNs for large scale image classification
• Ranking
• Neural networks
• GBDTs
• Features based on:
• Image clustering
• Image tagging
• Image quality
• Multimodal relationship between Query and Image
• Optical character recognition
12. Modeling similarity between query and image
• Multilingual query embeddings trained using fastText (https://github.com/facebookresearch/fastText)
• Image embeddings produced by a ResNeXt model
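A minimal sketch of how a query embedding and image embeddings might be compared once both live in a shared vector space. The slide does not say which similarity function is used, so cosine similarity, the toy 3-dimensional vectors, and the helper names `cosine_similarity` and `rank_images` are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for zero vectors (assumption)
    return dot / (norm_a * norm_b)


def rank_images(query_vec, image_vecs):
    """Return image ids sorted by decreasing similarity to the query."""
    scored = [(cosine_similarity(query_vec, vec), image_id)
              for image_id, vec in image_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored]


# Toy vectors standing in for real embeddings (hypothetical values).
query = [1.0, 0.0, 0.5]
images = {
    "dog_photo": [0.9, 0.1, 0.4],
    "car_photo": [0.0, 1.0, 0.0],
    "cat_photo": [0.5, 0.5, 0.5],
}
ranking = rank_images(query, images)
```

In a ranking system such a score would typically be one feature among many rather than the final ordering.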
13. Extending Photos with textual description
Publication: Multi-model similarity propagation and its application for web image retrieval, Xin-Jing Wang et al.
Photo credits:
https://unsplash.com/photos/3WhQe8sEBZU
https://unsplash.com/photos/ie8giTVBVxE
https://unsplash.com/photos/9FWfFy4N4R8
https://unsplash.com/photos/a90WklNaPBM
https://unsplash.com/photos/9EwxGJdTJNo
17. Large scale image classification
• Label collisions
  • Use WordNet to merge some hashtags into a single canonical form (e.g., #brownbear and #ursusarctos are merged)
• Skewed label distribution
  • Square root sampling
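The two fixes on this slide can be sketched together. This is an illustration under stated assumptions, not the production pipeline: the `CANONICAL` dictionary is a hypothetical stand-in for a WordNet lookup, the tag counts are made up, and "square root sampling" is read here as drawing labels with probability proportional to the square root of their frequency.

```python
import math
import random
from collections import Counter

# Hypothetical synonym table standing in for a WordNet lookup.
CANONICAL = {"#ursusarctos": "#brownbear"}


def canonicalize(tag):
    """Map a hashtag to its canonical form (identity if unknown)."""
    return CANONICAL.get(tag, tag)


def sqrt_sampling_probs(label_counts):
    """Probability of drawing each label, proportional to sqrt(count)."""
    weights = {label: math.sqrt(n) for label, n in label_counts.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Made-up tag counts: one very frequent concept and one rare one.
raw_tags = ["#brownbear"] * 60 + ["#ursusarctos"] * 40 + ["#strelitzia"] * 4
counts = Counter(canonicalize(t) for t in raw_tags)
probs = sqrt_sampling_probs(counts)

# Draw a small batch of labels (seeded so the run is repeatable).
rng = random.Random(0)
labels = list(probs)
batch = rng.choices(labels, weights=[probs[l] for l in labels], k=8)
```

With these counts the rare label goes from 4/104 of the raw data to 2/12 of the sampled labels, which is the flattening effect the square root is meant to give.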
23. Text Detection Model
• Faster R-CNN performs detection and object recognition by:
  • Learning a CNN image representation
  • Learning a region proposal network to produce bounding boxes
  • Learning a classifier to recognize whether a box contains text
  • Removing duplicate overlapping boxes
  • Learning a regression to refine box coordinates
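The "remove duplicate overlapping boxes" step is commonly done with greedy non-maximum suppression (NMS). The sketch below shows that technique in plain Python. The `(x1, y1, x2, y2)` box format, the 0.5 IoU threshold, and the example boxes are assumptions, not values taken from the talk.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by decreasing score and keep a box only
    if it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Two near-duplicate detections of one text line plus a separate one.
boxes = [(10, 10, 110, 40), (12, 12, 112, 42), (200, 50, 260, 80)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

Here the second box overlaps the first with an IoU of about 0.84, so only indices 0 and 2 survive.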
24. Text Recognition Model
• CNN ResNet-18 architecture
• Cast as sequence prediction problem:
  • Input: the image containing the text
  • Output: sequence of characters
• Use Connectionist Temporal Classification (CTC) loss to train
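To make the CTC loss concrete, here is a small sketch of the standard CTC forward recursion, which sums the probability of every frame-level alignment that collapses to the target character sequence. It works in plain probability space for readability (real implementations use log space for numerical stability). The blank index of 0 and the toy inputs are assumptions.

```python
import math

BLANK = 0  # index of the CTC blank symbol (assumption for this sketch)


def ctc_loss(probs, target):
    """Negative log-likelihood of `target` under per-step distributions.

    probs:  list of T rows, each a list of K class probabilities.
    target: list of class indices without blanks.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank].
    ext = [BLANK]
    for label in target:
        ext += [label, BLANK]
    size = len(ext)

    # alpha[s]: probability of all alignments of the steps seen so far
    # that end at position s of the extended sequence.
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping a blank is allowed only between different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return -math.log(likelihood) if likelihood > 0 else math.inf


# Two time steps, two classes (blank and "a"), uniform predictions.
uniform = [[0.5, 0.5], [0.5, 0.5]]
loss = ctc_loss(uniform, [1])
```

For this toy input three alignments ("aa", "a" then blank, blank then "a") collapse to the target, each with probability 0.25, so the likelihood is 0.75.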
25. Text Recognition Model
• Recognition model inference:
  • runs in linear time by greedily taking the most likely character at every position
  • recognizes words of arbitrary length and out-of-vocabulary words
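Greedy decoding for a CTC-trained model can be sketched as follows: take the most likely class at each position, merge consecutive repeats, and drop blanks. The blank index of 0, the four-symbol alphabet, and the probability rows are assumptions chosen for illustration.

```python
BLANK = 0  # index of the CTC blank symbol (assumption for this sketch)


def greedy_ctc_decode(probs, alphabet):
    """Pick the most likely class per step, merge repeats, drop blanks."""
    decoded = []
    previous = BLANK
    for row in probs:  # one pass over the T positions
        best = max(range(len(row)), key=row.__getitem__)
        if best != BLANK and best != previous:
            decoded.append(alphabet[best])
        previous = best
    return "".join(decoded)


alphabet = ["", "c", "a", "t"]  # index 0 is the blank
probs = [
    [0.10, 0.80, 0.05, 0.05],  # c
    [0.10, 0.70, 0.10, 0.10],  # c (repeat, merged)
    [0.20, 0.10, 0.60, 0.10],  # a
    [0.90, 0.03, 0.03, 0.04],  # blank
    [0.10, 0.10, 0.10, 0.70],  # t
    [0.10, 0.10, 0.10, 0.70],  # t (repeat, merged)
]
text = greedy_ctc_decode(probs, alphabet)
```

Because the output is built character by character rather than looked up in a dictionary, nothing restricts it to a fixed vocabulary or word length.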
26. Curriculum learning training
• The CTC model was harder to train because it consistently diverged
• Curriculum learning, starting easy:
  • short words (<= 5 characters)
  • low learning rate so the model doesn’t diverge
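One way such a curriculum could be scheduled is sketched below. The slide only states the starting point (words of at most 5 characters and a low learning rate). The later stages, the step size, the learning-rate values, and the names `curriculum_stages` and `select_words` are all hypothetical.

```python
def curriculum_stages(start_len=5, max_len=20, step=5,
                      start_lr=1e-4, final_lr=1e-3):
    """Yield (max_word_length, learning_rate) pairs, easy to hard.

    All default values are illustrative assumptions.
    """
    lengths = list(range(start_len, max_len + 1, step))
    for i, length in enumerate(lengths):
        frac = i / (len(lengths) - 1) if len(lengths) > 1 else 1.0
        yield length, start_lr + frac * (final_lr - start_lr)


def select_words(words, max_length):
    """Keep only training words short enough for the current stage."""
    return [w for w in words if len(w) <= max_length]


words = ["cat", "photo", "facebook", "strelitzia", "classification"]
stages = list(curriculum_stages())
first_stage_words = select_words(words, stages[0][0])
```

In this sketch the first stage trains only on "cat" and "photo", and longer words enter as the length limit and learning rate rise together.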