In this code-level talk, Julien will show you how to quickly build and deploy computer vision applications based on Transformer models. Along the way, you'll learn about the portfolio of open source and commercial Hugging Face solutions, and how they can help you deliver high-quality solutions faster than ever before.
An introduction to computer vision with Hugging Face
1. An Introduction to Computer Vision with Hugging Face
Julien Simon, Chief Evangelist, Hugging Face
julsimon@huggingface.co
2. Computer Vision put Deep Learning on the map
Image classification
Object detection
Semantic segmentation
Instance segmentation
Pose estimation
Depth prediction
Source: GluonCV
3. 1998-2021 : Convolutional Neural Networks
Source: Wikipedia
CNNs extract features with learned filters.
A lot of pixels are discarded along the way.
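As a rough illustration of the two statements above, the sketch below applies a single filter to a tiny grayscale image and then max-pools the result. The filter values are hand-picked for the example (an assumption; a real CNN learns them during training), and the helper names `conv2d` and `max_pool2d` are hypothetical. It uses plain Python lists so it runs without any library.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation CNN layers apply."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keeps one value per size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# 5x5 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked vertical-edge filter (a trained CNN would learn its own values).
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d(image, kernel)   # 4x4 map, responds only at the edge
pooled = max_pool2d(feature_map)      # 2x2 map, most positions are dropped

print(feature_map)
print(pooled)
```

The filter responds only where the image goes from dark to bright, and pooling then reduces the 25 input values to 4, which is the sense in which pixels are discarded along the way.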
4. 2021 : The Vision Transformer (Google)
"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" https://arxiv.org/abs/2010.11929
ViT breaks an image into patches, which are flattened and processed as token sequences.
+ State-of-the-art accuracy
+ 4x less compute required for training
+ Transfer learning
Source: research paper
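The patch step described on this slide can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the helper name `image_to_patches` is hypothetical, images are plain nested lists, and the later steps of a real ViT (linear projection of each patch, position embeddings, class token) are left out.

```python
def image_to_patches(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened patches.

    Patches are returned in row-major order; each one is a flat list of
    patch_size * patch_size * C values, i.e. one "token" before projection.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                value
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for value in pixel
            ]
            patches.append(patch)
    return patches


# Toy example: a 4x4 single-channel image cut into 2x2 patches.
toy = [[[r * 4 + c] for c in range(4)] for r in range(4)]
toy_patches = image_to_patches(toy, 2)
print(toy_patches)

# Same arithmetic for a 224x224 RGB image with 16x16 patches.
num_patches = (224 // 16) ** 2   # sequence length, before any class token
patch_dim = 16 * 16 * 3          # values per flattened patch
print(num_patches, patch_dim)
```

With 16x16 patches, a 224x224 RGB image becomes a sequence of 196 tokens of 768 values each, which is what lets a standard Transformer encoder process it like a sentence.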
6. The Hugging Face Hub: The GitHub of Machine Learning
110K models
18K datasets
25+ ML libraries: Keras, spaCy, scikit-learn, fastai, etc.
10K organizations
100K+ users daily
1M+ downloads daily
https://huggingface.co
7. 4,000+ models for Computer Vision
1. PyTorch Image Models (timm)
2. CV Transformers
3. Multi-modal Transformers
4. Generative CV: Diffusers
8. 1. PyTorch Image Models (aka timm)
https://github.com/rwightman/pytorch-image-models
• Models, scripts, pretrained weights
ResNet, ResNeXt, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more
• Now available on the Hugging Face hub
300+ models
https://huggingface.co/timm
https://huggingface.co/docs/hub/timm
9. 2. CV Transformers: image and video classification
openai/clip-vit-base-patch32
google/vit-base-patch16-224
https://huggingface.co/spaces/juliensimon/battle_of_image_classifiers
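For readers who want to try one of the models named on this slide, here is a minimal sketch using the `pipeline` API from the transformers library. It assumes `transformers`, a backend such as PyTorch, and Pillow are installed, and that the model can be downloaded from the hub. The helper names `top_labels` and `classify` are hypothetical, introduced only for this example.

```python
def top_labels(predictions, k=3):
    """Return the k highest-scoring (label, score) pairs.

    `predictions` is expected to look like image-classification pipeline
    output: a list of dicts with "label" and "score" keys.
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["label"], round(p["score"], 3)) for p in ranked[:k]]


def classify(image_path, model_id="google/vit-base-patch16-224"):
    """Classify one image (local path or URL) with a model from the hub."""
    # Imported here so the helper above stays usable without transformers.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=model_id)
    return top_labels(classifier(image_path))


if __name__ == "__main__":
    import sys

    # Usage: python classify.py path/to/image.jpg
    if len(sys.argv) > 1:
        for label, score in classify(sys.argv[1]):
            print(f"{score:.3f}  {label}")
```

Note that `openai/clip-vit-base-patch32` is a different kind of model: it scores an image against candidate text labels, so it would go through the `zero-shot-image-classification` pipeline task with a `candidate_labels` argument rather than the call shown here.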
10. 2. CV Transformers: detection and segmentation
facebook/maskformer-swin-large-ade
facebook/detr-resnet-101
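A detection model such as the one named above can be called in much the same way. The sketch below is an assumption-laden example rather than the speaker's demo code: it uses the `object-detection` pipeline task from transformers, the helper names `count_objects` and `detect` are hypothetical, and the 0.9 score cut-off is an arbitrary choice. DETR checkpoints may also need the `timm` package installed for their ResNet backbone.

```python
from collections import Counter


def count_objects(detections, min_score=0.9):
    """Count detected objects per label, ignoring low-confidence boxes.

    `detections` is expected to look like object-detection pipeline output:
    a list of dicts with "label", "score" and "box" keys.
    """
    return dict(Counter(d["label"] for d in detections if d["score"] >= min_score))


def detect(image_path, model_id="facebook/detr-resnet-101"):
    """Run object detection on one image (local path or URL)."""
    # Imported here so count_objects stays usable without transformers.
    from transformers import pipeline

    detector = pipeline("object-detection", model=model_id)
    return detector(image_path)


if __name__ == "__main__":
    import sys

    # Usage: python detect.py path/to/image.jpg
    if len(sys.argv) > 1:
        print(count_objects(detect(sys.argv[1])))
```

Segmentation models such as `facebook/maskformer-swin-large-ade` follow the same pattern with the `image-segmentation` task, which returns a mask per segment instead of a bounding box.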
12. 3. Multi-modal CV Transformers
Image captioning
https://huggingface.co/spaces/nielsr/comparing-captioning-models
Zero-shot segmentation with text prompt
https://huggingface.co/spaces/nielsr/CLIPSeg
Audio classification with spectrogram
https://huggingface.co/spaces/juliensimon/keyword-spotting
15. Training and deploying models with Hugging Face
Model in production
18,000+ datasets on the hub
110,000+ models on the hub
No-code AutoML
Managed Inference on AWS and Azure
Hosted ML applications
HW-accelerated training & inference
Amazon SageMaker
Deploy anywhere
Datasets
Models
Hugging Face Endpoints for Azure
Transformers
Accelerate
Optimum
Diffusers
Evaluate