An introduction to computer vision with Hugging Face


In this code-level talk, Julien will show you how to quickly build and deploy computer vision applications based on Transformer models. Along the way, you'll learn about the portfolio of open source and commercial Hugging Face solutions, and how they can help you deliver high-quality solutions faster than ever before.


An Introduction to Computer Vision
with Hugging Face
Julien Simon, Chief Evangelist, Hugging Face
julsimon@huggingface.co

Computer Vision put Deep Learning on the map
Image classification
Object detection
Semantic segmentation
Instance segmentation
Pose estimation
Depth prediction
Source: GluonCV

1998-2021 : Convolutional Neural Networks
Source: Wikipedia
CNNs extract features with learned filters.
A lot of pixels are discarded along the way.
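
As a rough illustration of the two statements above, here is a minimal NumPy sketch of one filter followed by 2x2 max pooling. The filter values are fixed by hand for readability (a real CNN would learn them), and reading "discarded" as the effect of pooling is an assumption, not something the slide states.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a filter over a 2D image (cross-correlation, as deep learning libraries do)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(feature_map):
    """Keep only the largest value in each 2x2 block; the other three are dropped."""
    h = feature_map.shape[0] // 2 * 2
    w = feature_map.shape[1] // 2 * 2
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Toy 8x8 grayscale image with random values.
image = np.random.default_rng(0).random((8, 8)).astype(np.float32)

# Hand-written vertical edge filter, standing in for a learned one.
edge_filter = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]], dtype=np.float32)

features = conv2d_valid(image, edge_filter)
pooled = max_pool2x2(features)
print(features.shape, pooled.shape)  # (6, 6) (3, 3)
```

In this toy case the pooling step keeps 9 of the 36 feature values, which gives a sense of how quickly spatial detail shrinks as layers are stacked.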

2021 : The Vision Transformer (Google)
"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" https://arxiv.org/abs/2010.11929
ViT breaks an image into patches,
which are flattened and processed
as token sequences.
+ State-of-the-art accuracy
+ 4x less compute required for training
+ Transfer learning
Source: research paper
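
The patch step described above can be sketched in a few lines of NumPy. This is an illustration only, not the paper's code: the function name `image_to_patches` is hypothetical, and the learned linear projection and position embeddings that ViT applies to each flattened patch are left out.

```python
import numpy as np

def image_to_patches(image, patch_size=16):
    """Split an (H, W, C) image into flattened, non-overlapping square patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")
    grid_h, grid_w = h // patch_size, w // patch_size
    # (grid_h, patch, grid_w, patch, C) -> (grid_h, grid_w, patch, patch, C)
    patches = image.reshape(grid_h, patch_size, grid_w, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One row per patch, read left to right, top to bottom.
    return patches.reshape(grid_h * grid_w, patch_size * patch_size * c)

# A 224x224 RGB image with 16x16 patches yields a sequence of 196 "tokens".
image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
tokens = image_to_patches(image)
print(tokens.shape)  # (196, 768)
```

Each row has 16 x 16 x 3 = 768 values, so the image becomes a sequence that a Transformer can process much like a sentence of 196 words.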

Research on CV Transformers: 11x in 2 years

The Hugging Face Hub: The GitHub of Machine Learning
110K models
18K datasets
25+ ML libraries: Keras, spaCy,
scikit-learn, fastai, etc.
10K organizations
100K+ users daily
1M+ downloads daily
https://huggingface.co

4,000+ models for Computer Vision
1. PyTorch Image models (timm)
2. CV Transformers
3. Multi-modal Transformers
4. Generative CV: Diffusers

1. PyTorch Image Models (aka timm)
https://github.com/rwightman/pytorch-image-models
• Models, scripts, pretrained weights
ResNet, ResNeXT, EfficientNet,
EfficientNetV2, NFNet, Vision
Transformer, MixNet, MobileNet-V3/V2,
RegNet, DPN, CSPNet, and more
• Now available on the Hugging Face hub
300+ models
https://huggingface.co/timm
https://huggingface.co/docs/hub/timm

2. CV Transformers: image and video classification
openai/clip-vit-base-patch32
google/vit-base-patch16-224
https://huggingface.co/spaces/juliensimon/battle_of_image_classifiers

2. CV Transformers: detection and segmentation
facebook/maskformer-swin-large-ade
facebook/detr-resnet-101

State-of-the-art prediction with 2 lines of Python
[{'score': 0.9985879063606262, 'label': 'motorcycle',
'box': {'xmin': 240, 'ymin': 185, 'xmax': 890, 'ymax': 593}},
{'score': 0.9886626601219177, 'label': 'backpack',
'box': {'xmin': 453, 'ymin': 87, 'xmax': 570, 'ymax': 220}},
{'score': 0.9997599720954895, 'label': 'person',
'box': {'xmin': 456, 'ymin': 28, 'xmax': 684, 'ymax': 551}}]
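
The two lines of Python themselves did not survive in this transcript. The sketch below shows one plausible way to get predictions in this format with the Transformers `pipeline` API. The model ID is borrowed from the previous slide and is an assumption about what the talk used. The helper names `detect_objects` and `keep_confident` are hypothetical, as is the 0.99 threshold.

```python
def detect_objects(image_path, model_id="facebook/detr-resnet-101"):
    """Run object detection on an image file or URL.

    Assumes transformers, torch, timm and Pillow are installed, and
    downloads the model weights on first use.
    """
    # Imported lazily so the filtering helper below works without transformers.
    from transformers import pipeline
    detector = pipeline("object-detection", model=model_id)
    return detector(image_path)

def keep_confident(detections, threshold=0.99):
    """Keep only the detections whose score reaches the threshold."""
    return [d for d in detections if d["score"] >= threshold]

# The predictions shown on the slide, reused here as sample data.
sample = [
    {"score": 0.9985879063606262, "label": "motorcycle",
     "box": {"xmin": 240, "ymin": 185, "xmax": 890, "ymax": 593}},
    {"score": 0.9886626601219177, "label": "backpack",
     "box": {"xmin": 453, "ymin": 87, "xmax": 570, "ymax": 220}},
    {"score": 0.9997599720954895, "label": "person",
     "box": {"xmin": 456, "ymin": 28, "xmax": 684, "ymax": 551}},
]

labels = [d["label"] for d in keep_confident(sample)]
print(labels)  # ['motorcycle', 'person']
```

Calling `detect_objects("path/to/image.jpg")` on a real image would return a list shaped like `sample`, which can then be filtered or drawn onto the picture using the `box` coordinates.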

3. Multi-modal CV Transformers
Image captioning
https://huggingface.co/spaces/nielsr/comparing-captioning-models
Zero-shot segmentation with text prompt
https://huggingface.co/spaces/nielsr/CLIPSeg
Audio classification with spectrogram
https://huggingface.co/spaces/juliensimon/keyword-spotting

4. Generative models: text-to-image
https://github.com/huggingface/diffusers/
https://huggingface.co/spaces/stabilityai/stable-diffusion

4. Generative models: image inpainting
https://huggingface.co/spaces/multimodalart/stable-diffusion-inpainting

Training and deploying models with Hugging Face
Model in production
18,000+ datasets on the hub
110,000+ models on the hub
No-code AutoML
Managed Inference on AWS and Azure
Hosted ML applications
HW-accelerated training & inference
Amazon SageMaker
Deploy anywhere
Datasets
Models
Hugging Face Endpoints for Azure
Transformers
Accelerate
Optimum
Diffusers
Evaluate

Getting started
https://huggingface.co/tasks
https://huggingface.co/course
https://huggingface.co/docs/{datasets, transformers, diffusers}
https://github.com/huggingface/{datasets, transformers, diffusers}
https://discuss.huggingface.co/
https://huggingface.co/support

Stay in touch!
@julsimon
julsimon.medium.com
youtube.com/c/juliensimonfr
