SlideShare a Scribd company logo
1 of 28
Download to read offline
Image to Prompts
Stable Diffusion - Image to Prompts
Featured Code Competition, Kaggle
Pin-Yen Liu, Po-Chun Huang, Po-Chuan Chen
June 2, 2023
1 / 28
Image to Prompts
Table of contents
1 Motivation
2 Dataset
3 Method
4 Analysis
5 Discussion
6 Reference
2 / 28
Image to Prompts
Motivation
Table of contents
1 Motivation
2 Dataset
3 Method
4 Analysis
5 Discussion
6 Reference
3 / 28
Image to Prompts
Motivation
Motivation
This project is a Kaggle competition called Image to Prompts [1].
In this competition, we need to reverse the typical direction of a
generative text-to-image model: instead of generating an image from
a text prompt.
We want to create a model which can predict the text prompt given a
generated image. And making predictions on a dataset containing a
wide variety of (prompt, image) pairs generated by Stable Diffusion
2.0, to understand how reversible the latent relationship is.
4 / 28
Image to Prompts
Motivation
Here has an example, if we have an image like this below
We may need to generate a prompt ultrasaurus holding a black bean
taco and a piece of cheese in the woods for this image.
5 / 28
Image to Prompts
Dataset
Table of contents
1 Motivation
2 Dataset
3 Method
4 Analysis
5 Discussion
6 Reference
6 / 28
Image to Prompts
Dataset
Dataset
Because our method is to use pre-trained model. So there has no
dataset for our method.
But, there have some datasets that can be used in this competition.
The dataset is build in (image, prompt) pairs. For example:
DiffusuinDB [8] : 14 million image-text pairs
COCO (Microsoft Common Objects in Context) [6]
: 2.5 million labeled instances in 328k images
Laion2B-en [10] : 2.3 billion image-text pairs
7 / 28
Image to Prompts
Method
Table of contents
1 Motivation
2 Dataset
3 Method
ViT
CLIP interrogator
OFA
4 Analysis
5 Discussion
8 / 28
Image to Prompts
Method
Method
Our method is to ensemble the CLIP Interrogator, OFA model,and
ViT model.
Here’s the ratio for three different model
1 Vision Transformer (ViT) model: 74.88%
2 CLIP Interrogator: 21.12%
3 OFA model fine-tuned for image captioning: 4%
9 / 28
Image to Prompts
Method
ViT
ViT
The Vision Transformer (ViT) [3] is a transformer encoder model
(BERT-like), and pretrained on ImageNet-21k (14 million images,
21,843 classes) at resolution 224x224, and fine-tuned on ImageNet
2012 (1 million images, 1,000 classes) at resolution 224x224.
10 / 28
Image to Prompts
Method
CLIP interrogator
CLIP interrogator
The CLIP Interrogator is a prompt engineering tool that combines
OpenAI’s CLIP [7] and Salesforce’s BLIP [5] to optimize text
prompts to match a given image.
Figure: CLIP interrogator : A tool on Hugging Face Space
11 / 28
Image to Prompts
Method
CLIP interrogator
CLIP interrogator pipeline
Figure: CLIP Interrogator pipeline
12 / 28
Image to Prompts
Method
CLIP interrogator
BLIP
Figure: Framework of BLIP
13 / 28
Image to Prompts
Method
OFA
OFA
OFA[9] is a unified multimodal pretrained model that unifies
modalities (i.e., cross-modality, vision, language) and tasks (e.g.,
image generation, visual grounding, image captioning, image
classification, text generation, etc.) to a simple sequence-to-sequence
learning framework.
In here, we use OFA-large-caption for image captioning for
pre-trained model, it uses COCO dataset for training the model.
14 / 28
Image to Prompts
Analysis
Table of contents
1 Motivation
2 Dataset
3 Method
4 Analysis
Sentence Transformer
ViT
CLIP interrogator
OFA
5 Discussion
15 / 28
Image to Prompts
Analysis
Sentence Transformer
For the competition, we need to calculate Stable Diffusion Prompt
Embeddings.
Input will be the prompt sentence, output will be the embedding. We
can get the embedding with using sentence transformer. The model
called Sentence-BERT [8].
Figure: SBERT architecture at inference
16 / 28
Image to Prompts
Analysis
Sentence Transformer
Figure: Ground Truth Prompts
Figure: Text Embedding for each prompt. Each of them has 384 dimensions.
17 / 28
Image to Prompts
Analysis
ViT
ViT
Figure: Label generated by Vision Transformer
18 / 28
Image to Prompts
Analysis
CLIP interrogator
CLIP interrogator
Figure: Prompts generated by CLIP interrogator
19 / 28
Image to Prompts
Analysis
OFA
OFA Model EDA
Figure: Prompts generated by OFA model
20 / 28
Image to Prompts
Analysis
OFA
Model Caption SBERT Score
ViT
comic book, triceratops,
coffee mug, jigsaw puzzle,
book jacket, dust cover,
dust jacket
0.553
CLIP
Interrogator
a cartoon dinosaur with
a piece of cheese in its mouth,
boardgamegeek, gaia,
woodland background,
pancake flat head, inspired by
Charles Fremont Conner,
mobile learning app prototype,
is essentially arbitrary, blue scales,
museum item, nachos, prehistory, plates
0.739
OFA
A blue dinosaur eating a
piece of cheese in a forest
0.772
21 / 28
Image to Prompts
Discussion
Table of contents
1 Motivation
2 Dataset
3 Method
4 Analysis
5 Discussion
6 Reference
22 / 28
Image to Prompts
Discussion
Discussion
This competition aims to explore the reversibility of the latent
relationship between text prompts and generated images, challenging
participants to predict the text prompt from a given image.
It highlights the importance of prompt engineering in text-to-image
models and the potential for understanding the intricate relationships
between prompts and images.
The combination of CLIP Interrogator, OFA, and ViT models provides
a robust approach for analyzing and predicting prompt embeddings.
Figure: Ranking on Kaggle Leader-board (Top 36%)
23 / 28
Image to Prompts
Discussion
Future work
But the method for this competition can still be improved.
Using the generation of a large dataset of images and prompts, along
with modifications and training of models, demonstrates the potential
for further improvements in the field of text-to-image generation and
prompt engineering (Score: 0.66316, Rank 1) [2].
Or, the utilization of a ViT-based method and the creation of a
customized dataset, indicating the importance of supervised learning
in predicting sentence embeddings and its relevance to advancing
text-to-image models (Score: 0.65179, Rank 2) [4].
24 / 28
Image to Prompts
Reference
Table of contents
1 Motivation
2 Dataset
3 Method
4 Analysis
5 Discussion
6 Reference
25 / 28
Image to Prompts
Reference
References I
[1] Will Cukierski Ashley Chow inversion. Stable Diffusion -
Image to Prompts. 2023. url:
https://kaggle.com/competitions/stable-
diffusion-image-to-prompts.
[2] bestfitting. 1st Place Solution. 2023. url:
https://www.kaggle.com/competitions/stable-
diffusion-image-to-prompts/discussion/411237.
[3] Alexey Dosovitskiy et al. An Image is Worth 16x16 Words:
Transformers for Image Recognition at Scale. 2021. arXiv:
2010.11929 [cs.CV].
[4] KaizaburoChubachi. 2nd place solution. 2023. url:
https://www.kaggle.com/competitions/stable-
diffusion-image-to-prompts/discussion/410606.
26 / 28
Image to Prompts
Reference
References II
[5] Junnan Li et al. BLIP: Bootstrapping Language-Image
Pre-training for Unified Vision-Language Understanding and
Generation. 2022. arXiv: 2201.12086 [cs.CV].
[6] Tsung-Yi Lin et al. Microsoft COCO: Common Objects in
Context. 2015. arXiv: 1405.0312 [cs.CV].
[7] Alec Radford et al. Learning Transferable Visual Models From
Natural Language Supervision. 2021. arXiv: 2103.00020
[cs.CV].
[8] Nils Reimers and Iryna Gurevych. “Sentence-BERT: Sentence
Embeddings using Siamese BERT-Networks”. In: CoRR
abs/1908.10084 (2019). arXiv: 1908.10084. url:
http://arxiv.org/abs/1908.10084.
27 / 28
Image to Prompts
Reference
References III
[9] Peng Wang et al. OFA: Unifying Architectures, Tasks, and
Modalities Through a Simple Sequence-to-Sequence Learning
Framework. 2022. arXiv: 2202.03052 [cs.CV].
[10] Ryan Webster et al. On the De-duplication of LAION-2B. 2023.
arXiv: 2303.12733 [cs.CV].
28 / 28

More Related Content

Similar to Image_to_Prompts.pdf

Web Image Retrieval Using Visual Dictionary
Web Image Retrieval Using Visual DictionaryWeb Image Retrieval Using Visual Dictionary
Web Image Retrieval Using Visual Dictionaryijwscjournal
 
Web Image Retrieval Using Visual Dictionary
Web Image Retrieval Using Visual DictionaryWeb Image Retrieval Using Visual Dictionary
Web Image Retrieval Using Visual Dictionaryijwscjournal
 
Comprehensive Performance Comparison of Cosine, Walsh, Haar, Kekre, Sine, Sla...
Comprehensive Performance Comparison of Cosine, Walsh, Haar, Kekre, Sine, Sla...Comprehensive Performance Comparison of Cosine, Walsh, Haar, Kekre, Sine, Sla...
Comprehensive Performance Comparison of Cosine, Walsh, Haar, Kekre, Sine, Sla...CSCJournals
 
Extended Performance Appraise of Image Retrieval Using the Feature Vector as ...
Extended Performance Appraise of Image Retrieval Using the Feature Vector as ...Extended Performance Appraise of Image Retrieval Using the Feature Vector as ...
Extended Performance Appraise of Image Retrieval Using the Feature Vector as ...Waqas Tariq
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
自然方策勾配法の基礎と応用
自然方策勾配法の基礎と応用自然方策勾配法の基礎と応用
自然方策勾配法の基礎と応用Ryo Iwaki
 
A comparative analysis of retrieval techniques in content based image retrieval
A comparative analysis of retrieval techniques in content based image retrievalA comparative analysis of retrieval techniques in content based image retrieval
A comparative analysis of retrieval techniques in content based image retrievalcsandit
 
A COMPARATIVE ANALYSIS OF RETRIEVAL TECHNIQUES IN CONTENT BASED IMAGE RETRIEVAL
A COMPARATIVE ANALYSIS OF RETRIEVAL TECHNIQUES IN CONTENT BASED IMAGE RETRIEVALA COMPARATIVE ANALYSIS OF RETRIEVAL TECHNIQUES IN CONTENT BASED IMAGE RETRIEVAL
A COMPARATIVE ANALYSIS OF RETRIEVAL TECHNIQUES IN CONTENT BASED IMAGE RETRIEVALcscpconf
 
IMAGE CAPTIONING USING TRANSFORMER: VISIONAID
IMAGE CAPTIONING USING TRANSFORMER: VISIONAIDIMAGE CAPTIONING USING TRANSFORMER: VISIONAID
IMAGE CAPTIONING USING TRANSFORMER: VISIONAIDIRJET Journal
 
IRJET-Image Question Answering: A Review
IRJET-Image Question Answering: A ReviewIRJET-Image Question Answering: A Review
IRJET-Image Question Answering: A ReviewIRJET Journal
 
Cartoon Based Image Retrieval : An Indexing Approach
Cartoon Based Image Retrieval : An Indexing ApproachCartoon Based Image Retrieval : An Indexing Approach
Cartoon Based Image Retrieval : An Indexing Approachmlaij
 
WEB IMAGE RETRIEVAL USING CLUSTERING APPROACHES
WEB IMAGE RETRIEVAL USING CLUSTERING APPROACHESWEB IMAGE RETRIEVAL USING CLUSTERING APPROACHES
WEB IMAGE RETRIEVAL USING CLUSTERING APPROACHEScscpconf
 
A Hybrid Approach for Content Based Image Retrieval System
A Hybrid Approach for Content Based Image Retrieval SystemA Hybrid Approach for Content Based Image Retrieval System
A Hybrid Approach for Content Based Image Retrieval SystemIOSR Journals
 
Implementation of Picwords to Warping Pictures and Keywords through Calligram
Implementation of Picwords to Warping Pictures and Keywords through CalligramImplementation of Picwords to Warping Pictures and Keywords through Calligram
Implementation of Picwords to Warping Pictures and Keywords through CalligramIRJET Journal
 
Modelling User Interaction utilising Information Foraging Theory (and a bit o...
Modelling User Interaction utilising Information Foraging Theory (and a bit o...Modelling User Interaction utilising Information Foraging Theory (and a bit o...
Modelling User Interaction utilising Information Foraging Theory (and a bit o...Ingo Frommholz
 
TEMPLATE MATCHING TECHNIQUE FOR SEARCHING WORDS IN DOCUMENT IMAGES
TEMPLATE MATCHING TECHNIQUE FOR SEARCHING WORDS IN DOCUMENT IMAGESTEMPLATE MATCHING TECHNIQUE FOR SEARCHING WORDS IN DOCUMENT IMAGES
TEMPLATE MATCHING TECHNIQUE FOR SEARCHING WORDS IN DOCUMENT IMAGESIJCI JOURNAL
 
LEARNING TO RANK IMAGE TAGS WITH LIMITED TRAINING EXAMPLES - IEEE PROJECTS I...
LEARNING TO RANK IMAGE TAGS WITH LIMITED TRAINING EXAMPLES  - IEEE PROJECTS I...LEARNING TO RANK IMAGE TAGS WITH LIMITED TRAINING EXAMPLES  - IEEE PROJECTS I...
LEARNING TO RANK IMAGE TAGS WITH LIMITED TRAINING EXAMPLES - IEEE PROJECTS I...Nexgen Technology
 
Learning to rank image tags with limited
Learning to rank image tags with limitedLearning to rank image tags with limited
Learning to rank image tags with limitednexgentech15
 
LEARNING TO RANK IMAGE TAGS WITH LIMITED TRAINING EXAMPLES
LEARNING TO RANK IMAGE TAGS WITH LIMITED TRAINING EXAMPLESLEARNING TO RANK IMAGE TAGS WITH LIMITED TRAINING EXAMPLES
LEARNING TO RANK IMAGE TAGS WITH LIMITED TRAINING EXAMPLESnexgentechnology
 

Similar to Image_to_Prompts.pdf (20)

Web Image Retrieval Using Visual Dictionary
Web Image Retrieval Using Visual DictionaryWeb Image Retrieval Using Visual Dictionary
Web Image Retrieval Using Visual Dictionary
 
Web Image Retrieval Using Visual Dictionary
Web Image Retrieval Using Visual DictionaryWeb Image Retrieval Using Visual Dictionary
Web Image Retrieval Using Visual Dictionary
 
Comprehensive Performance Comparison of Cosine, Walsh, Haar, Kekre, Sine, Sla...
Comprehensive Performance Comparison of Cosine, Walsh, Haar, Kekre, Sine, Sla...Comprehensive Performance Comparison of Cosine, Walsh, Haar, Kekre, Sine, Sla...
Comprehensive Performance Comparison of Cosine, Walsh, Haar, Kekre, Sine, Sla...
 
Extended Performance Appraise of Image Retrieval Using the Feature Vector as ...
Extended Performance Appraise of Image Retrieval Using the Feature Vector as ...Extended Performance Appraise of Image Retrieval Using the Feature Vector as ...
Extended Performance Appraise of Image Retrieval Using the Feature Vector as ...
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
自然方策勾配法の基礎と応用
自然方策勾配法の基礎と応用自然方策勾配法の基礎と応用
自然方策勾配法の基礎と応用
 
A comparative analysis of retrieval techniques in content based image retrieval
A comparative analysis of retrieval techniques in content based image retrievalA comparative analysis of retrieval techniques in content based image retrieval
A comparative analysis of retrieval techniques in content based image retrieval
 
A COMPARATIVE ANALYSIS OF RETRIEVAL TECHNIQUES IN CONTENT BASED IMAGE RETRIEVAL
A COMPARATIVE ANALYSIS OF RETRIEVAL TECHNIQUES IN CONTENT BASED IMAGE RETRIEVALA COMPARATIVE ANALYSIS OF RETRIEVAL TECHNIQUES IN CONTENT BASED IMAGE RETRIEVAL
A COMPARATIVE ANALYSIS OF RETRIEVAL TECHNIQUES IN CONTENT BASED IMAGE RETRIEVAL
 
IMAGE CAPTIONING USING TRANSFORMER: VISIONAID
IMAGE CAPTIONING USING TRANSFORMER: VISIONAIDIMAGE CAPTIONING USING TRANSFORMER: VISIONAID
IMAGE CAPTIONING USING TRANSFORMER: VISIONAID
 
IRJET-Image Question Answering: A Review
IRJET-Image Question Answering: A ReviewIRJET-Image Question Answering: A Review
IRJET-Image Question Answering: A Review
 
Cartoon Based Image Retrieval : An Indexing Approach
Cartoon Based Image Retrieval : An Indexing ApproachCartoon Based Image Retrieval : An Indexing Approach
Cartoon Based Image Retrieval : An Indexing Approach
 
IPT.pdf
IPT.pdfIPT.pdf
IPT.pdf
 
WEB IMAGE RETRIEVAL USING CLUSTERING APPROACHES
WEB IMAGE RETRIEVAL USING CLUSTERING APPROACHESWEB IMAGE RETRIEVAL USING CLUSTERING APPROACHES
WEB IMAGE RETRIEVAL USING CLUSTERING APPROACHES
 
A Hybrid Approach for Content Based Image Retrieval System
A Hybrid Approach for Content Based Image Retrieval SystemA Hybrid Approach for Content Based Image Retrieval System
A Hybrid Approach for Content Based Image Retrieval System
 
Implementation of Picwords to Warping Pictures and Keywords through Calligram
Implementation of Picwords to Warping Pictures and Keywords through CalligramImplementation of Picwords to Warping Pictures and Keywords through Calligram
Implementation of Picwords to Warping Pictures and Keywords through Calligram
 
Modelling User Interaction utilising Information Foraging Theory (and a bit o...
Modelling User Interaction utilising Information Foraging Theory (and a bit o...Modelling User Interaction utilising Information Foraging Theory (and a bit o...
Modelling User Interaction utilising Information Foraging Theory (and a bit o...
 
TEMPLATE MATCHING TECHNIQUE FOR SEARCHING WORDS IN DOCUMENT IMAGES
TEMPLATE MATCHING TECHNIQUE FOR SEARCHING WORDS IN DOCUMENT IMAGESTEMPLATE MATCHING TECHNIQUE FOR SEARCHING WORDS IN DOCUMENT IMAGES
TEMPLATE MATCHING TECHNIQUE FOR SEARCHING WORDS IN DOCUMENT IMAGES
 
LEARNING TO RANK IMAGE TAGS WITH LIMITED TRAINING EXAMPLES - IEEE PROJECTS I...
LEARNING TO RANK IMAGE TAGS WITH LIMITED TRAINING EXAMPLES  - IEEE PROJECTS I...LEARNING TO RANK IMAGE TAGS WITH LIMITED TRAINING EXAMPLES  - IEEE PROJECTS I...
LEARNING TO RANK IMAGE TAGS WITH LIMITED TRAINING EXAMPLES - IEEE PROJECTS I...
 
Learning to rank image tags with limited
Learning to rank image tags with limitedLearning to rank image tags with limited
Learning to rank image tags with limited
 
LEARNING TO RANK IMAGE TAGS WITH LIMITED TRAINING EXAMPLES
LEARNING TO RANK IMAGE TAGS WITH LIMITED TRAINING EXAMPLESLEARNING TO RANK IMAGE TAGS WITH LIMITED TRAINING EXAMPLES
LEARNING TO RANK IMAGE TAGS WITH LIMITED TRAINING EXAMPLES
 

More from Po-Chuan Chen

Effective Structured Prompting by Meta-Learning and Representative Verbalizer...
Effective Structured Prompting by Meta-Learning and Representative Verbalizer...Effective Structured Prompting by Meta-Learning and Representative Verbalizer...
Effective Structured Prompting by Meta-Learning and Representative Verbalizer...Po-Chuan Chen
 
Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdfQuark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdfPo-Chuan Chen
 
Empathetic Dialogue Generation via Sensitive Emotion Recognition and Sensible...
Empathetic Dialogue Generation via Sensitive Emotion Recognition and Sensible...Empathetic Dialogue Generation via Sensitive Emotion Recognition and Sensible...
Empathetic Dialogue Generation via Sensitive Emotion Recognition and Sensible...Po-Chuan Chen
 
On the Effectiveness of Offline RL for Dialogue Response Generation.pdf
On the Effectiveness of Offline RL for Dialogue Response Generation.pdfOn the Effectiveness of Offline RL for Dialogue Response Generation.pdf
On the Effectiveness of Offline RL for Dialogue Response Generation.pdfPo-Chuan Chen
 
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transfor...
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transfor...GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transfor...
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transfor...Po-Chuan Chen
 
A Statistical Perspective on Retrieval-Based Models.pdf
A Statistical Perspective on Retrieval-Based Models.pdfA Statistical Perspective on Retrieval-Based Models.pdf
A Statistical Perspective on Retrieval-Based Models.pdfPo-Chuan Chen
 
A Neural Corpus Indexer for Document Retrieval.pdf
A Neural Corpus Indexer for Document Retrieval.pdfA Neural Corpus Indexer for Document Retrieval.pdf
A Neural Corpus Indexer for Document Retrieval.pdfPo-Chuan Chen
 
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning.pdf
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning.pdfAdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning.pdf
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning.pdfPo-Chuan Chen
 
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...Po-Chuan Chen
 
Active Retrieval Augmented Generation.pdf
Active Retrieval Augmented Generation.pdfActive Retrieval Augmented Generation.pdf
Active Retrieval Augmented Generation.pdfPo-Chuan Chen
 
Offline Reinforcement Learning for Informal Summarization in Online Domains.pdf
Offline Reinforcement Learning for Informal Summarization in Online Domains.pdfOffline Reinforcement Learning for Informal Summarization in Online Domains.pdf
Offline Reinforcement Learning for Informal Summarization in Online Domains.pdfPo-Chuan Chen
 
Cold_Start_Reinforcement_Learning_with_Softmax_Policy_Gradient.pdf
Cold_Start_Reinforcement_Learning_with_Softmax_Policy_Gradient.pdfCold_Start_Reinforcement_Learning_with_Softmax_Policy_Gradient.pdf
Cold_Start_Reinforcement_Learning_with_Softmax_Policy_Gradient.pdfPo-Chuan Chen
 
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfPo-Chuan Chen
 
Evaluating Parameter Efficient Learning for Generation.pdf
Evaluating Parameter Efficient Learning for Generation.pdfEvaluating Parameter Efficient Learning for Generation.pdf
Evaluating Parameter Efficient Learning for Generation.pdfPo-Chuan Chen
 
Off-Policy Deep Reinforcement Learning without Exploration.pdf
Off-Policy Deep Reinforcement Learning without Exploration.pdfOff-Policy Deep Reinforcement Learning without Exploration.pdf
Off-Policy Deep Reinforcement Learning without Exploration.pdfPo-Chuan Chen
 
A Mixture-of-Expert Approach to RL-based Dialogue Management.pdf
A Mixture-of-Expert Approach to RL-based Dialogue Management.pdfA Mixture-of-Expert Approach to RL-based Dialogue Management.pdf
A Mixture-of-Expert Approach to RL-based Dialogue Management.pdfPo-Chuan Chen
 
Is Reinforcement Learning (Not) for Natural Language Processing.pdf
Is Reinforcement Learning (Not) for Natural
Language Processing.pdfIs Reinforcement Learning (Not) for Natural
Language Processing.pdf
Is Reinforcement Learning (Not) for Natural Language Processing.pdfPo-Chuan Chen
 
HyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
HyperPrompt:Prompt-based Task-Conditioning of TransformerspdfHyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
HyperPrompt:Prompt-based Task-Conditioning of TransformerspdfPo-Chuan Chen
 
Training language models to follow instructions with human feedback.pdf
Training language models to follow instructions
with human feedback.pdfTraining language models to follow instructions
with human feedback.pdf
Training language models to follow instructions with human feedback.pdfPo-Chuan Chen
 
Leveling to the Last Mile: Near-zero-cost Bit Level Wear Leveling for PCM-bas...
Leveling to the Last Mile: Near-zero-cost Bit Level Wear Leveling for PCM-bas...Leveling to the Last Mile: Near-zero-cost Bit Level Wear Leveling for PCM-bas...
Leveling to the Last Mile: Near-zero-cost Bit Level Wear Leveling for PCM-bas...Po-Chuan Chen
 

More from Po-Chuan Chen (20)

Effective Structured Prompting by Meta-Learning and Representative Verbalizer...
Effective Structured Prompting by Meta-Learning and Representative Verbalizer...Effective Structured Prompting by Meta-Learning and Representative Verbalizer...
Effective Structured Prompting by Meta-Learning and Representative Verbalizer...
 
Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdfQuark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
 
Empathetic Dialogue Generation via Sensitive Emotion Recognition and Sensible...
Empathetic Dialogue Generation via Sensitive Emotion Recognition and Sensible...Empathetic Dialogue Generation via Sensitive Emotion Recognition and Sensible...
Empathetic Dialogue Generation via Sensitive Emotion Recognition and Sensible...
 
On the Effectiveness of Offline RL for Dialogue Response Generation.pdf
On the Effectiveness of Offline RL for Dialogue Response Generation.pdfOn the Effectiveness of Offline RL for Dialogue Response Generation.pdf
On the Effectiveness of Offline RL for Dialogue Response Generation.pdf
 
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transfor...
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transfor...GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transfor...
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transfor...
 
A Statistical Perspective on Retrieval-Based Models.pdf
A Statistical Perspective on Retrieval-Based Models.pdfA Statistical Perspective on Retrieval-Based Models.pdf
A Statistical Perspective on Retrieval-Based Models.pdf
 
A Neural Corpus Indexer for Document Retrieval.pdf
A Neural Corpus Indexer for Document Retrieval.pdfA Neural Corpus Indexer for Document Retrieval.pdf
A Neural Corpus Indexer for Document Retrieval.pdf
 
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning.pdf
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning.pdfAdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning.pdf
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning.pdf
 
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
 
Active Retrieval Augmented Generation.pdf
Active Retrieval Augmented Generation.pdfActive Retrieval Augmented Generation.pdf
Active Retrieval Augmented Generation.pdf
 
Offline Reinforcement Learning for Informal Summarization in Online Domains.pdf
Offline Reinforcement Learning for Informal Summarization in Online Domains.pdfOffline Reinforcement Learning for Informal Summarization in Online Domains.pdf
Offline Reinforcement Learning for Informal Summarization in Online Domains.pdf
 
Cold_Start_Reinforcement_Learning_with_Softmax_Policy_Gradient.pdf
Cold_Start_Reinforcement_Learning_with_Softmax_Policy_Gradient.pdfCold_Start_Reinforcement_Learning_with_Softmax_Policy_Gradient.pdf
Cold_Start_Reinforcement_Learning_with_Softmax_Policy_Gradient.pdf
 
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
 
Evaluating Parameter Efficient Learning for Generation.pdf
Evaluating Parameter Efficient Learning for Generation.pdfEvaluating Parameter Efficient Learning for Generation.pdf
Evaluating Parameter Efficient Learning for Generation.pdf
 
Off-Policy Deep Reinforcement Learning without Exploration.pdf
Off-Policy Deep Reinforcement Learning without Exploration.pdfOff-Policy Deep Reinforcement Learning without Exploration.pdf
Off-Policy Deep Reinforcement Learning without Exploration.pdf
 
A Mixture-of-Expert Approach to RL-based Dialogue Management.pdf
A Mixture-of-Expert Approach to RL-based Dialogue Management.pdfA Mixture-of-Expert Approach to RL-based Dialogue Management.pdf
A Mixture-of-Expert Approach to RL-based Dialogue Management.pdf
 
Is Reinforcement Learning (Not) for Natural Language Processing.pdf
Is Reinforcement Learning (Not) for Natural
Language Processing.pdfIs Reinforcement Learning (Not) for Natural
Language Processing.pdf
Is Reinforcement Learning (Not) for Natural Language Processing.pdf
 
HyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
HyperPrompt:Prompt-based Task-Conditioning of TransformerspdfHyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
HyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
 
Training language models to follow instructions with human feedback.pdf
Training language models to follow instructions
with human feedback.pdfTraining language models to follow instructions
with human feedback.pdf
Training language models to follow instructions with human feedback.pdf
 
Leveling to the Last Mile: Near-zero-cost Bit Level Wear Leveling for PCM-bas...
Leveling to the Last Mile: Near-zero-cost Bit Level Wear Leveling for PCM-bas...Leveling to the Last Mile: Near-zero-cost Bit Level Wear Leveling for PCM-bas...
Leveling to the Last Mile: Near-zero-cost Bit Level Wear Leveling for PCM-bas...
 

Recently uploaded

Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptJasonTagapanGulla
 
computer application and construction management
computer application and construction managementcomputer application and construction management
computer application and construction managementMariconPadriquez1
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
lifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxlifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxsomshekarkn64
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitterShivangiSharma879191
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 

Recently uploaded (20)

Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.ppt
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
computer application and construction management
computer application and construction managementcomputer application and construction management
computer application and construction management
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
lifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxlifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptx
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 

Image_to_Prompts.pdf

  • 1. Image to Prompts Stable Diffusion - Image to Prompts Featured Code Competition, Kaggle Pin-Yen Liu, Po-Chun Huang, Po-Chuan Chen June 2, 2023 1 / 28
  • 2. Image to Prompts Table of contents 1 Motivation 2 Dataset 3 Method 4 Analysis 5 Discussion 6 Reference 2 / 28
  • 3. Image to Prompts Motivation Table of contents 1 Motivation 2 Dataset 3 Method 4 Analysis 5 Discussion 6 Reference 3 / 28
  • 4. Image to Prompts Motivation Motivation This project is a Kaggle competition called Image to Prompts [1]. In this competition, we need to reverse the typical direction of a generative text-to-image model: instead of generating an image from a text prompt. We want to create a model which can predict the text prompt given a generated image. And making predictions on a dataset containing a wide variety of (prompt, image) pairs generated by Stable Diffusion 2.0, to understand how reversible the latent relationship is. 4 / 28
  • 5. Image to Prompts Motivation Here has an example, if we have an image like this below We may need to generate a prompt ultrasaurus holding a black bean taco and a piece of cheese in the woods for this image. 5 / 28
  • 6. Image to Prompts Dataset Table of contents 1 Motivation 2 Dataset 3 Method 4 Analysis 5 Discussion 6 Reference 6 / 28
  • 7. Image to Prompts Dataset Dataset Because our method is to use pre-trained model. So there has no dataset for our method. But, there have some datasets that can be used in this competition. The dataset is build in (image, prompt) pairs. For example: DiffusuinDB [8] : 14 million image-text pairs COCO (Microsoft Common Objects in Context) [6] : 2.5 million labeled instances in 328k images Laion2B-en [10] : 2.3 billion image-text pairs 7 / 28
  • 8. Image to Prompts Method Table of contents 1 Motivation 2 Dataset 3 Method ViT CLIP interrogator OFA 4 Analysis 5 Discussion 8 / 28
  • 9. Image to Prompts Method Method Our method is to ensemble the CLIP Interrogator, OFA model,and ViT model. Here’s the ratio for three different model 1 Vision Transformer (ViT) model: 74.88% 2 CLIP Interrogator: 21.12% 3 OFA model fine-tuned for image captioning: 4% 9 / 28
  • 10. Image to Prompts Method ViT ViT The Vision Transformer (ViT) [3] is a transformer encoder model (BERT-like), and pretrained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at resolution 224x224. 10 / 28
  • 11. Image to Prompts Method CLIP interrogator CLIP interrogator The CLIP Interrogator is a prompt engineering tool that combines OpenAI’s CLIP [7] and Salesforce’s BLIP [5] to optimize text prompts to match a given image. Figure: CLIP interrogator : A tool on Hugging Face Space 11 / 28
  • 12. Image to Prompts Method CLIP interrogator CLIP interrogator pipeline Figure: CLIP Interrogator pipeline 12 / 28
  • 13. Image to Prompts Method CLIP interrogator BLIP Figure: Framework of BLIP 13 / 28
  • 14. Image to Prompts Method OFA OFA OFA[9] is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image generation, visual grounding, image captioning, image classification, text generation, etc.) to a simple sequence-to-sequence learning framework. In here, we use OFA-large-caption for image captioning for pre-trained model, it uses COCO dataset for training the model. 14 / 28
  • 15. Image to Prompts Analysis Table of contents 1 Motivation 2 Dataset 3 Method 4 Analysis Sentence Transformer ViT CLIP interrogator OFA 5 Discussion 15 / 28
  • 16. Image to Prompts Analysis Sentence Transformer For the competition, we need to calculate Stable Diffusion Prompt Embeddings. Input will be the prompt sentence, output will be the embedding. We can get the embedding with using sentence transformer. The model called Sentence-BERT [8]. Figure: SBERT architecture at inference 16 / 28
  • 17. Image to Prompts Analysis Sentence Transformer Figure: Ground Truth Prompts Figure: Text Embedding for each prompt. Each of them has 384 dimensions. 17 / 28
  • 18. Image to Prompts Analysis ViT ViT Figure: Label generated by Vision Transformer 18 / 28
  • 19. Image to Prompts Analysis CLIP interrogator CLIP interrogator Figure: Prompts generated by CLIP interrogator 19 / 28
  • 20. Image to Prompts Analysis OFA OFA Model EDA Figure: Prompts generated by OFA model 20 / 28
  • 21. Image to Prompts Analysis OFA Model Caption SBERT Score ViT comic book, triceratops, coffee mug, jigsaw puzzle, book jacket, dust cover, dust jacket 0.553 CLIP Interrogator a cartoon dinosaur with a piece of cheese in its mouth, boardgamegeek, gaia, woodland background, pancake flat head, inspired by Charles Fremont Conner, mobile learning app prototype, is essentially arbitrary, blue scales, museum item, nachos, prehistory, plates 0.739 OFA A blue dinosaur eating a piece of cheese in a forest 0.772 21 / 28
  • 22. Image to Prompts Discussion Table of contents 1 Motivation 2 Dataset 3 Method 4 Analysis 5 Discussion 6 Reference 22 / 28
  • 23. Image to Prompts Discussion Discussion This competition aims to explore the reversibility of the latent relationship between text prompts and generated images, challenging participants to predict the text prompt from a given image. It highlights the importance of prompt engineering in text-to-image models and the potential for understanding the intricate relationships between prompts and images. The combination of CLIP Interrogator, OFA, and ViT models provides a robust approach for analyzing and predicting prompt embeddings. Figure: Ranking on Kaggle Leader-board (Top 36%) 23 / 28
  • 24. Image to Prompts Discussion Future work But the method for this competition can still be improved. Using the generation of a large dataset of images and prompts, along with modifications and training of models, demonstrates the potential for further improvements in the field of text-to-image generation and prompt engineering (Score: 0.66316, Rank 1) [2]. Or, the utilization of a ViT-based method and the creation of a customized dataset, indicating the importance of supervised learning in predicting sentence embeddings and its relevance to advancing text-to-image models (Score: 0.65179, Rank 2) [4]. 24 / 28
  • 25. Image to Prompts Reference Table of contents 1 Motivation 2 Dataset 3 Method 4 Analysis 5 Discussion 6 Reference 25 / 28
  • 26. Image to Prompts Reference References I [1] Will Cukierski Ashley Chow inversion. Stable Diffusion - Image to Prompts. 2023. url: https://kaggle.com/competitions/stable- diffusion-image-to-prompts. [2] bestfitting. 1st Place Solution. 2023. url: https://www.kaggle.com/competitions/stable- diffusion-image-to-prompts/discussion/411237. [3] Alexey Dosovitskiy et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. 2021. arXiv: 2010.11929 [cs.CV]. [4] KaizaburoChubachi. 2nd place solution. 2023. url: https://www.kaggle.com/competitions/stable- diffusion-image-to-prompts/discussion/410606. 26 / 28
  • 27. Image to Prompts Reference References II [5] Junnan Li et al. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. 2022. arXiv: 2201.12086 [cs.CV]. [6] Tsung-Yi Lin et al. Microsoft COCO: Common Objects in Context. 2015. arXiv: 1405.0312 [cs.CV]. [7] Alec Radford et al. Learning Transferable Visual Models From Natural Language Supervision. 2021. arXiv: 2103.00020 [cs.CV]. [8] Nils Reimers and Iryna Gurevych. “Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks”. In: CoRR abs/1908.10084 (2019). arXiv: 1908.10084. url: http://arxiv.org/abs/1908.10084. 27 / 28
  • 28. Image to Prompts Reference References III [9] Peng Wang et al. OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework. 2022. arXiv: 2202.03052 [cs.CV]. [10] Ryan Webster et al. On the De-duplication of LAION-2B. 2023. arXiv: 2303.12733 [cs.CV]. 28 / 28