Transfer Learning in NLP:
Concepts and Tools
Thomas Wolf
HuggingFace Inc.
Overview
● Concepts and History
● Anatomy of a State-of-the-art Model
● Open source tools
● Current Trends
● Limits and Open Questions
Some slides are adapted from our NAACL 2019 Tutorial on Transfer Learning in NLP
with my collaborators Sebastian Ruder, Matthew Peters and Swabha Swayamdipta.
Slides: http://tiny.cc/NAACLTransfer
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 4
Concepts & History
5
What is Transfer Learning?
Pan and Yang (2010)
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 6
Why Transfer Learning in NLP? (intuitively)
Why should Transfer Learning work in NLP?
● Many NLP tasks share common knowledge about language (linguistic
representations, structural similarities...)
● Tasks can inform each other—e.g. syntax and semantics
● Annotated data is rare, so make use of as much supervision as is available.
● Unlabelled data is super-abundant (the internet), so we should try to use it
Empirically, transfer learning has resulted in SOTA for many supervised NLP tasks
(e.g. classification, information extraction, Q&A, etc.)
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 7
Why Transfer Learning in NLP? (empirically)
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 8
Performance on Named Entity Recognition (NER) on CoNLL-2003 (English) over time
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 9
Types of transfer learning in NLP (taxonomy from Ruder, 2019) — we will focus on sequential transfer learning.
Training: Sequential Transfer Learning
Learn on one task / dataset, then transfer to another task / dataset
Pretraining (word2vec, GloVe, skip-thought, InferSent, ELMo, ULMFiT, GPT, BERT, …)
⇨ Adaptation (classification, sequence labeling, Q&A, …)
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 10
History
Word vectors: cats = [0.2, -0.3, …]; dogs = [0.4, -0.5, …]
Sentence/doc vectors: “It’s raining cats and dogs.” = [0.8, 0.9, …]; “We have two cats.” = [-1.2, 0.0, …]
Word-in-context vectors: “cats” in “We have two cats.” = [1.2, -0.3, …]; “cats” in “It’s raining cats and dogs.” = [-0.4, 0.9, …]
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 11
History: The rise of language modeling
Many currently successful pretraining approaches are based on language
modeling, i.e. learning to predict:
● empirical probability of text: Pϴ(text)
● empirical conditional probability of text (e.g. translation): Pϴ(text | other text)
Advantages:
● Doesn’t require human annotation
● Many languages have enough text to learn a high-capacity model
● Versatile—can learn both sentence and word representations with a variety of
objective functions (autoregressive language modeling, masked language
modeling, span prediction, skip-thoughts, cross-view training…)
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 12
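To make the objective concrete, here is a minimal sketch of the autoregressive variant, computing Pϴ(text) as a next-token cross-entropy loss (assuming PyTorch; the tensor names are illustrative, not from the slides):

import torch.nn.functional as F

def causal_lm_loss(logits, token_ids):
    """Autoregressive LM objective: each position predicts the next token.

    logits:    (batch, seq_len, vocab_size) scores produced by the model
    token_ids: (batch, seq_len) input token ids
    """
    shift_logits = logits[:, :-1, :].contiguous()   # scores at position t ...
    shift_labels = token_ids[:, 1:].contiguous()    # ... are matched with token t+1
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )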
● Language modeling is a very difficult task, even for humans.
● Language models are expected to compress any possible context into a
vector that generalizes over possible completions.
○ E.g. “I think this is the beginning of a beautiful ???”
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 13
History: The rise of language modeling
● To have any chance at solving this task, a model is forced to learn syntax,
semantics, encode facts about the world, etc.
● Given enough data and compute, a big model can do a reasonable job!
Anatomy of a State-of-the-Art
Transfer Learning Model
14
A State-of-the-Art Transfer Learning Model
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 15
Two essential components: model & training
● The model: pre-training architecture and adaptations for fine-tuning
○ Current large architectures are mostly based on Transformers (ULMFiT being a notable exception)
○ Unclear advantages of a smart architecture (XLNet) versus more data (RoBERTa)
○ Trend toward larger models: XLM (664M), GPT-2 (1.5B), Megatron-LM (8.5B)
● The training: pre-training and adaptation phases
○ Learning long-term dependencies => long streams of continuous text (books, wiki)
○ Toward using more data in both phases: RoBERTa (160GB), MT-DNN (WNLI)
○ Quality of the data is important
Model: Using a typical Transfer Learning model
Components: Tokenizer ⇨ Pretrained model ⇨ Adaptation head
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 16
Example: “Jim Henson was a puppeteer”
Tokenization: Jim | Henson | was | a | puppet | ##eer
Convert to vocabulary indices: 11067 5567 245 120 7756 9908
Pretrained model ⇨ one hidden-state vector per token, e.g.
1.2 2.7 0.6 -0.2
3.7 9.1 -2.1 3.1
1.5 -4.7 2.4 6.7
6.1 2.4 7.3 -0.6
-3.1 2.5 1.9 -0.1
0.7 2.1 4.2 -3.1
Classifier model ⇨ True 0.7886 / False -0.223
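The same pipeline in code, as a rough sketch using the Hugging Face library presented later in the talk (class and checkpoint names follow the current transformers package and are an assumption here; a real classification pipeline would also add the [CLS]/[SEP] special tokens):

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
backbone = BertModel.from_pretrained("bert-base-uncased")            # pretrained model
head = torch.nn.Linear(backbone.config.hidden_size, 2)               # adaptation head (True / False)

tokens = tokenizer.tokenize("Jim Henson was a puppeteer")            # ['jim', 'henson', 'was', 'a', 'puppet', '##eer']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])  # convert to vocabulary indices

with torch.no_grad():
    hidden_states = backbone(input_ids)[0]    # one vector per token: (1, seq_len, hidden_size)
    logits = head(hidden_states[:, 0])        # classifier scores, here read off the first token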
Model: From shallow to deep
Bengio et al. 2003: A Neural Probabilistic Language Model — 1 layer
Devlin et al. 2019: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding — 24 layers
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 17
Model: the example of BERT
BERT is pretrained for both sentence and contextual word representations, using masked language
modeling and next sentence prediction.
● Pretrained model: BERT-large has 340M parameters, 24 layers
● Adaptation head: just a linear layer on top of the representation output by the pretrained model.
See also: Logeswaran and Lee, ICLR 2018; Devlin et al., 2019
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 18
Model: Inside BERT, GPT-2, XLNet, RoBERTa
Large-scale transformer architectures (GPT-2, BERT, XLM…) are very similar to each other and consist of:
● summing word and position embeddings
● applying a succession of transformer blocks with:
○ layer normalisation
○ a self-attention module
○ dropout and a residual connection
○ another layer normalisation
○ a feed-forward module with one hidden layer and a non-linearity:
Linear ⇨ ReLU/GELU ⇨ Linear
○ dropout and a residual connection
Main differences between BERT, GPT-2 and XLNet: the pretraining objective
● causal language modeling for GPT
● masked language modeling for BERT (+ next sentence prediction)
● permutation language modeling for XLNet
(Child et al., 2019)
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 19
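As an illustration, a transformer block with exactly these pieces can be sketched as follows (a generic pre-layer-norm variant for readability; the exact normalisation placement, sizes and attention masking differ between BERT, GPT-2 and friends):

import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=768, n_heads=12, d_ff=3072, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)                       # layer normalisation
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.drop1 = nn.Dropout(dropout)
        self.ln2 = nn.LayerNorm(d_model)                       # another layer normalisation
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff),      # feed-forward module:
                                nn.GELU(),                     # Linear -> GELU -> Linear
                                nn.Linear(d_ff, d_model))
        self.drop2 = nn.Dropout(dropout)

    def forward(self, x, attn_mask=None):
        h = self.ln1(x)
        h, _ = self.attn(h, h, h, attn_mask=attn_mask)         # self-attention module
        x = x + self.drop1(h)                                  # dropout + residual connection
        x = x + self.drop2(self.ff(self.ln2(x)))               # dropout + residual connection
        return x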
Model: Adapting for target task
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 20
General workflow:
1. Remove pretraining task head if not useful for target task
E.g. remove softmax classifier
2. Add target task-specific layers on top/bottom of
pretrained model
Simple: adding linear layer(s) on top of the pretrained model
More complex: model output as input for a separate model
Sometimes more complex: Adapting to a structurally different task
Ex: Pretraining with a single input sequence and adapting to a task with
several input sequences (ex: translation, conditional generation...)
➯ Use pretrained model to initialize as much as possible of target model
➯ Ramachandran et al., EMNLP 2017; Lample & Conneau, 2019
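As one concrete illustration of initializing as much of the target model as possible (an assumption about the current transformers API, not the exact recipe of the cited papers), an encoder-decoder model can be warm-started from pretrained encoder checkpoints:

from transformers import EncoderDecoderModel

# Encoder and decoder weights are loaded from BERT checkpoints; the decoder's
# cross-attention layers do not exist in BERT and are therefore newly initialized.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)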
Training: Adaptation on a text classification task
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 21
Replace the pretraining head with a classification head: a linear layer which takes as input the hidden-state of a token.
Keep our pretrained model unchanged as the backbone.
Initialization of the model:
● Initialize the weights of the model (in particular the added parameters)
● Reload common weights from the pretrained model (a minimal code sketch follows).
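A minimal sketch of this setup, assuming PyTorch and an illustrative BERT checkpoint (the added classifier is freshly initialized, the backbone weights are reloaded from the pretrained model):

import torch
from transformers import BertModel

backbone = BertModel.from_pretrained("bert-base-uncased")        # reload common pretrained weights
classifier = torch.nn.Linear(backbone.config.hidden_size, 2)     # added parameters, freshly initialized

optimizer = torch.optim.AdamW(list(backbone.parameters()) + list(classifier.parameters()), lr=2e-5)
loss_fn = torch.nn.CrossEntropyLoss()

def training_step(input_ids, labels):
    hidden_states = backbone(input_ids)[0]       # (batch, seq_len, hidden_size)
    logits = classifier(hidden_states[:, 0])     # classification head on one token's hidden state
    loss = loss_fn(logits, labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()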
Training: Adaptation on a text classification task
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 22
We are at the state-of-the-art
(ULMFiT)
Remarks:
❏ The error rate goes down quickly! After one epoch we already have >90% accuracy.
⇨ Fine-tuning in transfer learning is highly data-efficient
❏ We took our pre-training & fine-tuning hyper-parameters straight from the literature on related models.
⇨ Fine-tuning is often robust to the exact choice of hyper-parameters
Training: Adaptation on a text classification task
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 23
A few words on robustness & variance.
❏ Large pretrained models (e.g. BERT large) are
prone to degenerate performance when fine-tuned
on tasks with small training sets.
❏ Observed behavior is often “on-off”: it either works
very well or doesn’t work at all.
❏ Understanding the conditions and causes of this
behavior (models, adaptation schemes) is an
open research question.
Phang et al., 2018
Open-source tools
Hubs and Libraries
24
Open-sourcing: practical considerations
● Pretraining large-scale models is costly
Use open-source models
Share your pretrained models
“Energy and Policy Considerations for Deep Learning in NLP” - Strubell, Ganesh, McCallum - ACL 2019
● Sharing/accessing pretrained models
○ Hubs: TensorFlow Hub, PyTorch Hub
○ Author-released checkpoints: e.g. BERT, GPT...
○ Third-party libraries: AllenNLP, fast.ai, HuggingFace
● Design considerations
○ Hubs/libraries:
■ Simple to use but can be difficult to modify model internal architecture
○ Author released checkpoints:
■ More difficult to use but you have full control over the model internals
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 25
PyTorch Hub
● Based on GitHub repositories: a model is shared by adding a file to the GitHub repository.
● PyTorch Hub can fetch the model from the master branch on GitHub. This means that you
don’t need to package your model (pip) & users can always access the most recent version.
● Both model definitions and pre-trained weights can be shared.
● More details: https://pytorch.org/hub and https://pytorch.org/docs/stable/hub.html
TensorFlow Hub
● TensorFlow Hub is a library for sharing machine learning models as self-contained pieces of
TensorFlow graph with their weights and assets.
● Modules are automatically downloaded and cached when instantiated.
● Each time a module is called, it adds operations to the current TensorFlow graph.
● More details: https://tensorflow.org/hub
Main limitations of Hubs
● No access to the source code of the model (black-box)
● Not possible to modify the internals of the model (e.g. to add Adapters)
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 26
HuggingFace library with Transformers 👾
We’ve built an opinionated library of pretrained models (pytorch-transformers) for
NLP researchers and practitioners seeking to use/study/modify large-scale
pretrained transformer models such as BERT, GPT, GPT-2, XLNet, RoBERTa...
The library was designed with two strong principles in mind:
● be as easy to use and as fast to on-board as possible:
○ almost no abstractions to learn: models, tokenizers and configurations,
○ a common from_pretrained() method takes care of downloading/caching/loading
classes from pretrained instances supplied in the library or the user’s saved instances,
○ to build upon the library, the user can use regular PyTorch modules.
● provide state-of-the-art models identical to the original models:
○ examples reproducing official results,
○ carefully drafted code as close as possible to the original computation graph.
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 27
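A typical call pattern looks like this (a sketch; the class names follow the current transformers package and may differ slightly from the pytorch-transformers release shown in the talk):

from transformers import GPT2Tokenizer, GPT2LMHeadModel

# from_pretrained() downloads, caches and loads the weights, configuration and vocabulary
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# The returned objects are regular PyTorch modules: inspect, modify or fine-tune them freely,
# then persist them with save_pretrained() and reload with from_pretrained().
model.save_pretrained("./my-finetuned-gpt2")
tokenizer.save_pretrained("./my-finetuned-gpt2")
reloaded = GPT2LMHeadModel.from_pretrained("./my-finetuned-gpt2")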
Current Trends
30
Larger models
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 31
(Chart: number of parameters of the model, in millions)
Larger models on larger datasets
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 32
A minimum amount of data is required to unlock the potential of Transfer Learning.
Question: Did no one think of this before? Why did it only start in ‘18 (ELMo)?
J. Devlin’s answer: Good results from pre-training are >1,000x to 100,000x more
expensive than supervised training.
○ E.g., 10x-100x bigger model trained for 100x-1,000x as many steps.
○ Imagine in 2013: well-tuned 2-layer, LSTM gets 80% accuracy on sentiment
analysis, training for 8 hours.
○ Pre-train large-scale language model on same architecture for a week, get +0.5%.
○ Reviewers: “Who would do something so expensive for such a small gain?”
Devlin et al.
Larger models on larger datasets
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 33
Diminishing returns of using more data/bigger models:
➭ For a linear gain in performance, an exponentially larger model is required.
Radford and Wu et al. Devlin et al. Hancock @ Fwdays’19
A trend for smaller models
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 34
And a lot of very fresh work which will be published around the end of the year:
Tsai et al., Turc et al., Tang et al., ...
(Chart: number of parameters of the model, in millions)
Smaller models: Distilling large models
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 35
Training costs make headlines, but as large-scale models reach production,
inference time will likely account for most of a model's total environmental cost.
Distilling larger models into smaller ones lets us:
● reduce inference cost
● capitalize on the inductive biases learned by the large model.
95% of the performance of a model like BERT can be preserved in a distilled
model that is 40% smaller and 60% faster (our team's work on DistilBERT, open-sourced
in our pytorch-transformers library)
Smaller models: Distillation from large models
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 36
Distillation: two main tricks to train a student model from a teacher model:
1. Start from a high-quality weight initialization derived from the teacher
2. Train the student to mimic the full output distribution of the teacher (see the sketch below)
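A minimal sketch of trick 2: a generic soft-target distillation loss with a temperature (not the exact DistilBERT training code):

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Train the student to mimic the teacher's full output distribution."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)            # teacher's soft targets
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    # KL divergence between the two distributions; the t*t factor keeps gradient
    # magnitudes comparable across temperatures (Hinton et al., 2015).
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (t * t)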
Limits, Open Questions
37
Shortcomings of pretrained language models
Large, pretrained language models can be difficult to optimize.
● Fine-tuning is often unstable and has a high variance, particularly if the
target datasets are very small. BERT large is prone to degenerate
performance; multiple random restarts can be necessary (Phang et al., 2018)
● Do we really need all these parameters?
● Recent work shows that only a few of the attention heads in BERT are
required (Voita et al., ACL 2019, Michel et al.).
● More work needed to understand model parameters.
● Pruning and distillation are two ways to deal with this.
● See also: the lottery ticket hypothesis (Frankle et al., ICLR 2019).
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 38
Shortcoming of language modeling in general
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 39
The most successful current pretraining methods are based on variants of
language modeling. But these have many shortcomings:
● Not appropriate for all models
○ If we condition on more inputs (video/sound), we need to pretrain those parts too
● Weak signal for semantics and long-term context vs. strong signal for
syntax and short-term word co-occurrences
● Pretrained language models are bad at:
○ fine-grained linguistic tasks (Liu et al., NAACL 2019)
○ common sense (when you actually make it difficult; Zellers et al., ACL 2019);
coherent natural language generation
○ and they tend to overfit to surface-form information when fine-tuned (‘rapid surface
learners’)
Shortcoming of language modeling in general
Need for grounded representations
● Limits of distributional hypothesis—difficult to learn certain types of
information from raw text
○ Human reporting bias: not stating the obvious (Gordon and Van Durme, 2013)
○ Common sense isn’t written down
○ Facts about named entities
○ No grounding to other modalities
● Possible solutions:
○ Incorporate structured knowledge (e.g. databases - ERNIE: Zhang et al 2019)
○ Multimodal learning (e.g. visual representations - VideoBERT: Sun et al. 2019)
○ Interactive/human-in-the-loop approaches (e.g. dialog: Hancock et al. 2018)
Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 40
That’s all folks!
41
Más contenido relacionado

La actualidad más candente

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingMinh Pham
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model佳蓉 倪
 
INTRODUCTION TO NLP, RNN, LSTM, GRU
INTRODUCTION TO NLP, RNN, LSTM, GRUINTRODUCTION TO NLP, RNN, LSTM, GRU
INTRODUCTION TO NLP, RNN, LSTM, GRUSri Geetha
 
LLaMA 2.pptx
LLaMA 2.pptxLLaMA 2.pptx
LLaMA 2.pptxRkRahul16
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Fwdays
 
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...Databricks
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understandinggohyunwoong
 
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdfLLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdftaeseon ryu
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTSuman Debnath
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer modelsDing Li
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersJulien SIMON
 
Recent trends in natural language processing
Recent trends in natural language processingRecent trends in natural language processing
Recent trends in natural language processingBalayogi G
 
BERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersBERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersLiangqun Lu
 
Customizing LLMs
Customizing LLMsCustomizing LLMs
Customizing LLMsJim Steele
 

La actualidad más candente (20)

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
Bert
BertBert
Bert
 
BERT
BERTBERT
BERT
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model
 
INTRODUCTION TO NLP, RNN, LSTM, GRU
INTRODUCTION TO NLP, RNN, LSTM, GRUINTRODUCTION TO NLP, RNN, LSTM, GRU
INTRODUCTION TO NLP, RNN, LSTM, GRU
 
LLaMA 2.pptx
LLaMA 2.pptxLLaMA 2.pptx
LLaMA 2.pptx
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
 
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
[Paper review] BERT
[Paper review] BERT[Paper review] BERT
[Paper review] BERT
 
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdfLLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERT
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer models
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face Transformers
 
Word embedding
Word embedding Word embedding
Word embedding
 
Recent trends in natural language processing
Recent trends in natural language processingRecent trends in natural language processing
Recent trends in natural language processing
 
Transformers AI PPT.pptx
Transformers AI PPT.pptxTransformers AI PPT.pptx
Transformers AI PPT.pptx
 
BERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersBERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from Transformers
 
Customizing LLMs
Customizing LLMsCustomizing LLMs
Customizing LLMs
 
gpt3_presentation.pdf
gpt3_presentation.pdfgpt3_presentation.pdf
gpt3_presentation.pdf
 

Similar a Transfer Learning in NLP: Concepts and Tools

Dealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP MeetupDealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP MeetupYves Peirsman
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...Dr. Haxel Consult
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentationSurya Sg
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPindico data
 
Benchmarking transfer learning approaches for NLP
Benchmarking transfer learning approaches for NLPBenchmarking transfer learning approaches for NLP
Benchmarking transfer learning approaches for NLPYury Kashnitsky
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Anoop Deoras
 
Trustworthy Generative AI_ ICML'23 Tutorial.pptx
Trustworthy Generative AI_ ICML'23 Tutorial.pptxTrustworthy Generative AI_ ICML'23 Tutorial.pptx
Trustworthy Generative AI_ ICML'23 Tutorial.pptxsylvioneto11
 
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15MLconf
 
10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConfXavier Amatriain
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systemsXavier Amatriain
 
#1 Berlin Students in AI, Machine Learning & NLP presentation
#1 Berlin Students in AI, Machine Learning & NLP presentation#1 Berlin Students in AI, Machine Learning & NLP presentation
#1 Berlin Students in AI, Machine Learning & NLP presentationparlamind
 
Natural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application TrendsNatural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application TrendsShreyas Suresh Rao
 
Local Applications of Large Language Models based on RAG.pptx
Local Applications of Large Language Models based on RAG.pptxLocal Applications of Large Language Models based on RAG.pptx
Local Applications of Large Language Models based on RAG.pptxlwz614595250
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...alessio_ferrari
 
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 ReviewNatural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 Reviewchangedaeoh
 
Transfer Learning in NLP: A Survey
Transfer Learning in NLP: A SurveyTransfer Learning in NLP: A Survey
Transfer Learning in NLP: A SurveyNUPUR YADAV
 
Learning to Translate with Joey NMT
Learning to Translate with Joey NMTLearning to Translate with Joey NMT
Learning to Translate with Joey NMTJulia Kreutzer
 
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137Anant Corporation
 
Transfer_Learning_for_Natural_Language_P_v3_MEAP.pdf
Transfer_Learning_for_Natural_Language_P_v3_MEAP.pdfTransfer_Learning_for_Natural_Language_P_v3_MEAP.pdf
Transfer_Learning_for_Natural_Language_P_v3_MEAP.pdforanisalcani
 

Similar a Transfer Learning in NLP: Concepts and Tools (20)

Dealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP MeetupDealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
 
Cd project
Cd projectCd project
Cd project
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentation
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLP
 
Benchmarking transfer learning approaches for NLP
Benchmarking transfer learning approaches for NLPBenchmarking transfer learning approaches for NLP
Benchmarking transfer learning approaches for NLP
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
 
Trustworthy Generative AI_ ICML'23 Tutorial.pptx
Trustworthy Generative AI_ ICML'23 Tutorial.pptxTrustworthy Generative AI_ ICML'23 Tutorial.pptx
Trustworthy Generative AI_ ICML'23 Tutorial.pptx
 
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
 
10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems
 
#1 Berlin Students in AI, Machine Learning & NLP presentation
#1 Berlin Students in AI, Machine Learning & NLP presentation#1 Berlin Students in AI, Machine Learning & NLP presentation
#1 Berlin Students in AI, Machine Learning & NLP presentation
 
Natural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application TrendsNatural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application Trends
 
Local Applications of Large Language Models based on RAG.pptx
Local Applications of Large Language Models based on RAG.pptxLocal Applications of Large Language Models based on RAG.pptx
Local Applications of Large Language Models based on RAG.pptx
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
 
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 ReviewNatural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
 
Transfer Learning in NLP: A Survey
Transfer Learning in NLP: A SurveyTransfer Learning in NLP: A Survey
Transfer Learning in NLP: A Survey
 
Learning to Translate with Joey NMT
Learning to Translate with Joey NMTLearning to Translate with Joey NMT
Learning to Translate with Joey NMT
 
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
 
Transfer_Learning_for_Natural_Language_P_v3_MEAP.pdf
Transfer_Learning_for_Natural_Language_P_v3_MEAP.pdfTransfer_Learning_for_Natural_Language_P_v3_MEAP.pdf
Transfer_Learning_for_Natural_Language_P_v3_MEAP.pdf
 

Más de Fwdays

"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y..."How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...Fwdays
 
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil TopchiiFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
"What is a RAG system and how to build it",Dmytro Spodarets
"What is a RAG system and how to build it",Dmytro Spodarets"What is a RAG system and how to build it",Dmytro Spodarets
"What is a RAG system and how to build it",Dmytro SpodaretsFwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"Distributed graphs and microservices in Prom.ua", Maksym Kindritskyi
"Distributed graphs and microservices in Prom.ua",  Maksym Kindritskyi"Distributed graphs and microservices in Prom.ua",  Maksym Kindritskyi
"Distributed graphs and microservices in Prom.ua", Maksym KindritskyiFwdays
 
"Rethinking the existing data loading and processing process as an ETL exampl...
"Rethinking the existing data loading and processing process as an ETL exampl..."Rethinking the existing data loading and processing process as an ETL exampl...
"Rethinking the existing data loading and processing process as an ETL exampl...Fwdays
 
"How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T..."How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T...Fwdays
 
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ..."The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...Fwdays
 
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu..."[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...Fwdays
 
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care..."[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...Fwdays
 
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"..."4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...Fwdays
 
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout", Anast...
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout",  Anast..."Reconnecting with Purpose: Rediscovering Job Interest after Burnout",  Anast...
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout", Anast...Fwdays
 
"Mentoring 101: How to effectively invest experience in the success of others...
"Mentoring 101: How to effectively invest experience in the success of others..."Mentoring 101: How to effectively invest experience in the success of others...
"Mentoring 101: How to effectively invest experience in the success of others...Fwdays
 
"Mission (im) possible: How to get an offer in 2024?", Oleksandra Myronova
"Mission (im) possible: How to get an offer in 2024?",  Oleksandra Myronova"Mission (im) possible: How to get an offer in 2024?",  Oleksandra Myronova
"Mission (im) possible: How to get an offer in 2024?", Oleksandra MyronovaFwdays
 
"Why have we learned how to package products, but not how to 'package ourselv...
"Why have we learned how to package products, but not how to 'package ourselv..."Why have we learned how to package products, but not how to 'package ourselv...
"Why have we learned how to package products, but not how to 'package ourselv...Fwdays
 
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin..."How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...Fwdays
 

Más de Fwdays (20)

"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y..."How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
 
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"What is a RAG system and how to build it",Dmytro Spodarets
"What is a RAG system and how to build it",Dmytro Spodarets"What is a RAG system and how to build it",Dmytro Spodarets
"What is a RAG system and how to build it",Dmytro Spodarets
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"Distributed graphs and microservices in Prom.ua", Maksym Kindritskyi
"Distributed graphs and microservices in Prom.ua",  Maksym Kindritskyi"Distributed graphs and microservices in Prom.ua",  Maksym Kindritskyi
"Distributed graphs and microservices in Prom.ua", Maksym Kindritskyi
 
"Rethinking the existing data loading and processing process as an ETL exampl...
"Rethinking the existing data loading and processing process as an ETL exampl..."Rethinking the existing data loading and processing process as an ETL exampl...
"Rethinking the existing data loading and processing process as an ETL exampl...
 
"How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T..."How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T...
 
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ..."The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
 
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu..."[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
 
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care..."[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
 
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"..."4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
 
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout", Anast...
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout",  Anast..."Reconnecting with Purpose: Rediscovering Job Interest after Burnout",  Anast...
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout", Anast...
 
"Mentoring 101: How to effectively invest experience in the success of others...
"Mentoring 101: How to effectively invest experience in the success of others..."Mentoring 101: How to effectively invest experience in the success of others...
"Mentoring 101: How to effectively invest experience in the success of others...
 
"Mission (im) possible: How to get an offer in 2024?", Oleksandra Myronova
"Mission (im) possible: How to get an offer in 2024?",  Oleksandra Myronova"Mission (im) possible: How to get an offer in 2024?",  Oleksandra Myronova
"Mission (im) possible: How to get an offer in 2024?", Oleksandra Myronova
 
"Why have we learned how to package products, but not how to 'package ourselv...
"Why have we learned how to package products, but not how to 'package ourselv..."Why have we learned how to package products, but not how to 'package ourselv...
"Why have we learned how to package products, but not how to 'package ourselv...
 
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin..."How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
 

Último

Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 

Último (20)

Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 

Transfer Learning in NLP: Concepts and Tools

  • 13. A State-of-the-Art Transfer Learning Model
    Two essential components: the model and the training.
    ● The model: pre-training architecture and adaptations for fine-tuning
    ○ Current large architectures are mostly based on Transformers (ULMFiT being a notable exception)
    ○ Unclear advantage of a smarter architecture (XLNet) versus more data (RoBERTa)
    ○ Trend toward larger models: XLM (664M), GPT-2 (1.5B), Megatron-LM (8.5B)
    ● The training: pre-training and adaptation phases
    ○ Learning long-term dependencies => long streams of continuous text (books, Wikipedia)
    ○ Trend toward using more data in both phases: RoBERTa (160GB), MT-DNN (WNLI)
    ○ Quality of the data is important
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 15
  • 14. Model: Using a typical Transfer Learning model
    Components: Tokenizer ⇨ Pretrained model ⇨ Adaptation head
    [Pipeline diagram] "Jim Henson was a puppeteer"
    ⇨ Tokenization: "Jim Henson was a puppet ##eer"
    ⇨ Convert to vocabulary indices: 11067 5567 245 120 7756 9908
    ⇨ Pretrained model (contextual hidden states)
    ⇨ Classifier model: True 0.7886 / False -0.223
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 16
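The same pipeline can be sketched in a few lines with the pytorch-transformers library presented later in the talk. This is only an illustration of the wiring, not the exact setup on the slide: the checkpoint name and the tiny untrained linear head below are placeholders.

```python
import torch
import torch.nn as nn
from pytorch_transformers import BertTokenizer, BertModel

# Tokenizer: text -> sub-word tokens -> vocabulary indices
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
tokens = tokenizer.tokenize("Jim Henson was a puppeteer")   # ['jim', 'henson', 'was', 'a', 'puppet', '##eer']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

# Pretrained model: indices -> contextual hidden states
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()
with torch.no_grad():
    sequence_output, pooled_output = model(input_ids)[:2]

# Adaptation head (here an untrained linear layer, just to show where it plugs in)
classifier = nn.Linear(model.config.hidden_size, 2)
logits = classifier(pooled_output)   # one score per class, e.g. True / False
```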
  • 15. Model: From shallow to deep
    ● Bengio et al. 2003: A Neural Probabilistic Language Model (1 layer)
    ● Devlin et al. 2019: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (24 layers)
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 17
  • 16. Model: the example of BERT
    BERT is pretrained for both sentence and contextual word representations, using masked language modeling and next sentence prediction.
    ● Pretrained model: BERT-large has 340M parameters, 24 layers
    ● Adaptation head: just a linear layer on top of the representations output by the pretrained model
    See also: Logeswaran and Lee, ICLR 2018; Devlin et al., 2019
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 18
  • 17. Model: Inside BERT, GPT-2, XLNet, RoBERTa
    Large-scale transformer architectures (GPT-2, BERT, XLM…) are very similar to each other and consist of:
    ● summing word and position embeddings
    ● applying a succession of transformer blocks with:
    ○ layer normalisation
    ○ a self-attention module
    ○ dropout and a residual connection
    ○ another layer normalisation
    ○ a feed-forward module with one hidden layer and a non-linearity: Linear ⇨ ReLU/gelu ⇨ Linear
    ○ dropout and a residual connection
    Main differences between BERT, GPT-2 and XLNet: the pretraining objective
    ● causal language modeling for GPT
    ● masked language modeling (+ next sentence prediction) for BERT
    ● permutation language modeling for XLNet
    (Child et al., 2019)
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 19
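As a rough illustration, one such block could be sketched in PyTorch as below. This is a simplified sketch, not the exact implementation of any of these models: layer-norm placement (pre- vs. post-norm), attention masking and initialization details differ between them.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Simplified sketch of one transformer block (illustrative only)."""
    def __init__(self, hidden_size=768, num_heads=12, ffn_size=3072, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, dropout=dropout)
        self.ln_1 = nn.LayerNorm(hidden_size)
        self.ffn = nn.Sequential(
            nn.Linear(hidden_size, ffn_size),   # feed-forward module: Linear
            nn.GELU(),                          # non-linearity (ReLU/gelu depending on the model)
            nn.Linear(ffn_size, hidden_size),   # Linear back to the hidden size
        )
        self.ln_2 = nn.LayerNorm(hidden_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, attn_mask=None):
        # x has shape (sequence_length, batch_size, hidden_size)
        attn_out, _ = self.attn(x, x, x, attn_mask=attn_mask)        # self-attention module
        x = self.ln_1(x + self.dropout(attn_out))                    # dropout + residual + layer norm
        x = self.ln_2(x + self.dropout(self.ffn(x)))                 # feed-forward, dropout + residual + layer norm
        return x

# toy usage: a batch of 2 sequences of length 5
y = TransformerBlock()(torch.randn(5, 2, 768))
```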
  • 18. Model: Adapting for the target task
    General workflow:
    1. Remove the pretraining task head if it is not useful for the target task (e.g. remove the softmax classifier used for language modeling)
    2. Add target task-specific layers on top/bottom of the pretrained model
    ○ Simple: add linear layer(s) on top of the pretrained model
    ○ More complex: use the model output as input to a separate model
    Sometimes more complex: adapting to a structurally different task, e.g. pretraining with a single input sequence and adapting to a task with several input sequences (translation, conditional generation...)
    ➯ Use the pretrained model to initialize as much of the target model as possible
    ➯ Ramachandran et al., EMNLP 2017; Lample & Conneau, 2019
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 20
  • 19. Training: Adaptation on a text classification task
    ● Replace the pretraining head with a classification head: a linear layer which takes as input the hidden state of a token
    ● Keep the pretrained model unchanged as the backbone
    Initialization of the model:
    ● Initialize the weights of the model (in particular the newly added parameters)
    ● Reload the common weights from the pretrained model
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 21
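A minimal sketch of this setup, assuming the pytorch-transformers library and the (illustrative) choice of classifying from BERT's pooled [CLS] representation; the class name, checkpoint and number of labels are placeholders, not the exact configuration used on the slide:

```python
import torch.nn as nn
from pytorch_transformers import BertModel

class BertTextClassifier(nn.Module):
    def __init__(self, num_labels=2):
        super().__init__()
        # Backbone: common weights reloaded from the pretrained checkpoint
        self.backbone = BertModel.from_pretrained('bert-base-uncased')
        # Classification head: a newly (randomly) initialized linear layer on a token's hidden state
        self.classifier = nn.Linear(self.backbone.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        sequence_output, pooled_output = self.backbone(input_ids, attention_mask=attention_mask)[:2]
        return self.classifier(pooled_output)
```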
  • 20. Training: Adaptation on a text classification task
    We are at the state of the art (ULMFiT). Remarks:
    ❏ The error rate goes down quickly: after one epoch we already have >90% accuracy.
    ⇨ Fine-tuning is highly data efficient in Transfer Learning
    ❏ We took our pre-training & fine-tuning hyper-parameters straight from the literature on related models.
    ⇨ Fine-tuning is often robust to the exact choice of hyper-parameters
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 22
  • 21. Training: Adaptation on a text classification task
    A few words on robustness & variance:
    ❏ Large pretrained models (e.g. BERT-large) are prone to degenerate performance when fine-tuned on tasks with small training sets.
    ❏ The observed behavior is often "on-off": it either works very well or doesn't work at all.
    ❏ Understanding the conditions and causes of this behavior (models, adaptation schemes) is an open research question. (Phang et al., 2018)
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 23
  • 23. Open-sourcing: practical considerations
    ● Pretraining large-scale models is costly: use open-source models and share your pretrained models ("Energy and Policy Considerations for Deep Learning in NLP" - Strubell, Ganesh, McCallum - ACL 2019)
    ● Sharing/accessing pretrained models
    ○ Hubs: TensorFlow Hub, PyTorch Hub
    ○ Author-released checkpoints: e.g. BERT, GPT...
    ○ Third-party libraries: AllenNLP, fast.ai, HuggingFace
    ● Design considerations
    ○ Hubs/libraries: simple to use, but it can be difficult to modify the model's internal architecture
    ○ Author-released checkpoints: more difficult to use, but you have full control over the model internals
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 25
  • 24. PyTorch Hub
    ● Based on GitHub repositories: a model is shared by adding a file (hubconf.py) to the GitHub repository.
    ● PyTorch Hub can fetch the model from the master branch on GitHub, so you don't need to package your model (pip) and users can always access the most recent version.
    ● Both model definitions and pre-trained weights can be shared.
    ● More details: https://pytorch.org/hub and https://pytorch.org/docs/stable/hub.html
    TensorFlow Hub
    ● TensorFlow Hub is a library for sharing machine learning models as self-contained pieces of TensorFlow graph with their weights and assets.
    ● Modules are automatically downloaded and cached when instantiated.
    ● Each time a module is called, it adds operations to the current TensorFlow graph.
    ● More details: https://tensorflow.org/hub
    Main limitations of Hubs
    ● No access to the source code of the model (black box)
    ● Not possible to modify the internals of the model (e.g. to add Adapters)
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 26
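For illustration, loading a model through PyTorch Hub looks roughly like the snippet below. The 'huggingface/pytorch-transformers' entry points ('tokenizer', 'model', ...) are the ones documented for this library around the time of the talk; treat the exact repository and entry-point names as assumptions and check the repository's hubconf.py.

```python
import torch

# List the entry points a repository exposes in its hubconf.py
print(torch.hub.list('huggingface/pytorch-transformers'))  # assumed entry points: 'config', 'tokenizer', 'model', ...

# Fetch code + pretrained weights directly from the GitHub repo (cached locally)
tokenizer = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-base-uncased')
model = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-uncased')
```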
  • 25. HuggingFace's library for Transformers 👾
    We've built an opinionated library of pretrained models (pytorch-transformers) for NLP researchers and practitioners seeking to use/study/modify large-scale pretrained transformer models such as BERT, GPT, GPT-2, XLNet, RoBERTa...
    The library was designed with two strong principles in mind:
    ● Be as easy to use and as fast to on-board as possible:
    ○ almost no abstractions to learn: models, tokenizers and configurations,
    ○ a common from_pretrained() method takes care of downloading/caching/loading classes from pretrained instances supplied with the library or from the user's saved instances,
    ○ to build upon the library, the user can use regular PyTorch modules.
    ● Provide state-of-the-art models identical to the original models:
    ○ examples reproducing the official results,
    ○ carefully drafted code as close as possible to the original computation graph.
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 27
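As a usage sketch of the from_pretrained() pattern (the checkpoint name, number of labels and save directory below are placeholders), the whole load / fine-tune / share cycle fits in a few lines:

```python
from pytorch_transformers import BertTokenizer, BertForSequenceClassification

# Download (or load from cache) the vocabulary, configuration and weights
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# ... fine-tune the model as a regular PyTorch nn.Module ...

# Share it back: save weights + config + vocabulary, reloadable later with from_pretrained()
model.save_pretrained('./my-finetuned-bert')
tokenizer.save_pretrained('./my-finetuned-bert')
```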
  • 26. Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 28
  • 27. Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 29
  • 29. Larger models
    [Chart: number of parameters of the model (in millions)]
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 31
  • 30. Larger models on larger datasets
    A minimum amount of data is required to unlock the potential of Transfer Learning.
    Question: Did no one think of this before? Why did it only start in 2018 (ELMo)?
    J. Devlin's answer: Getting good results from pre-training is >1,000x to 100,000x more expensive than supervised training.
    ○ E.g., a 10x-100x bigger model trained for 100x-1,000x as many steps.
    ○ Imagine in 2013: a well-tuned 2-layer LSTM gets 80% accuracy on sentiment analysis after training for 8 hours.
    ○ Pre-train a large-scale language model on the same architecture for a week, get +0.5%.
    ○ Reviewers: "Who would do something so expensive for such a small gain?"
    (Devlin et al.)
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 32
  • 31. Larger models on larger datasets
    Diminishing returns of using more data / bigger models:
    ➭ For a linear gain in performance, an exponentially larger model is required.
    (Radford and Wu et al.; Devlin et al.; Hancock @ Fwdays'19)
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 33
  • 32. A trend for smaller models
    [Chart: number of parameters of the model (in millions)]
    And a lot of very fresh work to be published around the end of the year: Tsai et al., Turc et al., Tang et al., ...
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 34
  • 33. Smaller models: Distilling large models
    Training costs make headlines, but as large-scale models reach production, inference time will likely account for most of a model's total environmental cost.
    Distilling larger models into smaller ones:
    ● reduces inference cost
    ● capitalizes on the inductive biases learned by the large model.
    95% of the performance of a model like BERT can be preserved in a distilled model that is 40% smaller and 60% faster (our team's work on DistilBERT, open-sourced in our pytorch-transformers library).
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 35
  • 34. Smaller models: Distillation from large models
    Distillation: two main tricks to train a student model from a teacher model:
    1. Start from a high-quality weight initialization derived from the teacher
    2. Train the student to mimic the full output distribution of the teacher
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 36
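The second trick usually boils down to a soft-target loss in the style of Hinton et al. (2015). A minimal sketch follows; the temperature value is an illustrative choice, and the actual DistilBERT training objective also combines this term with other losses.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both output distributions with a temperature, then penalize the KL
    # divergence so the student matches the teacher's full output distribution.
    t = temperature
    soft_targets = F.softmax(teacher_logits / t, dim=-1)
    log_probs = F.log_softmax(student_logits / t, dim=-1)
    # the T^2 factor keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_probs, soft_targets, reduction='batchmean') * (t * t)
```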
  • 36. Shortcomings of pretrained language models
    Large pretrained language models can be difficult to optimize:
    ● Fine-tuning is often unstable and has a high variance, particularly if the target datasets are very small; BERT-large is prone to degenerate performance, and multiple random restarts can be necessary (Phang et al., 2018).
    Do we really need all these parameters?
    ● Recent work shows that only a few of the attention heads in BERT are required (Voita et al., ACL 2019; Michel et al.).
    ● More work is needed to understand the model parameters.
    ● Pruning and distillation are two ways to deal with this.
    ● See also: the lottery ticket hypothesis (Frankle et al., ICLR 2019).
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 38
  • 37. Shortcomings of language modeling in general
    The most successful current pretraining methods are based on variants of language modeling, but these have many shortcomings:
    ● Not appropriate for all models
    ○ If we condition on more inputs (video/sound), we need to pretrain those parts too
    ● Weak signal for semantics and long-term context vs. a strong signal for syntax and short-term word co-occurrences
    ● Pretrained language models are bad at:
    ○ fine-grained linguistic tasks (Liu et al., NAACL 2019)
    ○ common sense (when you actually make it difficult; Zellers et al., ACL 2019) and coherent natural language generation
    ○ they also tend to overfit to surface-form information when fine-tuned: 'rapid surface learners'
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 39
  • 38. Shortcomings of language modeling in general
    The need for grounded representations:
    ● Limits of the distributional hypothesis: it is difficult to learn certain types of information from raw text
    ○ Human reporting bias: people don't state the obvious (Gordon and Van Durme, 2013)
    ○ Common sense isn't written down
    ○ Facts about named entities
    ○ No grounding in other modalities
    ● Possible solutions:
    ○ Incorporate structured knowledge (e.g. databases; ERNIE: Zhang et al., 2019)
    ○ Multimodal learning (e.g. visual representations; VideoBERT: Sun et al., 2019)
    ○ Interactive/human-in-the-loop approaches (e.g. dialog; Hancock et al., 2018)
    Transfer Learning in NLP: Concepts and Tools - Thomas Wolf - Slide 40