SlideShare una empresa de Scribd logo
1 de 25
Pre-trained Language
model - from ELMO to GPT-2
Jiwon Kim
jiwon.kim@dpschool.io
Why ‘pre-trained language model’?
● Achieved great performance on a variety of language
tasks.
● similar to how ImageNet classification pre-training helps
many vision tasks (*)
● Even better than CV tasks, it does not require labeled data
for pre-training.
(*) Although recently He et al. (2018) found that pre-training might not be necessary for image segmentation task.
The problem of
previous
embeddings
feat., Gensim, NLTK, Scikit-learn
Then, what’s difference between
“I’m eating an apple”
and
“I have an Apple pencil”?
The gradient - Sebastian Ruder
Embeddings Dependent on Context
ELMO
ULMFiT
BERT
GPT-2
ELMo: Deep contextualized word
representations
AllenNLP
ELMO - core structure
ELMO - code
feat. tensorflow hub
ULMFiT: Universal Language Model
Fine-tuning for Text Classification
fast.ai
ULMFiT - core structure
Transfer-learning!
ULMFiT
ULMFiT
feat. fastai
BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding
Google
BERT - core structure
BERT - code
BERT - code
feat. huggingface
“You know, we could just
throw more GPUs and data at
it.”
GPT-2: "Language Models are
Unsupervised Multitask Learners"
OpenAI
GPT-2 - core structure
GPT-2 - code
A text generation sample from OpenAI’s GPT-2 language model
Comparing the Model
What tasks can we do with these models?
1 - tricky to implement and not good accuracy
SQuAD NER SRL Sentiment Coref Text Generation
ELMO 4 3 3 3 3 2
ULMFiT 1 2 2 5 2 4
BERT 3 4 4 4 4 3
GPT-2 5 1 1 2 1 5
5 - possible and easy

Más contenido relacionado

La actualidad más candente

Data Con LA 2022 - Transformers for NLP
Data Con LA 2022 - Transformers for NLPData Con LA 2022 - Transformers for NLP
Data Con LA 2022 - Transformers for NLP
Data Con LA
 

La actualidad más candente (20)

Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMs
 
Bert
BertBert
Bert
 
[Paper review] BERT
[Paper review] BERT[Paper review] BERT
[Paper review] BERT
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
Bert.pptx
Bert.pptxBert.pptx
Bert.pptx
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers
 
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
OpenAI’s GPT 3 Language Model - guest Steve OmohundroOpenAI’s GPT 3 Language Model - guest Steve Omohundro
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
 
Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)
Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)
Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP models
 
GPT : Generative Pre-Training Model
GPT : Generative Pre-Training ModelGPT : Generative Pre-Training Model
GPT : Generative Pre-Training Model
 
Customizing LLMs
Customizing LLMsCustomizing LLMs
Customizing LLMs
 
(Paper Seminar detailed version) BART: Denoising Sequence-to-Sequence Pre-tra...
(Paper Seminar detailed version) BART: Denoising Sequence-to-Sequence Pre-tra...(Paper Seminar detailed version) BART: Denoising Sequence-to-Sequence Pre-tra...
(Paper Seminar detailed version) BART: Denoising Sequence-to-Sequence Pre-tra...
 
BERT
BERTBERT
BERT
 
Data Con LA 2022 - Transformers for NLP
Data Con LA 2022 - Transformers for NLPData Con LA 2022 - Transformers for NLP
Data Con LA 2022 - Transformers for NLP
 
Deep learning for NLP and Transformer
 Deep learning for NLP  and Transformer Deep learning for NLP  and Transformer
Deep learning for NLP and Transformer
 
1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)
 
[Paper] attention mechanism(luong)
[Paper] attention mechanism(luong)[Paper] attention mechanism(luong)
[Paper] attention mechanism(luong)
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
 
GPT-2: Language Models are Unsupervised Multitask Learners
GPT-2: Language Models are Unsupervised Multitask LearnersGPT-2: Language Models are Unsupervised Multitask Learners
GPT-2: Language Models are Unsupervised Multitask Learners
 
BERT Finetuning Webinar Presentation
BERT Finetuning Webinar PresentationBERT Finetuning Webinar Presentation
BERT Finetuning Webinar Presentation
 

Similar a Pre trained language model

BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
Kyuri Kim
 
Applying Aspect-Extended UML Modelling to 'e'
Applying Aspect-Extended UML Modelling to 'e'Applying Aspect-Extended UML Modelling to 'e'
Applying Aspect-Extended UML Modelling to 'e'
DVClub
 
Darren galpin q4_2008_bristol
Darren galpin q4_2008_bristolDarren galpin q4_2008_bristol
Darren galpin q4_2008_bristol
Obsidian Software
 
Prasad Rompalli latest Resume
Prasad Rompalli latest ResumePrasad Rompalli latest Resume
Prasad Rompalli latest Resume
Rsv Prasad
 
BERT Explained_ State of the art language model for NLP.pdf
BERT Explained_ State of the art language model for NLP.pdfBERT Explained_ State of the art language model for NLP.pdf
BERT Explained_ State of the art language model for NLP.pdf
sudeshnakundu10
 

Similar a Pre trained language model (20)

BERT
BERTBERT
BERT
 
The NLP Muppets revolution!
The NLP Muppets revolution!The NLP Muppets revolution!
The NLP Muppets revolution!
 
How to build a GPT model.pdf
How to build a GPT model.pdfHow to build a GPT model.pdf
How to build a GPT model.pdf
 
Introduction to Large Language Models and the Transformer Architecture.pdf
Introduction to Large Language Models and the Transformer Architecture.pdfIntroduction to Large Language Models and the Transformer Architecture.pdf
Introduction to Large Language Models and the Transformer Architecture.pdf
 
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
 
Analysis of the evolution of advanced transformer-based language models: Expe...
Analysis of the evolution of advanced transformer-based language models: Expe...Analysis of the evolution of advanced transformer-based language models: Expe...
Analysis of the evolution of advanced transformer-based language models: Expe...
 
Ie essay j yongchan_lee
Ie essay j yongchan_leeIe essay j yongchan_lee
Ie essay j yongchan_lee
 
Zeroshot multimodal named entity disambiguation for noisy social media posts
Zeroshot multimodal named entity disambiguation for noisy social media postsZeroshot multimodal named entity disambiguation for noisy social media posts
Zeroshot multimodal named entity disambiguation for noisy social media posts
 
Weak Supervision.pdf
Weak Supervision.pdfWeak Supervision.pdf
Weak Supervision.pdf
 
Applying Aspect-Extended UML Modelling to 'e'
Applying Aspect-Extended UML Modelling to 'e'Applying Aspect-Extended UML Modelling to 'e'
Applying Aspect-Extended UML Modelling to 'e'
 
Darren galpin q4_2008_bristol
Darren galpin q4_2008_bristolDarren galpin q4_2008_bristol
Darren galpin q4_2008_bristol
 
Sanjeev Satheesj, Research Scientist, Baidu at The AI Conference 2017
Sanjeev Satheesj, Research Scientist, Baidu at The AI Conference 2017Sanjeev Satheesj, Research Scientist, Baidu at The AI Conference 2017
Sanjeev Satheesj, Research Scientist, Baidu at The AI Conference 2017
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLP
 
Allganize AI seminar - GPT3 and PET
Allganize AI seminar - GPT3 and PETAllganize AI seminar - GPT3 and PET
Allganize AI seminar - GPT3 and PET
 
4 - Overview of Generative AI Session#4.pptx
4 - Overview of Generative AI Session#4.pptx4 - Overview of Generative AI Session#4.pptx
4 - Overview of Generative AI Session#4.pptx
 
BERT - Part 1 Learning Notes of Senthil Kumar
BERT - Part 1 Learning Notes of Senthil KumarBERT - Part 1 Learning Notes of Senthil Kumar
BERT - Part 1 Learning Notes of Senthil Kumar
 
Devoxx traitement automatique du langage sur du texte en 2019
Devoxx   traitement automatique du langage sur du texte en 2019 Devoxx   traitement automatique du langage sur du texte en 2019
Devoxx traitement automatique du langage sur du texte en 2019
 
Prasad Rompalli latest Resume
Prasad Rompalli latest ResumePrasad Rompalli latest Resume
Prasad Rompalli latest Resume
 
Neural Mask Generator : Learning to Generate Adaptive Word Maskings for Langu...
Neural Mask Generator : Learning to Generate Adaptive WordMaskings for Langu...Neural Mask Generator : Learning to Generate Adaptive WordMaskings for Langu...
Neural Mask Generator : Learning to Generate Adaptive Word Maskings for Langu...
 
BERT Explained_ State of the art language model for NLP.pdf
BERT Explained_ State of the art language model for NLP.pdfBERT Explained_ State of the art language model for NLP.pdf
BERT Explained_ State of the art language model for NLP.pdf
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Pre trained language model

Notas del editor

  1. allowing us to experiment with increased training scale, up to our very limit.
  2. Idea is simple, learn each word’s vector, which usually called word2vec
  3. two “apple” words refer to very different things but they would still share the same word embedding vector.
  4. How embedding of ELMO comes up?
  5. Starting of era of transfer-learning