BERT – Bird’s Eye View
PART1
Learning Notes of Senthil Kumar
Agenda
Bird’s eye view (PART1)
About Language Models
What is BERT? – A SOTA LM
What does BERT do? – Multi-task Learning
How is BERT built? – BERT Architecture
BERT Fine-tuning and Results
What is powering NLP’s Growth?
Two important trends that power recent NLP growth
• Better representation of the text data (with no supervision)
• By grasping the 'context' better
(e.g., terms used: language models, word embeddings, contextualized word embeddings)
• By enabling 'transfer learning'
(e.g., term used: pre-trained language models)
• Deep Learning models that use the better representations to solve real-world problems like Text
Classification, Semantic Role Labelling, Translation, Image Captioning, etc.
Important Terms
Language Model:
A language model (LM) is a model that assigns a probability distribution over sequences of words
Language Model - A model that predicts the next word in a sequence of words
***************************
Encoder:
Converts variable-length text (any sequence information) into a fixed-length vector
***************************
Transformer:
- A sequence model that forgoes the recurrent structure of RNNs in favour of an attention-based approach
- Transformer vs Seq2Seq Model
- { Encoder + (attention-mechanism) + Decoder } vs { Encoder + Decoder }
***************************
Recurrent Structure: Processes all input elements sequentially
Attention-based Approach: Processes all input elements SIMULTANEOUSLY (see the attention sketch after these terms)
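To make the contrast concrete, here is a minimal scaled dot-product self-attention sketch (illustrative numpy code, not from the original slides; the shapes and weight names are assumptions): one matrix product scores every input element against every other, so the whole sequence is processed simultaneously rather than step by step.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model) -- the whole sequence enters at once,
    # unlike an RNN, which consumes one element per time step
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (seq_len, seq_len) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # each output attends to ALL inputs

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                            # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)             # (5, 8)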
Intro – Count-based N-gram LM (1/5)
Sources:
NLP course on Coursera – National Higher School of Economics
Advanced NLP and Deep Learning course on Udemy (by LazyProgrammer)
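A count-based bigram LM fits in a few lines of Python — a toy sketch (the corpus is made up for illustration; probabilities come straight from co-occurrence counts):

from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat ate".split()

bigram_counts = defaultdict(Counter)          # count(w1 -> w2)
for w1, w2 in zip(corpus, corpus[1:]):
    bigram_counts[w1][w2] += 1

def p_next(w1, w2):
    # P(w2 | w1) = count(w1, w2) / count(w1, *)
    total = sum(bigram_counts[w1].values())
    return bigram_counts[w1][w2] / total if total else 0.0

print(p_next("the", "cat"))                   # 2/3: 'the' -> 'cat' twice, 'mat' once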
Intro – Context-Prediction based LMs (2/5)
Logistic Bigram Model:
P(y|x) = softmax(W^T x)
P(y = j|x) = exp(w_j^T x + b_j) / Σ_{k ∈ K} exp(w_k^T x + b_k)
Feed-Forward Neural Network LM (proposed by Bengio et al., 2003):
P(y|x) = softmax(tanh(W^T x))
P(y = j|x) = exp(tanh(w_j^T x + b_j)) / Σ_{k ∈ K} exp(tanh(w_k^T x + b_k))
Word2Vec (proposed by Mikolov et al., 2013):
P(y|x) = neg_sampling_softmax(W^T x)
Sources:
Advanced NLP and Deep Learning course on Udemy (by LazyProgrammer); Idea: http://www.marekrei.com/blog/dont-count-predict/
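The first two models above differ only by the tanh non-linearity — a minimal numpy sketch (illustrative; the vocabulary and matrix sizes are assumptions):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

V, d = 10, 6                          # vocabulary size, embedding size
rng = np.random.default_rng(1)
E = rng.normal(size=(V, d))           # word embeddings
W = rng.normal(size=(d, V))           # output weights
b = np.zeros(V)

x = E[3]                              # embedding of the previous word
p_logistic = softmax(x @ W + b)            # Logistic Bigram: softmax(W^T x + b)
p_nnlm     = softmax(np.tanh(x @ W + b))   # FFNN LM: softmax(tanh(W^T x + b))
print(p_logistic.sum(), p_nnlm.sum())      # each is a distribution summing to 1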
Higher Form of LMs (3/5)
Typical LM
Source:
https://indico.io/blog/sequence-modeling-neuralnets-part1/
Seq2Seq – A higher form of LM
Sequence-to-sequence models build on top of language models by adding an encoder step
and a decoder step. Because the decoder sees an encoded representation of the input
sequence as well as the translation sequence generated so far, it can make more intelligent
predictions about future words based on the current word.
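A minimal encoder-decoder sketch in PyTorch (illustrative; the GRU sizes, vocabulary, and names are assumptions, not the model from any specific paper): the decoder is conditioned on the encoder's summary of the input plus the target tokens so far.

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab, d=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.decoder = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, vocab)

    def forward(self, src, tgt):
        _, h = self.encoder(self.emb(src))           # h: encoded summary of the input
        dec, _ = self.decoder(self.emb(tgt), h)      # decoder sees h + translation so far
        return self.out(dec)                         # next-word scores at each step

model = Seq2Seq(vocab=100)
src = torch.randint(0, 100, (2, 7))    # batch of 2 source sequences, length 7
tgt = torch.randint(0, 100, (2, 5))    # target sequences generated so far, length 5
print(model(src, tgt).shape)           # torch.Size([2, 5, 100])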
Higher Form of LMs (4/5)
Typical Seq2Seq Model: { Encoder + Decoder }
Recurrent Structure: processes all input elements sequentially
Transformer Model: { Encoder + (attention-mechanism) + Decoder }
Attention-based Approach: processes all input elements SIMULTANEOUSLY
Sources:
https://towardsdatascience.com/nlp-sequence-to-sequence-networks-part-2-seq2seq-model-encoderdecoder-model-6c22e29fd7e1
Pre-trained Language Models (5/5)
Sources: BERT Paper - https://arxiv.org/pdf/1810.04805.pdf
What is BERT?
• Bidirectional Encoder Representations from Transformers
• A SOTA Language Model
• ULMFiT, ELMo, BERT, XLNet -- NLP's ImageNet Moment: the era of contextualized word embeddings and Transfer Learning
Sources:
https://slideplayer.com/slide/17025344/
BERT is Bidirectional
• BERT is bidirectional – it reads text from both directions, from the left as well as from the right, to gain a better understanding of the text
• For example, the above sentence is read from "BERT is bidirectional ..." as well as from "... understanding of the text"
• BERT is bidirectional because it uses the Transformer Encoder (which is made bidirectional by the self-attention mechanism)
• OpenAI's Generative Pre-trained Transformer is unidirectional because it employs the Transformer Decoder (the masked self-attention makes it left-to-right only)
Sources:
https://github.com/google-research/bert/issues/319
https://github.com/google-research/bert/issues/83
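The encoder/decoder difference shows up directly in the attention mask — a small numpy illustration (not from the slides):

import numpy as np

seq_len = 4
encoder_mask = np.ones((seq_len, seq_len))           # BERT: every token sees every token
decoder_mask = np.tril(np.ones((seq_len, seq_len)))  # GPT: tokens see only themselves and the past

print(decoder_mask)
# [[1. 0. 0. 0.]    row i = positions token i may attend to;
#  [1. 1. 0. 0.]    the zeros block attention to future tokens,
#  [1. 1. 1. 0.]    which is what makes the decoder left-to-right only
#  [1. 1. 1. 1.]]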
BERT uses Encoder portion of Transformer
Sources: Attention Is All You Need: https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
BERT Architecture
Source: BERT GitHub Repository: https://github.com/google-research/bert
Pretrained Tasks
BERT's core methodology: applying the bidirectional training of the Transformer to Language Modeling and Next Sentence Prediction
BERT is originally pretrained on 2 tasks:
• Masked Language Model: masking allows bidirectional training for word prediction (the model predicts the masked-out words)
• Next Sentence Prediction: BERT predicts whether the second sentence of a pair actually follows the first
Multi-task Learning
Sources:
Idea of Multi-task learning – Karate Kid reference: https://towardsdatascience.com/whats-new-in-deep-learning-research-how-deep-builds-multi-task-reinforcement-learning-algorithms-f0d868684c45
Pretrained Tasks - Fine-tuning
Sources:
BERT Paper - https://arxiv.org/pdf/1810.04805.pdf
http://jalammar.github.io/illustrated-bert/
Task1 - Masked Language Model
• BERT (the Masked LM technique) converges more slowly than left-to-right prediction, but after a small number of training steps its accuracy beats the unidirectional model
• Multi-genre Natural Language Inference (MNLI) is a text classification task
• MNLI is a large-scale, crowdsourced entailment classification task (Williams et al., 2018). Given a pair of sentences, the goal is to predict whether the second sentence is an entailment, contradiction, or neutral with respect to the first one.
Source: https://medium.com/dissecting-bert/dissecting-bert-part-1-d3c3d495cdb3
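A sketch of the masking procedure (illustrative Python; the 15% selection and the 80/10/10 replacement split follow the BERT paper, everything else is made up for the example):

import random

def mask_tokens(tokens, vocab, mask_prob=0.15):
    # Returns (masked tokens, labels); label is None where no prediction is made
    masked, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:              # 15% of tokens become targets
            labels.append(tok)
            r = random.random()
            if r < 0.8:
                masked.append("[MASK]")              # 80%: replace with [MASK]
            elif r < 0.9:
                masked.append(random.choice(vocab))  # 10%: replace with a random token
            else:
                masked.append(tok)                   # 10%: keep the original token
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

vocab = ["bert", "reads", "text", "in", "both", "directions"]
print(mask_tokens("bert reads text in both directions".split(), vocab))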
Task2 - Next Sentence Prediction
BERT is trained to predict whether one sentence follows another.
Two types of input sentence pairs are fed to the model:
i. In 50% of the pairs, the sentences are in the right order (the second sentence actually follows the first)
ii. In the other 50%, the second sentence in the pair is randomly chosen
Source: https://medium.com/dissecting-bert/dissecting-bert-part-1-d3c3d495cdb3
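A sketch of how the 50/50 sentence pairs could be assembled (toy Python; in BERT itself the random sentence is drawn from a different document):

import random

def make_nsp_pairs(sentences):
    # Yields (sentence_a, sentence_b, is_next) training examples
    pairs = []
    for i in range(len(sentences) - 1):
        if random.random() < 0.5:                    # 50%: the true next sentence
            pairs.append((sentences[i], sentences[i + 1], True))
        else:                                        # 50%: a randomly chosen sentence
            pairs.append((sentences[i], random.choice(sentences), False))
    return pairs

corpus = ["BERT is bidirectional.", "It uses the Transformer encoder.",
          "GPT uses the decoder.", "NSP is a binary task."]
for a, b, is_next in make_nsp_pairs(corpus):
    print(is_next, "|", a, "->", b)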
BERT Fine-tuning - Classification
- Stanford Sentiment Treebank 2 (SST-2)
- Corpus of Linguistic Acceptability (CoLA)
- Multi-genre Natural Language Inference (MNLI)
- The General Language Understanding Evaluation (GLUE) benchmark is a collection of diverse natural language understanding tasks.
Sources: BERT Paper - https://arxiv.org/pdf/1810.04805.pdf
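A minimal fine-tuning step for SST-2-style sentiment classification, sketched with the Hugging Face transformers library (not part of the original slides; the example sentences and hyperparameters are illustrative):

import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer(["a gripping, funny film", "flat and boring"],
                   padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])                  # 1 = positive, 0 = negative

outputs = model(**inputs, labels=labels)       # classification head on the [CLS] token
outputs.loss.backward()                        # one gradient step of fine-tuning
torch.optim.AdamW(model.parameters(), lr=2e-5).step()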
BERT Fine-tuning – Semantic Role Labeling
- CoNLL-2003 NER Task (Conference on Natural Language Learning 2003)
Sources: BERT Paper - https://arxiv.org/pdf/1810.04805.pdf
BERT Fine-tuning – Q&A
The Stanford Question Answering Dataset (SQuAD)
Sources: BERT Paper - https://arxiv.org/pdf/1810.04805.pdf
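A minimal extractive Q&A sketch with a SQuAD-fine-tuned checkpoint via Hugging Face transformers (illustrative; the checkpoint name and example text are assumptions):

import torch
from transformers import BertTokenizer, BertForQuestionAnswering

name = "bert-large-uncased-whole-word-masking-finetuned-squad"  # assumed checkpoint
tokenizer = BertTokenizer.from_pretrained(name)
model = BertForQuestionAnswering.from_pretrained(name)

question = "Which part of the Transformer does BERT use?"
context = "BERT uses the encoder portion of the Transformer architecture."
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs)
start = out.start_logits.argmax()               # most likely answer-span start
end = out.end_logits.argmax() + 1               # most likely answer-span end
print(tokenizer.decode(inputs["input_ids"][0][start:end]))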
Thank You