XLNet is a generalized autoregressive pretraining model for natural language understanding. It leverages all possible permutations of the factorization order to capture bidirectional contexts, unlike previous autoregressive models, which only learn information in one direction. This allows XLNet to better model relationships between non-consecutive words. XLNet also does not rely on data corruption; instead, it directly models the probability distribution over the input text. In experiments, XLNet achieves state-of-the-art results on 18 natural language processing tasks.
3. It’s all about Pretraining + Fine Tuning
(Figure: one pretrained model (a language model) is fine-tuned separately for downstream tasks such as Machine Reading Comprehension.)
One pretrained model achieves state-of-the-art results on a wide range of NLP tasks (18).
We assume that NLU (Natural Language Understanding) can be estimated by those tasks.
4. How can we generate a pretrained model that understands natural language?
5. Language Model
• A statistical language model is a probability distribution over sequences of words. Given such a sequence, say of length m, it assigns a probability to the whole sequence.
Example: judge how plausible the sentence "I love Natural Language Processing" is by estimating P(I, love, Natural, Language, Processing).
7. It is called an Autoregressive (AR) Language Model
An example of the autoregressive view: the token at step t depends on its '<t' context, and the model predicts recursively, factorizing $p(x) = \prod_t p(x_t \mid x_{<t})$ (a minimal sketch follows below).
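A minimal Python sketch of this chain-rule factorization; cond_logprob below is a hypothetical stand-in for any real left-to-right model (an n-gram model, an RNN, a Transformer decoder, ...), not part of the original slides.

import math

VOCAB = ["I", "love", "Natural", "Language", "Processing"]

def cond_logprob(token, context):
    # Toy uniform distribution; a real model would condition on `context`.
    return math.log(1.0 / len(VOCAB))

def sequence_logprob(tokens):
    # log p(x) = sum_t log p(x_t | x_<t)
    total = 0.0
    for t in range(len(tokens)):
        # the t-th token depends only on its '<t' context
        total += cond_logprob(tokens[t], tokens[:t])
    return total

print(sequence_logprob("I love Natural Language Processing".split()))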
8. Limitations
• AR-based language models only learn uni-directional information.
Missing bidirectional contexts.
9. Previous Work
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
10. Reconstruct the missing part (Denoising Autoencoder based approach)
(Figure: reconstructing a missing region in an image vs. a masked token in text.)
"I love Natural [MASK] Processing"
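A minimal Python sketch of this corruption step, assuming a plain whitespace tokenizer and a 15% masking rate (both assumptions, not the exact BERT recipe).

import random

def corrupt(tokens, mask_rate=0.15, mask_token="[MASK]"):
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:
            corrupted.append(mask_token)   # hide the token in the input
            targets[i] = tok               # the model must reconstruct it at position i
        else:
            corrupted.append(tok)
    return corrupted, targets

print(corrupt("I love Natural Language Processing".split()))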
11. Limitations
• 1. BERT uses [MASK] tokens only in the pretraining step, not in the fine-tuning step, which creates a pretrain-finetune discrepancy.
• 2. BERT assumes the predicted tokens are independent of each other given the unmasked tokens (see the toy example after this list).
Dependencies inside important units such as the noun phrase ('New', 'York') are therefore missed.
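A toy illustration of the independence assumption, with assumed probabilities (not measured values). Suppose both "New" and "York" are masked in "I love [MASK] [MASK]".

p_new = 0.20               # p(New  | I love _ _)
p_york_indep = 0.05        # p(York | I love _ _)     BERT: cannot see "New"
p_york_given_new = 0.90    # p(York | I love New _)   AR: conditions on "New"

bert_joint = p_new * p_york_indep       # independent predictions
ar_joint = p_new * p_york_given_new     # chained predictions
print(round(bert_joint, 2), round(ar_joint, 2))   # 0.01 vs 0.18: BERT misses the dependency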
12. XLNet – Permutation Language Model
• 1. XLNet leverages all possible permutations of the factorization order (see the sketch after this list).
Enables capturing contexts from both directions.
• 2. As a generalized AR language model, XLNet does not rely on data
corruption.
Considers the entire probability distribution of a text sequence
• 3. XLNet achieves SOTA on 18 NLP tasks.
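A minimal Python sketch of the permutation language-modeling idea: sample one factorization order z, then score each token given the tokens that come earlier in that order (not earlier in the sequence). cond_logprob is a hypothetical, position-aware model stand-in introduced here for illustration.

import math
import random

def cond_logprob(target_pos, target_token, visible):
    # `visible` maps already-factorized positions to their tokens.
    # A real model would attend to them through masking; here it is uniform.
    return math.log(1.0 / 5)

def permutation_logprob(tokens):
    order = list(range(len(tokens)))
    random.shuffle(order)                 # one sampled factorization order z
    visible, total = {}, 0.0
    for pos in order:
        total += cond_logprob(pos, tokens[pos], dict(visible))
        visible[pos] = tokens[pos]        # becomes context for later steps
    return total

print(permutation_logprob("I love Natural Language Processing".split()))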
13. Model Overview
Previous context is carried over from *Transformer-XL.
Note: the sequence order does not change; with a different factorization order, different parts of the sequence are used as input.
*Dai, Zihang, et al. "Transformer-XL: Attentive language models beyond a fixed-length context." arXiv preprint arXiv:1901.02860 (2019).
(Figure: predicting the word "Natural" in "I love Natural Language" under different factorization orders.)
Key idea: when predicting the word at step t, the model only sees the words at steps before t in the factorization order. Let's mix the order index! Even when the same word "Natural" is predicted, the visible context differs with each permutation (this is implemented with an attention mask, not by reordering the input). In effect, the model looks backward and forward over many combinations of the context.
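A minimal sketch, not the actual XLNet code, of how one factorization order becomes an attention mask while the input order stays fixed.

import numpy as np

def factorization_mask(order):
    rank = {pos: step for step, pos in enumerate(order)}   # step at which each position is predicted
    T = len(order)
    mask = np.zeros((T, T), dtype=bool)                     # mask[i, j]: may position i attend to j?
    for i in range(T):
        for j in range(T):
            mask[i, j] = rank[j] < rank[i]                  # only earlier-in-order positions
    return mask

# Permutation "4 1 2 5 3" from the next slide, written 0-indexed.
print(factorization_mask([3, 0, 1, 4, 2]).astype(int))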
14. We have a problem in the target function
• Input words (x): I love Natural Language Processing
• Permutation 1 ($z^{(1)}$): 4 1 2 5 3, e.g. P(Processing | I, love, Language)
• Permutation 2 ($z^{(2)}$): 4 1 2 3 5, e.g. P(Natural | I, love, Language)
Target function: $\max_\theta \; \mathbb{E}_{z \sim \mathcal{Z}_T}\Big[\sum_{t=1}^{T} \log p_\theta(x_{z_t} \mid x_{z_{<t}})\Big]$, where $z_t$ is the position predicted at step $t$ and $z_{<t}$ are the positions factorized before it.
15. We have a problem in the target function
• Input words (x): I love Natural Language Processing
• Permutation 1 ($z^{(1)}$): 4 1 2 5 3, e.g. P(Processing | I, love, Language)
• Permutation 2 ($z^{(2)}$): 4 1 2 3 5, e.g. P(Natural | I, love, Language)
We didn't consider the target position $z_t$! The parameterization $p_\theta(x_{z_t} \mid x_{z_{<t}})$ depends only on the visible context, so predicting "Natural" and "Processing" from the same context {I, love, Language} gives the same probability. Shouldn't we tell the targets apart by where they are located?
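A toy stand-in showing the problem concretely: without the target position, the two predictions are indistinguishable (the distribution below uses assumed numbers).

def naive_predict(visible_tokens):
    # distribution over the vocabulary, computed from the context alone
    return {"Natural": 0.5, "Processing": 0.5}

context = ["Language", "I", "love"]   # positions 4, 1, 2 in factorization order
print(naive_predict(context))         # used when the target position is 5 ("Processing")
print(naive_predict(context))         # ... and when the target position is 3 ("Natural")
# Both calls return the exact same distribution, so the prediction must also be
# conditioned on the target position z_t (XLNet's position-aware representation g).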
18. Why two-stream? We have to get the content-stream representations h for the positions $z_{<t}$, which are then used as context when computing the query stream g.
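A minimal sketch, assuming the mask-based view above (not the actual implementation), of the two attention masks behind the content stream h and the query stream g for one factorization order.

import numpy as np

def two_stream_masks(order):
    rank = {pos: step for step, pos in enumerate(order)}
    T = len(order)
    content = np.zeros((T, T), dtype=bool)
    query = np.zeros((T, T), dtype=bool)
    for i in range(T):
        for j in range(T):
            query[i, j] = rank[j] < rank[i]     # g: strictly earlier in the order, never its own content
            content[i, j] = rank[j] <= rank[i]  # h: earlier in the order, plus itself (encodes own content)
    return content, query

content, query = two_stream_masks([3, 0, 1, 4, 2])
print(content.astype(int))
print(query.astype(int))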
21. Conclusion
• XLNet is a generalized AR pretraining model that combines the advantages of both the conventional autoregressive language model (AR) and the autoencoder model (AE).
• By leveraging Transformer-XL and designing the two-stream mechanism, XLNet is trained to estimate the probability distribution autoregressively.
• XLNet achieves state-of-the-art results on a wide range of tasks (18 NLP tasks).