BlenderBot
Recipes for building an open-domain chatbot
The culmination of two years of Facebook research on open-domain chatbots
NLP Team: 백지윤 주정헌 진명훈 황소현
(Presenter)
What is Blenderbot?
https://ai.facebook.com/blog/state-of-the-art-open-source-chatbot/
https://arxiv.org/abs/2004.13637
Large Model History (Google, OpenAI)
https://qiita.com/Afo_guard_enthusiast/items/b37830430e465f397983
• 2017/06/02 Transformers (Google)
• 2018/06/11 GPT-1 (OpenAI)
• 2018/10/11 BERT (Google)
• 2019/02/14 GPT-2 (OpenAI)
• 2020/02/27 Meena (Google)
• 2020/05/28 GPT-3 (OpenAI)
Large Model History (Google, OpenAI, and FAIR)
• 2017/06/02 Transformers (Google)
• 2018/06/11 GPT-1 (OpenAI)
• 2018/10/11 BERT (Google)
• 2019/02/14 GPT-2 (OpenAI)
• 2020/01 KVMemNN (FAIR)
• 2020/01 BST Poly-Encoder (FAIR)
• 2020/02 Specificity Control (FAIR)
• 2020/02/27 Meena (Google)
• 2020/04/22 ConvAI2 Poly-Encoder (FAIR)
• 2020/04/29 BlenderBot (FAIR)
• 2020/05/28 GPT-3 (OpenAI)
https://qiita.com/Afo_guard_enthusiast/items/b37830430e465f397983
https://ai.facebook.com/blog/state-of-the-art-open-source-chatbot/
ParlAI’s Projects
A library developed by Facebook AI Research for training and evaluating dialogue models
https://parl.ai/projects/
Generative Models
• Unlikelihood Training for Consistent Dialogue
• Retrieve and Refine
• Importance of Search Strategy
Retrieval Models
• Poly-Encoders
Open-Domain Dialogues
• Addressing Contradictions in Dialogue Modeling
• Recipes for open-domain chatbots
• Blended Skill Talk
• dodeca Dialogue
• Dialogue Natural Language Inference
• Empathetic Dialogues
• ConvAI2 Competition
• Persona-Chat
Well-Behaved
• Recipes for Safety in Open-Domain Chatbots
• Mitigating Genderation Bias
Knowledge Grounded
• Wizard of Wikipedia
Evaluation
• ACUTE-Eval
BlenderBot’s Contribution
https://parl.ai/projects/
Generative Models
• Unlikelihood Training for Consistent Dialogue (move beyond plain MLE)
• Retrieve and Refine (the best of generation and retrieval)
• Importance of Search Strategy (Beam Blocking)
Retrieval Models
• Poly-Encoders (the best of the Bi- and Cross-Encoder)
Open-Domain Dialogues
• Addressing Contradictions in Dialogue Modeling
• Recipes for open-domain chatbots
• Blended Skill Talk (d. blending multiple conversational skills appropriately)
• dodeca Dialogue
• Dialogue Natural Language Inference
• Empathetic Dialogues (b. empathetic conversation)
• ConvAI2 Competition (a. engaging conversation + persona)
• Persona-Chat
Well-Behaved
• Recipes for Safety in Open-Domain Chatbots
• Mitigating Genderation Bias
Knowledge Grounded
• Wizard of Wikipedia (c. knowledgeable conversation)
Evaluation
• ACUTE-Eval (automatic metrics? enough with purely mechanical evaluation!)
1. It is already well known that pre-training on large corpora matters
• Transformer Memory Networks (Dinan et al., 2020)
• DialoGPT (Zhang et al., 2019)
• Meena (Adiwardana et al., 2020)
2. What makes a good conversation? It requires the following conversational skills:
a. being engaging to the other speaker,
b. empathizing rather than only pushing one's own point,
c. being able to talk knowledgeably about one's areas of expertise or interest,
d. maintaining a consistent persona while blending the skills above.
3. Choosing the generation strategy well matters!
• Which generation strategy should we pick to sound more human?
• Generation vs. retrieval? Take the strengths of both!
• Beam search does well enough once a few things are bolted on,
• e.g., Unlikelihood, Beam Blocking, Length Penalty.
And automatic metrics such as BLEU, Hits@k, ROUGE, and PPL?
They diverge from human judgment! → Can we evaluate cheaply yet more like a human?
→ ACUTE-Eval! And compare against Meena.
Contents
✓ PART 1. The Data: Good Conversation? Blended Skills!
• Description of the datasets (ConvAI2, ED, WoW, BST, and Reddit)
✓ PART 2. The Choice of Generation Strategy: Model Architecture, Training Objective, Decoding
• Poly-Encoder, BART
• Retrieve and Refine
• Unlikelihood Training
• Decoding Strategy
✓ PART 3. Results & Analysis: Human Evaluation using ACUTE-Eval
• Introduction to the ACUTE-Eval metric, ablation studies, and comparison with Meena
PART 1. The Data
1. Pre-training Data: Reddit & Subreddit Data
• Humeau et al., 2020 (Poly-Encoder) and Mazaré et al., 2018 (Training Millions of Personalized Dialogue Agents)
• Pre-training on Reddit, which holds an enormous amount of conversational data, improves performance over BERT-style pre-training on Wikipedia/Toronto Books
https://arxiv.org/pdf/1905.01969.pdf
(Figure from the Poly-Encoder paper: comparison against prior models such as TransferTransfo, Advanced ESIM, HRDE+LTC, Enhanced Word Repr, and Sequential Matching Network)
PART 1. The Data
1. Pre-training Data: Reddit & Subreddit Data
• Humeau et al., 2020 (Poly-Encoder) and Mazaré et al., 2018 (Training Millions of Personalized Dialogue Agents)
• Pre-training on Reddit, which holds an enormous amount of conversational data, improves performance over BERT-style pre-training on Wikipedia/Toronto Books
• To clean the dataset, comments are filtered out with the following heuristic rules (a minimal filter sketch follows below):
1. The author is a known bot.
2. It comes from a known non-English subreddit.
3. The comment is marked as removed (by a moderator) or deleted (by the author).
4. It is longer than 2,048 characters and does not contain spaces.
5. It is longer than 128 BPE tokens.
6. It is shorter than 5 characters.
7. It contains a URL.
8. It starts with a non-ASCII character.
9. It is further than depth 7 in the thread.
• Even so, problems such as toxicity, noise, and group discussions remain → the Safety project! (offensive-language detection)
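These rules translate directly into a simple comment filter. Below is a minimal sketch of such a filter, assuming hypothetical inputs (a dict-shaped comment with body/author/subreddit/depth fields, a BPE tokenizer with an encode method, and precomputed bot/subreddit blocklists); it is an illustration, not the actual BlenderBot preprocessing code.

```python
import re

MAX_CHARS, MAX_BPE_TOKENS, MIN_CHARS, MAX_DEPTH = 2048, 128, 5, 7
URL_RE = re.compile(r"https?://|www\.")

def keep_comment(comment, bpe_tokenizer, known_bots, non_english_subreddits):
    """Return True only if a Reddit comment survives the nine heuristic rules above."""
    text = comment["body"]
    if comment["author"] in known_bots:                      # rule 1: known bot
        return False
    if comment["subreddit"] in non_english_subreddits:       # rule 2: non-English subreddit
        return False
    if text in ("[removed]", "[deleted]"):                   # rule 3: removed/deleted
        return False
    if len(text) > MAX_CHARS and " " not in text:            # rule 4: very long and space-free
        return False
    if len(bpe_tokenizer.encode(text)) > MAX_BPE_TOKENS:     # rule 5: too many BPE tokens
        return False
    if len(text) < MIN_CHARS:                                # rule 6: too short
        return False
    if URL_RE.search(text):                                  # rule 7: contains a URL
        return False
    if text and ord(text[0]) > 127:                          # rule 8: starts with non-ASCII
        return False
    if comment["depth"] > MAX_DEPTH:                         # rule 9: too deep in the thread
        return False
    return True
```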
PART 1. The Data
2. Fine-tuning Data: ① ConvAI2
• Personality (human-likeness), Engagingness (engagement, appeal)
• The Second Conversational Intelligence Challenge
• Aim to engage humans!
• Two speakers who know nothing about each other chat to get to know one another!
• Neither can see the other's persona
• num_examples: 131,438, num_episodes: 17,878
• ~140K utterances in total (a sketch of persona-conditioned input construction follows below)
https://arxiv.org/pdf/1902.00098.pdf
https://medium.com/neuromation-blog/neuromation-researchers-won-the-neurips-convai2-competition-ae7375c5f9cf
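In Persona-Chat/ConvAI2 the model is conditioned on its own persona by prepending the persona sentences to the dialogue history. A minimal sketch of that input construction follows; the "your persona:" prefix mirrors the ParlAI convention, but treat the exact formatting and delimiter as assumptions.

```python
def build_convai2_input(persona_sentences, dialogue_history, delimiter="\n"):
    """Prepend the bot's own persona lines to the dialogue history, ConvAI2-style."""
    persona_block = delimiter.join(
        f"your persona: {sentence}" for sentence in persona_sentences
    )
    return delimiter.join([persona_block] + dialogue_history)

# Illustrative usage with made-up persona lines and turns
context = build_convai2_input(
    ["i like to ski.", "i have four children."],
    ["hi, how are you today?", "great! i just got back from the slopes."],
)
print(context)
```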
PART 1. The Data
2. Fine-tuning Data: ② Empathetic Dialogues
• Empathetic (emotion, empathizing)
• Paper by Hannah Rashkin, then an intern at FAIR
• The Speaker describes a situation they experienced,
• and the Responder (Listener) replies with empathy
• Train/Valid/Test: 19,533/2,770/2,547 conversations
• Both sides of the dialogues are used for training, roughly 50K utterances
https://arxiv.org/pdf/1811.00207.pdf
PART 1. The Data
2. Fine-tuning Data: ③ Wizard of Wikipedias
• Representation of Knowledge, Expertise
• Discuss a chosen topic in depth!
• A conversation between a Wizard (expert) and an Apprentice (curious learner)
• The goal is to replace the Wizard with a bot!
• A dialogue IR (TF-IDF) system retrieves the Wikipedia paragraphs most relevant to the conversation (top-7); a retrieval sketch follows below
• 194K utterances
https://arxiv.org/pdf/1811.01241.pdf
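That retrieval step can be approximated with an off-the-shelf TF-IDF ranker. The sketch below uses scikit-learn over an in-memory list of candidate paragraphs; the real Wizard of Wikipedia setup runs its own TF-IDF retriever over a full Wikipedia dump, so take this only as an illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

def retrieve_paragraphs(dialogue_context, paragraphs, top_k=7):
    """Return the top-k paragraphs most similar to the dialogue context under TF-IDF."""
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(paragraphs)       # one row per candidate paragraph
    query_vec = vectorizer.transform([dialogue_context])
    scores = linear_kernel(query_vec, doc_matrix).ravel()   # similarity on tf-idf vectors
    best = scores.argsort()[::-1][:top_k]
    return [(paragraphs[i], float(scores[i])) for i in best]
```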
PART 1. The Data
2. Fine-tuning Data: ④ Blended Skill Talk
• Can You Put It All Together? (Smith et al., 2020)
• Empathy/engagingness and persona consistency/expertise are each good conversational skills on their own,
• but blending them appropriately matters more than mastering any single one
• BST is the dataset built for exactly this
• (num_episodes) Train/Valid/Test 4,819/1,009/980
• (num_examples) Train/Valid/Test 27,018/5,651/5,482
• TOTAL 38,151*2 = 76,302
• The suggested responses are the outputs of single-skill models
https://arxiv.org/pdf/2004.08449.pdf
PART 2. The Choice of Generation Strategy
Model Architecture: three architectures are considered
- Retrieval
- Generative
- Retrieve-and-Refine
PART 2. The Choice of Generation Strategy
Model Architecture: three architectures are considered
- Retrieval → Poly-Encoder
- Generative
- Retrieve-and-Refine
https://arxiv.org/pdf/1905.01969.pdf
https://towardsdatascience.com/blender-bot-part-2-the-transformer-2e4d960b149f
red(): the reduction function that pools encoder outputs into a single vector
- First-output pooling
- Average over all outputs
- Average over the first m <= N outputs
1st Attention: self-attention over the Input Context's tokens
2nd Attention: cross-attention between the Context encoder's outputs and the m learned codes
3rd Attention: cross-attention between the m global features and the Candidate embedding
(a scoring sketch follows below)
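Putting the three attention steps together, here is a compact PyTorch sketch of the Poly-Encoder scoring head. The context and candidate encoders are assumed to be given (e.g., pretrained Transformers), and the scaled dot-product attention is a simplification; this mirrors the description above rather than the official ParlAI implementation.

```python
import torch
import torch.nn.functional as F

class PolyEncoderHead(torch.nn.Module):
    def __init__(self, hidden_dim, m_codes=64):
        super().__init__()
        # m learned codes that query the context encoder outputs (2nd attention)
        self.codes = torch.nn.Parameter(torch.randn(m_codes, hidden_dim))

    @staticmethod
    def attend(query, key, value):
        weights = F.softmax(query @ key.transpose(-2, -1) / key.size(-1) ** 0.5, dim=-1)
        return weights @ value

    def forward(self, ctx_outputs, cand_embs):
        # ctx_outputs: (batch, seq_len, hidden) context encoder outputs (1st attention lives inside it)
        # cand_embs:   (batch, n_cands, hidden) pooled candidate embeddings (red())
        batch = ctx_outputs.size(0)
        codes = self.codes.unsqueeze(0).expand(batch, -1, -1)
        # 2nd attention: the m codes attend over the context outputs -> m global features
        global_feats = self.attend(codes, ctx_outputs, ctx_outputs)      # (batch, m, hidden)
        # 3rd attention: each candidate embedding attends over the m global features
        ctx_emb = self.attend(cand_embs, global_feats, global_feats)     # (batch, n_cands, hidden)
        # final score: dot product between the attended context vector and the candidate
        return (ctx_emb * cand_embs).sum(dim=-1)                         # (batch, n_cands)
```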
PART 2. The Choice of Generation Strategy
Model Architecture: three architectures are considered
- Retrieval → Poly-Encoder
- Generative
- Retrieve-and-Refine
Bi-Encoder → performance is mediocre
Cross-Encoder → inference is far too slow
Poly-Encoder → fixes both drawbacks!
The Poly-Encoder retrieval model:
- can produce engaging responses,
- but its utterances are limited to the fixed candidate set
PART 2. The Choice of Generation Strategy
Model Architecture: three architectures are considered
- Retrieval
- Generative → Seq2Seq Transformer (BART)
- Retrieve-and-Refine
Why not just talk to people with a generative model alone?
Conversing with a pure generative model runs into the following problems:
- it tends to produce short utterances (a consequence of the objective),
- it leans toward generic, and therefore less engaging, responses.
These are usually blamed on maximization-based beam search, but here the idea is to
mix in a retrieval model to address them
https://www.youtube.com/watch?v=VmYMnpDLPEo
PART 2. The Choice of Generation Strategy
Model Architecture: three architectures are considered
- Retrieval
- Generative
- Retrieve-and-Refine
https://arxiv.org/pdf/1808.04776.pdf
Dialog Retrieve and Refine:
Input Context → Poly-Encoder (scoring Candidate Labels) → Retrieved Next Utterance
Input Context [SEP] Retrieved Next Utterance → Decoder → Generated Next Utterance
(a sketch of this input construction follows below)
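Concretely, dialogue RetNRef just appends the retriever's best candidate to the context before generating. A minimal sketch, where retriever.retrieve and generator.generate are placeholder interfaces rather than the exact ParlAI API:

```python
def dialogue_retrieve_and_refine(context, retriever, generator, sep_token=" [SEP] "):
    """Retrieve a next utterance, append it to the context, then let the Seq2Seq model refine it."""
    retrieved = retriever.retrieve(context)                 # best candidate from the Poly-Encoder
    conditioned_context = context + sep_token + retrieved   # context [SEP] retrieved utterance
    return generator.generate(conditioned_context)          # decoder produces the final reply
```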
PART 2. The Choice of Generation Strategy
Model Architecture: three architectures are considered
- Retrieval
- Generative
- Retrieve-and-Refine
https://arxiv.org/pdf/1808.04776.pdf
Knowledge Retrieve and Refine:
Input Context → IR System (TF-IDF) → Candidate Labels → Poly-Encoder → Retrieved Next Utterance (knowledge)
Input Context + Retrieved Knowledge → Decoder → Generated Next Utterance
PART 2. The Choice of Generation Strategy
Training Objectives
- Retrieval
Trained with cross-entropy loss, same as the Poly-Encoder (Humeau et al., 2019)
Negative sampling: within each training batch, the gold label is the positive and the other labels act as negatives
- Generative
Maximum Likelihood Estimation → has problems...
- Retrieve-and-Refine
Simply train each component model.
For dialogue RetNRef, α-blending is applied so the model learns to actually use the retrieval:
with probability α, the gold response is replaced by the retrieved response as the training target (sketch below)
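A sketch of that α-blending step during RetNRef training: with probability α the decoder's training target is the retrieved utterance instead of the gold response, which teaches the model that copying or refining the retrieval is sometimes the right move. The value of α and the argument names here are illustrative.

```python
import random

def blended_target(gold_response, retrieved_response, alpha=0.5):
    """With probability alpha, train against the retrieved response instead of the gold one."""
    return retrieved_response if random.random() < alpha else gold_response
```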
PART 2. The Choice of Generation Strategy
Rethink: Retrieval
PART 2. The Choice of Generation Strategy
Rethink: Generative with MLE
PART 2. The Choice of Generation Strategy
The Curious Case of Neural Text Degeneration (Holtzman et al., 2020)
- Using likelihood as the objective yields high-quality models across language tasks (high grammaticality),
- but maximization-based decoding strategies (e.g., beam or greedy search) lead to text degeneration (low diversity);
- maximizing likelihood over long sequences often produces generic, repetitive, awkward text.
- The paper therefore proposes nucleus sampling (top-p); a sampling sketch follows below.
- Top-k is also a reasonable alternative, but k and the temperature T must be tuned as hyperparameters.
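For reference, a minimal top-p (nucleus) sampling step over one vector of next-token logits, written with PyTorch in the standard Holtzman et al. formulation:

```python
import torch

def nucleus_sample(logits, top_p=0.9, temperature=1.0):
    """Sample one token id from the smallest set of tokens whose cumulative probability >= top_p."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # keep the smallest prefix of tokens whose cumulative probability reaches top_p
    cutoff = int(torch.searchsorted(cumulative, torch.tensor(top_p)).item()) + 1
    kept = sorted_probs[:cutoff] / sorted_probs[:cutoff].sum()   # renormalize the nucleus
    choice = torch.multinomial(kept, num_samples=1)
    return sorted_ids[choice].item()
```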
Neural Text Generation with Unlikelihood Training (Welleck et al., 2020)
- MLE → unlike the human distribution, it assigns too much probability to repeated/frequent words, leading to dull/repetitive responses.
- Top-k / nucleus sampling → they patch the decoding step rather than the token-level training distribution, and depart entirely from maximization-based decoding.
- So why, fundamentally, does degeneration happen?
1) Model architecture → the Transformer tends to prefer repetition
2) Intrinsic properties of human language → interpreted via UID in "If Beam Search Is the Answer, What Was the Question?"
3) The training objective depends on fixed corpora → this paper addresses it with unlikelihood training
- Since unlikelihood training is a training method, it can also be combined with beam search (a loss sketch follows below)
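A sketch of the token-level unlikelihood term from Welleck et al.: on top of the usual MLE (cross-entropy) loss for the gold token, the model is penalized via -log(1 - p(c)) for probability it puts on a set of negative candidate tokens (typically tokens already generated or repeated from the context). Shapes and the candidate-selection rule are simplified here.

```python
import torch
import torch.nn.functional as F

def mle_plus_unlikelihood(logits, target, neg_candidates, alpha=1.0):
    """
    logits:         (vocab,) next-token logits at one decoding step
    target:         gold next-token id (MLE term)
    neg_candidates: list of token ids the model should NOT favor, e.g. already-generated tokens
    """
    log_probs = F.log_softmax(logits, dim=-1)
    mle_loss = -log_probs[target]
    # unlikelihood term: push down the probability of each negative candidate
    neg_probs = log_probs[neg_candidates].exp().clamp(max=1.0 - 1e-6)
    ul_loss = -torch.log1p(-neg_probs).sum()     # -sum log(1 - p(candidate))
    return mle_loss + alpha * ul_loss
```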
PART 2. The Choice of Generation Strategy
Don’t Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training (Li et al., 2020)
- MLE-based generative models suffer from (1) context copying, (2) repetition, (3) overuse of frequent words, and (4) logical flaws.
- (1)-(3) are related problems → unlikelihood training is applied to pull the model toward the human distribution.
- (4) is handled with Dialogue NLI data → a separate topic.
- 1) What is unlikelihood training? 2) How does it fix repetition?
Repeated terms are forced to receive low probability!
This term is said to control the mismatch between the human distribution and the overall
sequence probability distribution that MLE produces.
UL corrects known biases!
PART 2. The Choice of Generation Strategy
To summarize:
the generative model is trained with the MLE + UL loss!
Then what about the decoding strategy?
The Meena paper argues that beam search is inferior to sampling.
FAIR pushes back: adding the unlikelihood loss to the MLE training objective mitigates beam search's problems,
and on top of that a length penalty and beam blocking are applied (a hypothesis is rejected if it repeats an n-gram at or above the threshold); a decoding sketch follows below.
For comparison, top-k sampling and Meena's sample-and-rank are also considered.
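A sketch of what beam blocking and the minimum-length control amount to at each beam step: a continuation is disallowed if it would repeat an n-gram already present in the hypothesis or in the input context, and the end-of-sequence token is masked until a minimum length is reached. Function and variable names are illustrative, not the ParlAI implementation.

```python
def violates_ngram_block(tokens, next_token, n=3, context_tokens=()):
    """True if appending next_token creates an n-gram already seen in the hypothesis or the context."""
    candidate = tokens + [next_token]
    if len(candidate) < n:
        return False
    new_ngram = tuple(candidate[-n:])
    seen = set()
    for seq in (list(context_tokens), tokens):
        seen.update(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    return new_ngram in seen

def allowed_next_tokens(tokens, vocab_ids, eos_id, min_length=20, n=3, context_tokens=()):
    """Filter the vocabulary at one beam step: block repeated n-grams and premature EOS."""
    allowed = []
    for tok in vocab_ids:
        if tok == eos_id and len(tokens) < min_length:
            continue                                  # enforce a minimum generation length
        if violates_ngram_block(tokens, tok, n, context_tokens):
            continue                                  # n-gram beam blocking
        allowed.append(tok)
    return allowed
```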
PART 3. Results & Analysis
Training Details
Pre-training:
1) Ranker model
- Uses FAIRSEQ
- The 256M model is trained with the Poly-Encoder settings
- The 622M model is pre-trained with a simple MLM objective on the same data and dictionary as the large generative models
- All hyperparameters follow the values recommended for RoBERTa
2) Generative Model
- Common to 2.7B/9.4B:
- Adam optimizer, Megatron-LM-style model parallelism
- Adafactor, mixed-precision training (similar to Switch Transformer)
- Update schedule:
- 2.7B: 200k SGD updates, maximum lr 2e-4, linear warmup over 3,125 steps, inverse-sqrt lr scheduler (sketch below)
- 9.4B: 200k SGD updates, maximum lr 1.15e-4, linear warmup over 2,400 steps
Fine-tuning: uses the ParlAI toolkit
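The learning-rate schedule quoted above (linear warmup followed by inverse-square-root decay) can be written as one small function. The constants below match the 2.7B numbers on this slide; the decay formula is the commonly used fairseq-style one and is assumed rather than copied from the authors' code.

```python
def inv_sqrt_lr(step, max_lr=2e-4, warmup_steps=3125):
    """Linear warmup to max_lr, then decay proportional to 1/sqrt(step)."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    return max_lr * (warmup_steps / step) ** 0.5

# Example: learning rate at a few points of the 200k-update schedule
for s in (1_000, 3_125, 50_000, 200_000):
    print(s, inv_sqrt_lr(s))
```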
PART 3. Results & Analysis
Evaluation Metrics
- ACUTE-Eval: a newly devised, human-centered evaluation protocol
- Engagingness question: "Who would you prefer to talk to for a long conversation?"
- Humanness question: "Which speaker sounds more human?"
- Self-Chat ACUTE-Eval: evaluation over conversations produced with ParlAI's self-chat feature (a win-rate sketch follows below)
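ACUTE-Eval boils down to pairwise A/B judgments, typically reported as the fraction of annotators preferring one model, with an exact binomial test as one simple way to check significance. A tiny aggregation sketch with made-up counts:

```python
from math import comb

def win_rate_and_pvalue(wins_a, wins_b):
    """Win rate for model A plus a two-sided exact binomial p-value under the null p = 0.5."""
    n = wins_a + wins_b
    win_rate = wins_a / n
    k = max(wins_a, wins_b)
    p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n)
    return win_rate, p_value

# e.g., 75 annotators preferred model A and 25 preferred model B (illustrative numbers)
print(win_rate_and_pvalue(75, 25))
```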
PART 3. Results & Analysis
(Result slides: ACUTE-Eval comparison figures and tables are not reproduced in this text.)
PART 3. Results & Analysis
Limitations
1) Vocabulary usage: beam-search-based models generate overly common words too often and rare words too rarely → still not much improvement here
2) Nontrivial repetition: the bot repeats what it has already said
3) Contradiction and forgetfulness: less so for the larger models, but it contradicts itself and struggles to form logical links
4) Knowledge and factual correctness: prone to factual errors
5) Conversation length and memory: long conversations become dull and repetitive, and earlier conversations are not remembered
6) Deeper understanding: it lacks the ability to learn concepts through further conversation, and has no way to ground itself in real-world entities, actions, or experiences
PART 3. Results & Analysis
Conclusion
The model still falls well short.
Concepts such as unlikelihood training were brought in, but much of this is feasible only because it is FAIR or Google,
and it remains very far from human level.
Still, it is encouraging that the paper sets a direction and, compared with Meena, reflects on what conversation is really about.
The ParlAI code also looks highly reusable (though things like distributed processing still seem immature).
Try it out yourself with OpenChat!
https://github.com/hyunwoongko/openchat
Thank you ☺