Introducer: Z.Chen
Points
• A neural discriminative constituency parser - F1 93.55

• Chart parser/decoder

• Encoder-decoder style discriminative constituency parser (dcp): the architecture
• Structural meaning of multi-headed self-attention for constituency parsing (cp)

• 8-layer, 8-head transformer + BiLSTM decoder

• Analysis by input ablation: word, POS and position

• Position or content (POS ⟺ morphology, ELMo/CharConcat)

• Metric of tree structure accuracy: PARSEVAL
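The PARSEVAL metric mentioned in the last point compares labeled brackets of a predicted tree against the gold tree. The sketch below is a minimal illustration, assuming each tree has already been flattened into (label, start, end) triples; the function name `bracket_f1` and the toy brackets are made up for this example and are not taken from the slides.

```python
from collections import Counter

def bracket_f1(gold, pred):
    """Labeled bracket precision, recall and F1.

    gold, pred: lists of (label, start, end) triples, one per constituent.
    """
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    # A predicted bracket counts as correct when the same labeled span is in gold.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy example: the predicted tree mislabels one span (PP instead of NP).
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
pred = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
precision, recall, f1 = bracket_f1(gold, pred)
print(precision, recall, f1)
```

Standard evaluation tools apply extra conventions (for example around punctuation) that this sketch leaves out.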
Constituency Parsing
Grammar structure: CKY algorithm, Chomsky CFG, transition-based
Chart parser
NLP tutorial 10: (11↑)
• Probability as score
• Bottom-up combine (bracketing per se)
• Beam search
Godfather
Transformer: word + POS + position, decomposition
A BiLSTM for fence points
Incrementally build up
<bos> W0 W1 W2 W3 W4 <eos>
CKY
⇊fence points⇊
Incrementally build up
Score for a bracket: (decoder)

How to deal with non-phrases (spans that are not constituents)?

• CKY: low probability (PCFG)

• Chen (me): <nil> tag / vector

• This research: s(i, j, ∅) = 0
In s(i, j, l): i, j are fence points; l is a label
↕ train with ∅ or <nil>
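The bracket scoring above can be sketched as a CKY-style search over fence points. This is a minimal illustration, assuming the span scores s(i, j, l) are already given in a dictionary and that the tree score is the sum of its span scores; the empty label ∅ is represented by `None` with a fixed score of 0. The function name `cky_decode` and the toy scores are hypothetical.

```python
def cky_decode(T, score):
    """Find the highest-scoring bracketing over fence points 0..T.

    score: dict mapping (i, j, label) -> float. The empty label is not
    stored; it always scores 0, so a span only gets a real label when
    some label scores above 0.
    Returns (total_score, list of (i, j, label) for non-empty spans).
    """
    labels = sorted({label for (_, _, label) in score})
    best = {}
    for length in range(1, T + 1):
        for i in range(0, T - length + 1):
            j = i + length
            # Pick the best label for this span; None means the empty label.
            label, s_label = None, 0.0
            for l in labels:
                s = score.get((i, j, l))
                if s is not None and s > s_label:
                    label, s_label = l, s
            if length == 1:
                s_split, children = 0.0, []
            else:
                # Bottom-up combine: choose the best split point k.
                k = max(range(i + 1, j),
                        key=lambda m: best[i, m][0] + best[m, j][0])
                s_split = best[i, k][0] + best[k, j][0]
                children = best[i, k][1] + best[k, j][1]
            here = [(i, j, label)] if label is not None else []
            best[i, j] = (s_label + s_split, here + children)
    return best[0, T]

# Toy scores for a three-word sentence (fence points 0..3).
toy_scores = {
    (0, 2, "NP"): 2.0,
    (2, 3, "VP"): 1.0,
    (0, 3, "S"): 3.0,
    (1, 3, "VP"): 0.5,
    (0, 1, "NP"): -1.0,
}
total, tree = cky_decode(3, toy_scores)
print(total, tree)
```

Because the empty label is pinned at 0, a span with only negative label scores simply contributes nothing, which is how non-phrases drop out of the tree in this sketch.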
Encoder: linguistic Information
Word Embedding + POS Embedding + Position Embedding
Input: $Z$, with $T$ positions of dimension $d_{model}$
Component-wise add: $z_t = w_t + m_t + p_t$
From then on, $z_t$ is sent to the Transformer, and the dimension $d_{model}$ is kept throughout the encoder.
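The component-wise addition of the three embeddings can be sketched in a few lines. This is only an illustration with plain Python lists; the function name `add_embeddings` and the toy vectors (T = 2, d_model = 3) are assumptions made for the example.

```python
def add_embeddings(word, pos, position):
    """Component-wise sum z_t = w_t + m_t + p_t for every position t.

    Each argument is a list of T vectors, all of the same dimension d_model.
    """
    return [
        [w + m + p for w, m, p in zip(w_t, m_t, p_t)]
        for w_t, m_t, p_t in zip(word, pos, position)
    ]

# Toy embeddings: T = 2 positions, d_model = 3.
word = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
pos = [[0.5, 0.5, 0.0], [1.0, 0.0, 0.0]]
position = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
Z = add_embeddings(word, pos, position)
print(Z)
```

The output keeps the same shape as each input, which is why the dimension stays fixed through the encoder.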
Encoder: linguistic Information
[Figure: encoder layer diagram with labels $z_t$, $x_t$, $y_t$]
Encoder: linguistic Information
$q_t = W_Q^\top x_t$
$k_t = W_K^\top x_t$
$v_t = W_V^\top x_t$
[Figure: one attention head, with labels $x_i$, $q_i$, $k_i$, $v_i$, $q_j$, $k_j$, $v_j$, $p(i \to j)$, $\bar{v}_i$]
“gather information from up to 8 remote locations”
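A single attention head built from these projections can be sketched as follows. This is a minimal pure-Python illustration, assuming the usual scaled dot-product form in which $p(i \to j)$ is a softmax over $q_i \cdot k_j / \sqrt{d_k}$ and $\bar{v}_i$ is the weighted sum of the $v_j$; the helper names and the identity weight matrices are made up for the example.

```python
import math

def matvec_T(W, x):
    """Return W^T x, with W stored as a list of rows (d_in x d_out)."""
    return [sum(W[r][c] * x[r] for r in range(len(x)))
            for c in range(len(W[0]))]

def attention_head(X, W_Q, W_K, W_V):
    """One self-attention head over the input vectors X (one per position)."""
    Q = [matvec_T(W_Q, x) for x in X]
    K = [matvec_T(W_K, x) for x in X]
    V = [matvec_T(W_V, x) for x in X]
    d_k = len(K[0])
    probs, outputs = [], []
    for q in Q:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - top) for l in logits]
        norm = sum(exps)
        p = [e / norm for e in exps]  # p(i -> j) for every j
        probs.append(p)
        # v_bar_i: attention-weighted average of the value vectors.
        outputs.append([sum(p[j] * V[j][c] for j in range(len(V)))
                        for c in range(len(V[0]))])
    return probs, outputs

identity = [[1.0, 0.0], [0.0, 1.0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probs, outputs = attention_head(X, identity, identity, identity)
print(probs[2])
```

With 8 such heads per layer, each position can pull in information from up to 8 different locations, which is the point of the quote above.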
Decoder again
Wi Wj…
Run a BiRNN once
Run a FFN several times: T(T+1)/2 times
“92.67 F1 on Penn Treebank WSJ dev set”
“We must be the 2018 champion!”, my heart feels like shouting.
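The T(T+1)/2 count is the number of spans between fence points, one scoring call per span. A short sketch, assuming T words give fence points 0..T; the function name `all_spans` is made up for this example.

```python
def all_spans(T):
    """All spans (i, j) with 0 <= i < j <= T over T words."""
    return [(i, j) for i in range(T) for j in range(i + 1, T + 1)]

spans = all_spans(5)
print(len(spans))  # number of spans for a 5-word sentence
```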
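Analysis by Input Ablation
$z_t = w_t + m_t + p_t$
Word, POS and position embeddings are added, so they also overlap in every projection:
$q_t = W_Q^\top z_t$, $k_t = W_K^\top z_t$, $v_t = W_V^\top z_t$, giving $p(i \to j)$ and $\bar{v}_t$
With content-based attention disabled, queries and keys see only the position embedding:
$q_t = W_Q^\top p_t$, $k_t = W_K^\top p_t$, $v_t = W_V^\top z_t$
Layer-wise disabled
“it seems strange that content-based attention benefits our model to such a small degree.”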
Decomposition on i/w
1. Decompose input
$z_t = w_t + m_t + p_t$ (F1 92.67)
$z_t = [w_t + m_t; p_t]$ (F1 92.60)
2. Decompose attention
$q = q^{(c)} + q^{(p)}$
$k = k^{(c)} + k^{(p)}$
$k \cdot q = (q^{(c)} + q^{(p)}) \cdot (k^{(c)} + k^{(p)})$
All mix-up, with cross-terms $k^{(c)} \cdot q^{(p)} + k^{(p)} \cdot q^{(c)}$
An example of cross-terms:
“the word the always attends to the 5th position in the sentence”
$x_t = [x^{(c)}; x^{(p)}]$
$c = Wx = [c^{(c)}; c^{(p)}] = [W^{(c)} x^{(c)}; W^{(p)} x^{(p)}]$
F1 93.15 (+0.5)
all on dev set
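The effect of the decomposition on the dot product can be checked with small integer vectors. This is a minimal sketch, assuming separate content and position weight matrices; all names and numbers are made up for the example. With added components the product contains content-position cross-terms, while with concatenated components it reduces to a content term plus a position term.

```python
def matvec(W, x):
    """Return W x, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Hypothetical content (c) and position (p) inputs for positions i and j.
xi_c, xi_p = [1, 2], [0, 1]
xj_c, xj_p = [3, 1], [1, 1]

# Hypothetical query/key weights for each part.
WQ_c, WQ_p = [[1, 0], [0, 1]], [[2, 0], [0, 2]]
WK_c, WK_p = [[1, 0], [0, 1]], [[1, 0], [0, 1]]

q_c, q_p = matvec(WQ_c, xi_c), matvec(WQ_p, xi_p)
k_c, k_p = matvec(WK_c, xj_c), matvec(WK_p, xj_p)

# Added components: q = q(c) + q(p), k = k(c) + k(p).
q_add = [a + b for a, b in zip(q_c, q_p)]
k_add = [a + b for a, b in zip(k_c, k_p)]
mixed = dot(q_add, k_add)
cross = dot(q_c, k_p) + dot(q_p, k_c)

# Concatenated components: q = [q(c); q(p)], k = [k(c); k(p)].
factored = dot(q_c + q_p, k_c + k_p)

print(mixed, cross, factored)
```

In this toy case the factored product equals the mixed product minus the cross-terms, so content can no longer attend directly to position or the other way round.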
Analysis by Constraints
“When we began to investigate how the model makes use
of long-distance attention, we found that there are
particular attention heads at some layers in our model
that almost always attend to the start token.”
RECALL: There are 8 heads in each Transformer layer.

“This suggests that the start token
is being used as the location for
some sentence-wide pooling/
processing, or perhaps as a
dummy target location when a
head fails to find the particular
phenomenon that it’s learned to
search for.”

In short, it is a dustbin for redundant attention.
WinA | WinA + some spec
← Train with window and then test on dev

8 layers :)
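A window constraint on attention can be pictured as a boolean mask over position pairs. The sketch below is an assumption-laden illustration: the window size, the function name `window_mask`, and the `always_visible` parameter (standing in for the unspecified "some spec" exceptions, for example the start token) are all hypothetical and not taken from the slides.

```python
def window_mask(T, window, always_visible=()):
    """mask[i][j] is True when position i may attend to position j.

    Positions within `window` of each other are allowed. Any position in
    `always_visible` may attend everywhere and be attended from everywhere.
    """
    special = set(always_visible)
    return [
        [abs(i - j) <= window or i in special or j in special
         for j in range(T)]
        for i in range(T)
    ]

strict = window_mask(6, 1)
relaxed = window_mask(6, 1, always_visible=(0,))
print(strict[4])
print(relaxed[4])
```

In an attention head, pairs where the mask is False would have their logits set to negative infinity before the softmax, so they receive zero weight.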
5 Lexical Models
POS tags from Stanford parser
$z_t = [w_t + m_t; p_t]$
-4 layers at ELMo
pneumonoultramicroscopicsilicovolcanoconiosis
>> Longtu’s
Finale

N20181126
