SlideShare a Scribd company logo
1 of 1
Download to read offline
Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions
for Punctuation Restoration
Seokhwan Kim
Adobe Research, San Jose, CA, USA
Punctuation Restoration
Aims to generate the punctuation marks from the
unpunctuated ASR outputs
A sequence labelling task
to predict the punctuation label yt for the t-th timestep
in a given word sequence X = {x1 · · · xt · · · xT }
where C ∈ {‘, COMMA , ‘.PERIOD , ‘?QMARK }
yt =



c ∈ C if a punctuation symbol c is located
between xt−1 and xt,
O otherwise,
Examples
Input: so if we make no changes today what does tomorrow
look like well i think you probably already have the picture
Output: so , if we make no changes today , what does
tomorrow look like ? well , i think you probably already have
the picture .
t xt yt
1 so O
2 if ,COMMA
3 we O
4 make O
5 no O
6 changes O
7 today O
8 what ,COMMA
9 does O
10 tomorrow O
11 look O
t xt yt
12 like O
13 well ?QMARK
14 i ,COMMA
15 think O
16 you O
17 probably O
18 already O
19 have O
20 the O
21 picture O
22 </s> .PERIOD
Related Work
Multiple fully-connected hidden layers [1]
Convolutional neural networks (CNNs) [2]
Recurrent neural networks (RNNs) [4]
RNNs with neural attention mechanisms [3]
Main Contributions
Context encoding from the stacked recurrent layers to
learn more hierarchical aspects
Neural attentions applied on every intermediate
hidden layer to capture the layer-wise features
Multi-head attentions instead of a single attention
function to diversify the layer-wise attentions further
Methods
Model Architecture
yt
st-1st-1
stst
∙∙∙∙∙∙
n-bidirectional recurrent layers
AttentionAttention AttentionAttention AttentionAttention
m-heads mm mm
Attention Attention Attention
m-heads m m
x1 ∙∙∙h1
1h1
1 h2
1h2
1 hn
1hn
1x1 ∙∙∙h1
1 h2
1 hn
1
xT ∙∙∙h1
Th1
T h2
Th2
T hn
Thn
TxT ∙∙∙h1
T h2
T hn
T
h2
th2
txt ∙∙∙h1
th1
t hn
thn
th2
txt ∙∙∙h1
t hn
t
∙∙∙
∙∙∙∙∙∙
∙∙∙∙∙∙
∙∙∙∙∙∙
∙∙∙
∙∙∙
∙∙∙
∙∙∙
∙∙∙
∙∙∙∙∙∙
∙∙∙∙∙∙
∙∙∙∙∙∙
∙∙∙
∙∙∙
∙∙∙
∙∙∙
Word Embedding
Each word xt in a given input sequence X is represented as a
d-dimensional distributed vector xt ∈ Rd
Word embeddings can be learned from scratch with random
initialization or fine-tuned from pre-trained word vectors
Bi-directional Recurrent Layers
Stacking multiple recurrent layers on top of each other
−→
h i
t =



GRU xt,
−→
h 1
t−1 if i = 1,
GRU hi−1
t ,
−→
h i
t−1 if i = 2, · · · n,
The backward state
←−
h i
t is computed also with the same way
but in the reverse order from the end of the sequence T to t.
Both directional states are concatenated into the output state
hi
t =
−→
h i
t,
←−
h i
t ∈ R2d
to represent both contexts together
Uni-directional Recurrent Layer
The top-layer outputs [hn
1, · · · , hn
T ] are forwarded to a
uni-directional recurrent layer
st = GRU (hn
t , st−1) ∈ Rd
,
Multi-head Attentions
The hidden state st constitutes a query to neural attentions
based on scaled dot-product attention
Attn (Q, K, V) = softmax
QKT
√
d
V,
Multi-head attentions are applied for every layer separately to
compute the weighted contexts from different subspaces
fi,j
= Attn S · Wi,j
Q , Hi
· Wi,j
K , Hi
· Wi,j
V ,
where S = [s1; s2; · · · ; sT ] and Hi
= hi
1; hi
2; · · · ; hi
T
Softmax Classification
yt = softmax st, f1,1
t , · · · , fn,m
t Wy + by
Experimental Setup
IWSLT dataset
English reference transcripts of TED Talks
2.1M, 296k, and 13k words for training,
development, and test on reference, respectively
Model Configurations
Number of recurrent layers: n ∈ {1, 2, 3, 4}
Number of attention heads: m ∈ {1, 2, 3, 4, 5}
Attended contexts: layer-wise or only on top
Implementation Details
Word embeddings: 50d GloVe on Wikipedia
Optimizer: Adam optimizer
Loss: Negative log likelihood loss
Hidden dimension d = 256
The same dimension across all the components
Except the word embedding layer
Mini-batch size: 128
Dropout rate: 0.5
Early stopping within 100 epochs
Results
Comparisons of the performances with
different network configurations
Single-head
Top-layer
Single-head
Layer-wise
Multi-head
Top-layer
Multi-head
Layer-wise
0.62
0.63
0.64
0.65
0.66
0.67
0.68
F-measure
1 layer 2 layers 3 layers 4 layers
Results
COMMA PERIOD QUESTION OVERALL
Models P R F P R F P R F P R F SER
DNN-A [1] 48.6 42.4 45.3 59.7 68.3 63.7 - - - 54.8 53.6 54.2 66.9
CNN-2A [1] 48.1 44.5 46.2 57.6 69.0 62.8 - - - 53.4 55.0 54.2 68.0
T-LSTM [2] 49.6 41.4 45.1 60.2 53.4 56.6 57.1 43.5 49.4 55.0 47.2 50.8 74.0
T-BRNN [3] 64.4 45.2 53.1 72.3 71.5 71.9 67.5 58.7 62.8 68.9 58.1 63.1 51.3
T-BRNN-pre [3] 65.5 47.1 54.8 73.3 72.5 72.9 70.7 63.0 66.7 70.0 59.7 64.4 49.7
Single-BiRNN [4] 62.2 47.7 54.0 74.6 72.1 73.4 67.5 52.9 59.3 69.2 59.8 64.2 51.1
Corr-BiRNN [4] 60.9 52.4 56.4 75.3 70.8 73.0 70.7 56.9 63.0 68.6 61.6 64.9 50.8
DRNN-LWMA 63.4 55.7 59.3 76.0 73.5 74.7 75.0 71.7 73.3 70.0 64.6 67.2 47.3
DRNN-LWMA-pre 62.9 60.8 61.9 77.3 73.7 75.5 69.6 69.6 69.6 69.9 67.2 68.6 46.1
so
if
wemake
nochangestodaywhatdoestomorrow
look
likewell
ithink
youprobablyalreadyhave
thepicture
so
if
we
make
no
changes
today
what
does
tomorrow
look
like
well
i
think
you
probably
already
have
the
picture
so
if
wemake
nochangestodaywhatdoestomorrow
look
likewell
ithink
youprobablyalreadyhave
thepicture
so
if
we
make
no
changes
today
what
does
tomorrow
look
like
well
i
think
you
probably
already
have
the
picture
so
if
wemake
nochangestodaywhatdoestomorrow
look
likewell
ithink
youprobablyalreadyhave
thepicture
so
if
we
make
no
changes
today
what
does
tomorrow
look
like
well
i
think
you
probably
already
have
the
picture
so
if
wemake
nochangestodaywhatdoestomorrow
look
likewell
ithink
youprobablyalreadyhave
thepicture
so
if
we
make
no
changes
today
what
does
tomorrow
look
like
well
i
think
you
probably
already
have
the
picture
References
[1] Xiaoyin Che, ChengWang, Haojin Yang, and Christoph Meinel, ‘Punctuation prediction for unsegmented transcript based on word vector’, LREC 2016.
[2] Ottokar Tilk and Tanel Alumae, ‘LSTM for punctuation restoration in speech transcripts’, INTERSPEECH 2015.
[3] Ottokar Tilk and Tanel Alumae, ‘Bidirectional recurrent neural network with attention mechanism for punctuation restoration’, INTERSPEECH 2016.
[4] Vardaan Pahuja et al., ‘Joint learning of correlated sequence labelling tasks using bidirectional recurrent neural networks’, INTERSPEECH 2017.
345 Park Avenue, San Jose, CA 95110, USA Email: seokim@adobe.com

More Related Content

What's hot

Linear Cryptanalysis Lecture 線形解読法
Linear Cryptanalysis Lecture 線形解読法Linear Cryptanalysis Lecture 線形解読法
Linear Cryptanalysis Lecture 線形解読法Kai Katsumata
 
Single source stortest path bellman ford and dijkstra
Single source stortest path bellman ford and dijkstraSingle source stortest path bellman ford and dijkstra
Single source stortest path bellman ford and dijkstraRoshan Tailor
 
Euclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
Euclid's Algorithm for Greatest Common Divisor - Time Complexity AnalysisEuclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
Euclid's Algorithm for Greatest Common Divisor - Time Complexity AnalysisAmrinder Arora
 
Prim's Algorithm on minimum spanning tree
Prim's Algorithm on minimum spanning treePrim's Algorithm on minimum spanning tree
Prim's Algorithm on minimum spanning treeoneous
 
Generalized Nonlinear Models in R
Generalized Nonlinear Models in RGeneralized Nonlinear Models in R
Generalized Nonlinear Models in Rhtstatistics
 
Dijkstra's algorithm
Dijkstra's algorithmDijkstra's algorithm
Dijkstra's algorithmgsp1294
 
Sns mid term-test2-solution
Sns mid term-test2-solutionSns mid term-test2-solution
Sns mid term-test2-solutioncheekeong1231
 
From L to N: Nonlinear Predictors in Generalized Models
From L to N: Nonlinear Predictors in Generalized ModelsFrom L to N: Nonlinear Predictors in Generalized Models
From L to N: Nonlinear Predictors in Generalized Modelshtstatistics
 
gnm: a Package for Generalized Nonlinear Models
gnm: a Package for Generalized Nonlinear Modelsgnm: a Package for Generalized Nonlinear Models
gnm: a Package for Generalized Nonlinear Modelshtstatistics
 
Aaex4 group2(中英夾雜)
Aaex4 group2(中英夾雜)Aaex4 group2(中英夾雜)
Aaex4 group2(中英夾雜)Shiang-Yun Yang
 
Multiplicative Interaction Models in R
Multiplicative Interaction Models in RMultiplicative Interaction Models in R
Multiplicative Interaction Models in Rhtstatistics
 
Deep dive into rsa
Deep dive into rsaDeep dive into rsa
Deep dive into rsaBill GU
 
Prim Algorithm and kruskal algorithm
Prim Algorithm and kruskal algorithmPrim Algorithm and kruskal algorithm
Prim Algorithm and kruskal algorithmAcad
 
lecture 30
lecture 30lecture 30
lecture 30sajinsc
 
Justesen codes alternant codes goppa codes
Justesen codes alternant codes goppa codesJustesen codes alternant codes goppa codes
Justesen codes alternant codes goppa codesMadhumita Tamhane
 
Dijkstra's Algorithm
Dijkstra's AlgorithmDijkstra's Algorithm
Dijkstra's Algorithmguest862df4e
 

What's hot (20)

Biconnectivity
BiconnectivityBiconnectivity
Biconnectivity
 
Linear Cryptanalysis Lecture 線形解読法
Linear Cryptanalysis Lecture 線形解読法Linear Cryptanalysis Lecture 線形解読法
Linear Cryptanalysis Lecture 線形解読法
 
Single source stortest path bellman ford and dijkstra
Single source stortest path bellman ford and dijkstraSingle source stortest path bellman ford and dijkstra
Single source stortest path bellman ford and dijkstra
 
Euclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
Euclid's Algorithm for Greatest Common Divisor - Time Complexity AnalysisEuclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
Euclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
 
Prim's Algorithm on minimum spanning tree
Prim's Algorithm on minimum spanning treePrim's Algorithm on minimum spanning tree
Prim's Algorithm on minimum spanning tree
 
Generalized Nonlinear Models in R
Generalized Nonlinear Models in RGeneralized Nonlinear Models in R
Generalized Nonlinear Models in R
 
Dijkstra's algorithm
Dijkstra's algorithmDijkstra's algorithm
Dijkstra's algorithm
 
Sns mid term-test2-solution
Sns mid term-test2-solutionSns mid term-test2-solution
Sns mid term-test2-solution
 
From L to N: Nonlinear Predictors in Generalized Models
From L to N: Nonlinear Predictors in Generalized ModelsFrom L to N: Nonlinear Predictors in Generalized Models
From L to N: Nonlinear Predictors in Generalized Models
 
Merge Sort
Merge SortMerge Sort
Merge Sort
 
gnm: a Package for Generalized Nonlinear Models
gnm: a Package for Generalized Nonlinear Modelsgnm: a Package for Generalized Nonlinear Models
gnm: a Package for Generalized Nonlinear Models
 
Aaex4 group2(中英夾雜)
Aaex4 group2(中英夾雜)Aaex4 group2(中英夾雜)
Aaex4 group2(中英夾雜)
 
Multiplicative Interaction Models in R
Multiplicative Interaction Models in RMultiplicative Interaction Models in R
Multiplicative Interaction Models in R
 
Deep dive into rsa
Deep dive into rsaDeep dive into rsa
Deep dive into rsa
 
Prim Algorithm and kruskal algorithm
Prim Algorithm and kruskal algorithmPrim Algorithm and kruskal algorithm
Prim Algorithm and kruskal algorithm
 
Sns pre sem
Sns pre semSns pre sem
Sns pre sem
 
lecture 30
lecture 30lecture 30
lecture 30
 
Justesen codes alternant codes goppa codes
Justesen codes alternant codes goppa codesJustesen codes alternant codes goppa codes
Justesen codes alternant codes goppa codes
 
Lec10
Lec10Lec10
Lec10
 
Dijkstra's Algorithm
Dijkstra's AlgorithmDijkstra's Algorithm
Dijkstra's Algorithm
 

Similar to Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punctuation Restoration

Attention is all you need (UPC Reading Group 2018, by Santi Pascual)
Attention is all you need (UPC Reading Group 2018, by Santi Pascual)Attention is all you need (UPC Reading Group 2018, by Santi Pascual)
Attention is all you need (UPC Reading Group 2018, by Santi Pascual)Universitat Politècnica de Catalunya
 
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docx
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docxDivide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docx
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docxjacksnathalie
 
RNN and sequence-to-sequence processing
RNN and sequence-to-sequence processingRNN and sequence-to-sequence processing
RNN and sequence-to-sequence processingDongang (Sean) Wang
 
Skiena algorithm 2007 lecture17 edit distance
Skiena algorithm 2007 lecture17 edit distanceSkiena algorithm 2007 lecture17 edit distance
Skiena algorithm 2007 lecture17 edit distancezukun
 
Estimating structured vector autoregressive models
Estimating structured vector autoregressive modelsEstimating structured vector autoregressive models
Estimating structured vector autoregressive modelsAkira Tanimoto
 
2_EditDistance_Jan_08_2020.pptx
2_EditDistance_Jan_08_2020.pptx2_EditDistance_Jan_08_2020.pptx
2_EditDistance_Jan_08_2020.pptxAnum Zehra
 
Nondeterministic Finite Automata AFN.pdf
Nondeterministic Finite Automata AFN.pdfNondeterministic Finite Automata AFN.pdf
Nondeterministic Finite Automata AFN.pdfSergioUlisesRojasAla
 
01 - DAA - PPT.pptx
01 - DAA - PPT.pptx01 - DAA - PPT.pptx
01 - DAA - PPT.pptxKokilaK25
 
machinelearning project
machinelearning projectmachinelearning project
machinelearning projectLianli Liu
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningCastLabKAIST
 
Skiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracingSkiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracingzukun
 
Identification of the Mathematical Models of Complex Relaxation Processes in ...
Identification of the Mathematical Models of Complex Relaxation Processes in ...Identification of the Mathematical Models of Complex Relaxation Processes in ...
Identification of the Mathematical Models of Complex Relaxation Processes in ...Vladimir Bakhrushin
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsElvis DOHMATOB
 
LeetCode Solutions In Java .pdf
LeetCode Solutions In Java .pdfLeetCode Solutions In Java .pdf
LeetCode Solutions In Java .pdfzupsezekno
 
Chapter 06 rsa cryptosystem
Chapter 06   rsa cryptosystemChapter 06   rsa cryptosystem
Chapter 06 rsa cryptosystemAnkur Choudhary
 
Response Surface in Tensor Train format for Uncertainty Quantification
Response Surface in Tensor Train format for Uncertainty QuantificationResponse Surface in Tensor Train format for Uncertainty Quantification
Response Surface in Tensor Train format for Uncertainty QuantificationAlexander Litvinenko
 
Introduction to Tree-LSTMs
Introduction to Tree-LSTMsIntroduction to Tree-LSTMs
Introduction to Tree-LSTMsDaniel Perez
 
HW 5-RSAascii2str.mfunction str = ascii2str(ascii) .docx
HW 5-RSAascii2str.mfunction str = ascii2str(ascii)        .docxHW 5-RSAascii2str.mfunction str = ascii2str(ascii)        .docx
HW 5-RSAascii2str.mfunction str = ascii2str(ascii) .docxwellesleyterresa
 

Similar to Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punctuation Restoration (20)

Attention is all you need (UPC Reading Group 2018, by Santi Pascual)
Attention is all you need (UPC Reading Group 2018, by Santi Pascual)Attention is all you need (UPC Reading Group 2018, by Santi Pascual)
Attention is all you need (UPC Reading Group 2018, by Santi Pascual)
 
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docx
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docxDivide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docx
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docx
 
Machine Learning 1
Machine Learning 1Machine Learning 1
Machine Learning 1
 
RNN and sequence-to-sequence processing
RNN and sequence-to-sequence processingRNN and sequence-to-sequence processing
RNN and sequence-to-sequence processing
 
2_1 Edit Distance.pptx
2_1 Edit Distance.pptx2_1 Edit Distance.pptx
2_1 Edit Distance.pptx
 
Skiena algorithm 2007 lecture17 edit distance
Skiena algorithm 2007 lecture17 edit distanceSkiena algorithm 2007 lecture17 edit distance
Skiena algorithm 2007 lecture17 edit distance
 
Estimating structured vector autoregressive models
Estimating structured vector autoregressive modelsEstimating structured vector autoregressive models
Estimating structured vector autoregressive models
 
2_EditDistance_Jan_08_2020.pptx
2_EditDistance_Jan_08_2020.pptx2_EditDistance_Jan_08_2020.pptx
2_EditDistance_Jan_08_2020.pptx
 
Nondeterministic Finite Automata AFN.pdf
Nondeterministic Finite Automata AFN.pdfNondeterministic Finite Automata AFN.pdf
Nondeterministic Finite Automata AFN.pdf
 
01 - DAA - PPT.pptx
01 - DAA - PPT.pptx01 - DAA - PPT.pptx
01 - DAA - PPT.pptx
 
machinelearning project
machinelearning projectmachinelearning project
machinelearning project
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
 
Skiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracingSkiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracing
 
Identification of the Mathematical Models of Complex Relaxation Processes in ...
Identification of the Mathematical Models of Complex Relaxation Processes in ...Identification of the Mathematical Models of Complex Relaxation Processes in ...
Identification of the Mathematical Models of Complex Relaxation Processes in ...
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priors
 
LeetCode Solutions In Java .pdf
LeetCode Solutions In Java .pdfLeetCode Solutions In Java .pdf
LeetCode Solutions In Java .pdf
 
Chapter 06 rsa cryptosystem
Chapter 06   rsa cryptosystemChapter 06   rsa cryptosystem
Chapter 06 rsa cryptosystem
 
Response Surface in Tensor Train format for Uncertainty Quantification
Response Surface in Tensor Train format for Uncertainty QuantificationResponse Surface in Tensor Train format for Uncertainty Quantification
Response Surface in Tensor Train format for Uncertainty Quantification
 
Introduction to Tree-LSTMs
Introduction to Tree-LSTMsIntroduction to Tree-LSTMs
Introduction to Tree-LSTMs
 
HW 5-RSAascii2str.mfunction str = ascii2str(ascii) .docx
HW 5-RSAascii2str.mfunction str = ascii2str(ascii)        .docxHW 5-RSAascii2str.mfunction str = ascii2str(ascii)        .docx
HW 5-RSAascii2str.mfunction str = ascii2str(ascii) .docx
 

More from Seokhwan Kim

The Eighth Dialog System Technology Challenge (DSTC8)
The Eighth Dialog System Technology Challenge (DSTC8)The Eighth Dialog System Technology Challenge (DSTC8)
The Eighth Dialog System Technology Challenge (DSTC8)Seokhwan Kim
 
Dynamic Memory Networks for Dialogue Topic Tracking
Dynamic Memory Networks for Dialogue Topic TrackingDynamic Memory Networks for Dialogue Topic Tracking
Dynamic Memory Networks for Dialogue Topic TrackingSeokhwan Kim
 
The Fifth Dialog State Tracking Challenge (DSTC5)
The Fifth Dialog State Tracking Challenge (DSTC5)The Fifth Dialog State Tracking Challenge (DSTC5)
The Fifth Dialog State Tracking Challenge (DSTC5)Seokhwan Kim
 
Natural Language in Human-Robot Interaction
Natural Language in Human-Robot InteractionNatural Language in Human-Robot Interaction
Natural Language in Human-Robot InteractionSeokhwan Kim
 
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...Seokhwan Kim
 
The Fourth Dialog State Tracking Challenge (DSTC4)
The Fourth Dialog State Tracking Challenge (DSTC4)The Fourth Dialog State Tracking Challenge (DSTC4)
The Fourth Dialog State Tracking Challenge (DSTC4)Seokhwan Kim
 
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...Seokhwan Kim
 
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...Seokhwan Kim
 
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...Seokhwan Kim
 
Sequential Labeling for Tracking Dynamic Dialog States
Sequential Labeling for Tracking Dynamic Dialog StatesSequential Labeling for Tracking Dynamic Dialog States
Sequential Labeling for Tracking Dynamic Dialog StatesSeokhwan Kim
 
Wikipedia-based Kernels for Dialogue Topic Tracking
Wikipedia-based Kernels for Dialogue Topic TrackingWikipedia-based Kernels for Dialogue Topic Tracking
Wikipedia-based Kernels for Dialogue Topic TrackingSeokhwan Kim
 
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...Seokhwan Kim
 
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...Seokhwan Kim
 
MMR-based active machine learning for Bio named entity recognition
MMR-based active machine learning for Bio named entity recognitionMMR-based active machine learning for Bio named entity recognition
MMR-based active machine learning for Bio named entity recognitionSeokhwan Kim
 
A semi-supervised method for efficient construction of statistical spoken lan...
A semi-supervised method for efficient construction of statistical spoken lan...A semi-supervised method for efficient construction of statistical spoken lan...
A semi-supervised method for efficient construction of statistical spoken lan...Seokhwan Kim
 
A spoken dialog system for electronic program guide information access
A spoken dialog system for electronic program guide information accessA spoken dialog system for electronic program guide information access
A spoken dialog system for electronic program guide information accessSeokhwan Kim
 
An alignment-based approach to semi-supervised relation extraction including ...
An alignment-based approach to semi-supervised relation extraction including ...An alignment-based approach to semi-supervised relation extraction including ...
An alignment-based approach to semi-supervised relation extraction including ...Seokhwan Kim
 
An Alignment-based Pattern Representation Model for Information Extraction
An Alignment-based Pattern Representation Model for Information ExtractionAn Alignment-based Pattern Representation Model for Information Extraction
An Alignment-based Pattern Representation Model for Information ExtractionSeokhwan Kim
 
EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템
EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템
EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템Seokhwan Kim
 
A Cross-Lingual Annotation Projection Approach for Relation Detection
A Cross-Lingual Annotation Projection Approach for Relation DetectionA Cross-Lingual Annotation Projection Approach for Relation Detection
A Cross-Lingual Annotation Projection Approach for Relation DetectionSeokhwan Kim
 

More from Seokhwan Kim (20)

The Eighth Dialog System Technology Challenge (DSTC8)
The Eighth Dialog System Technology Challenge (DSTC8)The Eighth Dialog System Technology Challenge (DSTC8)
The Eighth Dialog System Technology Challenge (DSTC8)
 
Dynamic Memory Networks for Dialogue Topic Tracking
Dynamic Memory Networks for Dialogue Topic TrackingDynamic Memory Networks for Dialogue Topic Tracking
Dynamic Memory Networks for Dialogue Topic Tracking
 
The Fifth Dialog State Tracking Challenge (DSTC5)
The Fifth Dialog State Tracking Challenge (DSTC5)The Fifth Dialog State Tracking Challenge (DSTC5)
The Fifth Dialog State Tracking Challenge (DSTC5)
 
Natural Language in Human-Robot Interaction
Natural Language in Human-Robot InteractionNatural Language in Human-Robot Interaction
Natural Language in Human-Robot Interaction
 
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...
 
The Fourth Dialog State Tracking Challenge (DSTC4)
The Fourth Dialog State Tracking Challenge (DSTC4)The Fourth Dialog State Tracking Challenge (DSTC4)
The Fourth Dialog State Tracking Challenge (DSTC4)
 
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
 
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
 
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
 
Sequential Labeling for Tracking Dynamic Dialog States
Sequential Labeling for Tracking Dynamic Dialog StatesSequential Labeling for Tracking Dynamic Dialog States
Sequential Labeling for Tracking Dynamic Dialog States
 
Wikipedia-based Kernels for Dialogue Topic Tracking
Wikipedia-based Kernels for Dialogue Topic TrackingWikipedia-based Kernels for Dialogue Topic Tracking
Wikipedia-based Kernels for Dialogue Topic Tracking
 
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
 
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
 
MMR-based active machine learning for Bio named entity recognition
MMR-based active machine learning for Bio named entity recognitionMMR-based active machine learning for Bio named entity recognition
MMR-based active machine learning for Bio named entity recognition
 
A semi-supervised method for efficient construction of statistical spoken lan...
A semi-supervised method for efficient construction of statistical spoken lan...A semi-supervised method for efficient construction of statistical spoken lan...
A semi-supervised method for efficient construction of statistical spoken lan...
 
A spoken dialog system for electronic program guide information access
A spoken dialog system for electronic program guide information accessA spoken dialog system for electronic program guide information access
A spoken dialog system for electronic program guide information access
 
An alignment-based approach to semi-supervised relation extraction including ...
An alignment-based approach to semi-supervised relation extraction including ...An alignment-based approach to semi-supervised relation extraction including ...
An alignment-based approach to semi-supervised relation extraction including ...
 
An Alignment-based Pattern Representation Model for Information Extraction
An Alignment-based Pattern Representation Model for Information ExtractionAn Alignment-based Pattern Representation Model for Information Extraction
An Alignment-based Pattern Representation Model for Information Extraction
 
EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템
EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템
EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템
 
A Cross-Lingual Annotation Projection Approach for Relation Detection
A Cross-Lingual Annotation Projection Approach for Relation DetectionA Cross-Lingual Annotation Projection Approach for Relation Detection
A Cross-Lingual Annotation Projection Approach for Relation Detection
 

Recently uploaded

Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITmanoharjgpsolutions
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesVictoriaMetrics
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Anthony Dahanne
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencessuser9e7c64
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profileakrivarotava
 

Recently uploaded (20)

Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conference
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profile
 

Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punctuation Restoration

  • 1. Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punctuation Restoration Seokhwan Kim Adobe Research, San Jose, CA, USA Punctuation Restoration Aims to generate the punctuation marks from the unpunctuated ASR outputs A sequence labelling task to predict the punctuation label yt for the t-th timestep in a given word sequence X = {x1 · · · xt · · · xT } where C ∈ {‘, COMMA , ‘.PERIOD , ‘?QMARK } yt =    c ∈ C if a punctuation symbol c is located between xt−1 and xt, O otherwise, Examples Input: so if we make no changes today what does tomorrow look like well i think you probably already have the picture Output: so , if we make no changes today , what does tomorrow look like ? well , i think you probably already have the picture . t xt yt 1 so O 2 if ,COMMA 3 we O 4 make O 5 no O 6 changes O 7 today O 8 what ,COMMA 9 does O 10 tomorrow O 11 look O t xt yt 12 like O 13 well ?QMARK 14 i ,COMMA 15 think O 16 you O 17 probably O 18 already O 19 have O 20 the O 21 picture O 22 </s> .PERIOD Related Work Multiple fully-connected hidden layers [1] Convolutional neural networks (CNNs) [2] Recurrent neural networks (RNNs) [4] RNNs with neural attention mechanisms [3] Main Contributions Context encoding from the stacked recurrent layers to learn more hierarchical aspects Neural attentions applied on every intermediate hidden layer to capture the layer-wise features Multi-head attentions instead of a single attention function to diversify the layer-wise attentions further Methods Model Architecture yt st-1st-1 stst ∙∙∙∙∙∙ n-bidirectional recurrent layers AttentionAttention AttentionAttention AttentionAttention m-heads mm mm Attention Attention Attention m-heads m m x1 ∙∙∙h1 1h1 1 h2 1h2 1 hn 1hn 1x1 ∙∙∙h1 1 h2 1 hn 1 xT ∙∙∙h1 Th1 T h2 Th2 T hn Thn TxT ∙∙∙h1 T h2 T hn T h2 th2 txt ∙∙∙h1 th1 t hn thn th2 txt ∙∙∙h1 t hn t ∙∙∙ ∙∙∙∙∙∙ ∙∙∙∙∙∙ ∙∙∙∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙∙∙∙ ∙∙∙∙∙∙ ∙∙∙∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ Word Embedding Each word xt in a given input sequence X is represented as a d-dimensional distributed vector xt ∈ Rd Word embeddings can be learned from scratch with random initialization or fine-tuned from pre-trained word vectors Bi-directional Recurrent Layers Stacking multiple recurrent layers on top of each other −→ h i t =    GRU xt, −→ h 1 t−1 if i = 1, GRU hi−1 t , −→ h i t−1 if i = 2, · · · n, The backward state ←− h i t is computed also with the same way but in the reverse order from the end of the sequence T to t. Both directional states are concatenated into the output state hi t = −→ h i t, ←− h i t ∈ R2d to represent both contexts together Uni-directional Recurrent Layer The top-layer outputs [hn 1, · · · , hn T ] are forwarded to a uni-directional recurrent layer st = GRU (hn t , st−1) ∈ Rd , Multi-head Attentions The hidden state st constitutes a query to neural attentions based on scaled dot-product attention Attn (Q, K, V) = softmax QKT √ d V, Multi-head attentions are applied for every layer separately to compute the weighted contexts from different subspaces fi,j = Attn S · Wi,j Q , Hi · Wi,j K , Hi · Wi,j V , where S = [s1; s2; · · · ; sT ] and Hi = hi 1; hi 2; · · · ; hi T Softmax Classification yt = softmax st, f1,1 t , · · · , fn,m t Wy + by Experimental Setup IWSLT dataset English reference transcripts of TED Talks 2.1M, 296k, and 13k words for training, development, and test on reference, respectively Model Configurations Number of recurrent layers: n ∈ {1, 2, 3, 4} Number of attention heads: m ∈ {1, 2, 3, 4, 5} Attended contexts: layer-wise or only on top Implementation Details Word embeddings: 50d GloVe on Wikipedia Optimizer: Adam optimizer Loss: Negative log likelihood loss Hidden dimension d = 256 The same dimension across all the components Except the word embedding layer Mini-batch size: 128 Dropout rate: 0.5 Early stopping within 100 epochs Results Comparisons of the performances with different network configurations Single-head Top-layer Single-head Layer-wise Multi-head Top-layer Multi-head Layer-wise 0.62 0.63 0.64 0.65 0.66 0.67 0.68 F-measure 1 layer 2 layers 3 layers 4 layers Results COMMA PERIOD QUESTION OVERALL Models P R F P R F P R F P R F SER DNN-A [1] 48.6 42.4 45.3 59.7 68.3 63.7 - - - 54.8 53.6 54.2 66.9 CNN-2A [1] 48.1 44.5 46.2 57.6 69.0 62.8 - - - 53.4 55.0 54.2 68.0 T-LSTM [2] 49.6 41.4 45.1 60.2 53.4 56.6 57.1 43.5 49.4 55.0 47.2 50.8 74.0 T-BRNN [3] 64.4 45.2 53.1 72.3 71.5 71.9 67.5 58.7 62.8 68.9 58.1 63.1 51.3 T-BRNN-pre [3] 65.5 47.1 54.8 73.3 72.5 72.9 70.7 63.0 66.7 70.0 59.7 64.4 49.7 Single-BiRNN [4] 62.2 47.7 54.0 74.6 72.1 73.4 67.5 52.9 59.3 69.2 59.8 64.2 51.1 Corr-BiRNN [4] 60.9 52.4 56.4 75.3 70.8 73.0 70.7 56.9 63.0 68.6 61.6 64.9 50.8 DRNN-LWMA 63.4 55.7 59.3 76.0 73.5 74.7 75.0 71.7 73.3 70.0 64.6 67.2 47.3 DRNN-LWMA-pre 62.9 60.8 61.9 77.3 73.7 75.5 69.6 69.6 69.6 69.9 67.2 68.6 46.1 so if wemake nochangestodaywhatdoestomorrow look likewell ithink youprobablyalreadyhave thepicture so if we make no changes today what does tomorrow look like well i think you probably already have the picture so if wemake nochangestodaywhatdoestomorrow look likewell ithink youprobablyalreadyhave thepicture so if we make no changes today what does tomorrow look like well i think you probably already have the picture so if wemake nochangestodaywhatdoestomorrow look likewell ithink youprobablyalreadyhave thepicture so if we make no changes today what does tomorrow look like well i think you probably already have the picture so if wemake nochangestodaywhatdoestomorrow look likewell ithink youprobablyalreadyhave thepicture so if we make no changes today what does tomorrow look like well i think you probably already have the picture References [1] Xiaoyin Che, ChengWang, Haojin Yang, and Christoph Meinel, ‘Punctuation prediction for unsegmented transcript based on word vector’, LREC 2016. [2] Ottokar Tilk and Tanel Alumae, ‘LSTM for punctuation restoration in speech transcripts’, INTERSPEECH 2015. [3] Ottokar Tilk and Tanel Alumae, ‘Bidirectional recurrent neural network with attention mechanism for punctuation restoration’, INTERSPEECH 2016. [4] Vardaan Pahuja et al., ‘Joint learning of correlated sequence labelling tasks using bidirectional recurrent neural networks’, INTERSPEECH 2017. 345 Park Avenue, San Jose, CA 95110, USA Email: seokim@adobe.com