These are the presentation slides for the paper "Attentional Parallel RNNs for Generating Punctuation in Transcribed Speech", presented at the 5th International Conference on Statistical Language and Speech Processing (SLSP 2017).
Abstract
Until very recently, the generation of punctuation marks for automatic speech recognition (ASR) output has mostly been done by looking at the syntactic structure of the recognized utterances. Prosodic cues such as breaks, speech rate, and pitch intonation, which influence the placement of punctuation marks in speech transcripts, have seldom been used. We propose a method that uses recurrent neural networks, taking prosodic and lexical information into account in order to predict punctuation marks for raw ASR output. Our experiments show that an attention mechanism over parallel sequences of prosodic cues aligned with transcribed speech improves the accuracy of punctuation generation.
SLSP 2017 presentation - Attentional Parallel RNNs for Generating Punctuation in Transcribed Speech
1. Attentional Parallel RNNs for
Generating Punctuation in
Transcribed Speech
Alp Öktem, Mireia Farrús, Leo Wanner
E-mail: alp.oktem@upf.edu
Other works: https://www.researchgate.net/profile/Alp_Oktem
Github: https://github.com/alpoktem
2. Contents
1) Motivation
2) Punctuating spoken text
3) Approaches
a) Related Work
b) Our approach
4) Proposed model
5) Data and experimental setup
6) Results
7) Contributions
3. Motivation
...
so under that basis we put it out and said
look we're skeptical about this thing we
don't know but what can we do the
material looks good it feels right but we
just can't verify it and we then got a letter
just this week from the company who
wrote it wanting to track down the source
saying hey we want to track down the
source and we were like oh tell us more
what document is it precisely you're
talking about can you show that you had
legal authority over that document is it
really yours
...
ASR
4. Motivation
ASR
...
So under that basis, we put it out and
said, "Look, we're skeptical about this
thing. We don't know, but what can we
do? The material looks good, it feels
right, but we just can't verify it." And we
then got a letter just this week from the
company who wrote it, wanting to track
down the source saying, "Hey, we want
to track down the source." And we were
like, "Oh, tell us more. What document is
it, precisely, you're talking about? Can
you show that you had legal authority
over that document? Is it really yours?"
...
5. Why punctuation?
Punctuation is needed:
● For human readability,
● To aid interpretation,
● For machine processing:
○ Parsing
○ Machine translation
6. Motivation
RESEARCH QUESTIONS
1. How to approach the problem of unpunctuated ASR output?
2. Which linguistic phenomena affect the placement of
punctuation marks in spoken text?
8. Punctuating Spoken Text
What signals punctuation in speech?
1) Syntax/Orthography:
The use of commas, which are required e.g. for separating clauses, depends heavily on
syntax.
Today, I am giving a talk.
11. Related Work
❖ Data-driven models → Trainable on any language
❖ Recurrent Neural Networks (RNN) employed on two kinds of data:
➢ Written data (Ballesteros et al., 2016)
Features: lexical, POS
➢ Written + spoken data (Tilk et al., 2016)
Features: lexical, pause durations
Training in two stages
❖ Many prosodic features contributing to punctuation usage
are neglected!
12. Our Approach
❖ Process lexical and prosodic information in parallel.
❖ Train a model solely from spoken data
❖ Test various acoustic features contributing to prosody:
➢ Pause durations
➢ Fundamental frequency (f0)
➢ Intensity
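One way to picture "lexical and prosodic information in parallel" is as word-aligned sequences of equal length, with one value per word for each acoustic feature. The sketch below is only an illustration of that alignment idea, not the model from the paper: the class name AlignedUtterance, its field names, and all feature values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class AlignedUtterance:
    """Words plus one prosodic value per word (hypothetical layout)."""
    words: List[str]
    pause_after: List[float]     # pause following each word, in seconds
    mean_f0: List[float]         # mean fundamental frequency per word, in Hz
    mean_intensity: List[float]  # mean intensity per word, in dB

    def __post_init__(self) -> None:
        # Every feature sequence must have exactly one entry per word.
        n = len(self.words)
        for name in ("pause_after", "mean_f0", "mean_intensity"):
            if len(getattr(self, name)) != n:
                raise ValueError(f"{name} is not aligned with words")

    def steps(self) -> Iterator[Tuple[str, float, float, float]]:
        """Yield one (word, pause, f0, intensity) tuple per time step."""
        return zip(self.words, self.pause_after, self.mean_f0, self.mean_intensity)


# Made-up values for the example sentence "Today, I am giving a talk."
utt = AlignedUtterance(
    words=["today", "i", "am", "giving", "a", "talk"],
    pause_after=[0.30, 0.0, 0.0, 0.0, 0.0, 0.60],
    mean_f0=[180.0, 170.0, 165.0, 175.0, 160.0, 140.0],
    mean_intensity=[65.0, 62.0, 61.0, 64.0, 60.0, 58.0],
)

for word, pause, f0, intensity in utt.steps():
    print(word, pause, f0, intensity)
```

Keeping the sequences the same length means any additional word-aligned feature could be added as another field without changing how the time steps are read.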
18. Data
❖ 1046 TED Talks
❖ 884 English speakers
❖ 156034 sentences
❖ Manual transcription available
https://www.ted.com/talks
20. Experimental Setup
❖ Reduced punctuation set
❖ 50 words per training sample
❖ 59811 samples
❖ 70%-15%-15%: training, testing, validation
❖ Word vocabulary: 13830
❖ Implementation using Theano
[Figure: reduced punctuation set, including a "no punctuation" class]
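The sampling and split described on this slide can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' preprocessing: the function names chunk_words and split_samples are hypothetical, dropping a trailing remainder shorter than 50 words is an assumption, and so is shuffling before the split.

```python
import random
from typing import List, Sequence, Tuple


def chunk_words(words: Sequence[str], sample_len: int = 50) -> List[List[str]]:
    """Cut a word stream into consecutive samples of exactly sample_len words.

    Assumption: a trailing remainder shorter than sample_len is dropped.
    """
    n_full = len(words) // sample_len
    return [list(words[i * sample_len:(i + 1) * sample_len]) for i in range(n_full)]


def split_samples(
    samples: Sequence[List[str]],
    percents: Tuple[int, int, int] = (70, 15, 15),
    seed: int = 0,
) -> Tuple[List[List[str]], List[List[str]], List[List[str]]]:
    """Shuffle, then split into (training, testing, validation) parts.

    Integer percentages avoid floating-point rounding; validation takes
    whatever is left after the training and testing parts.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * percents[0] // 100
    n_test = len(shuffled) * percents[1] // 100
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    valid = shuffled[n_train + n_test:]
    return train, test, valid


# Toy word stream: 1030 words give 20 full samples (the last 30 words are dropped).
words = [f"w{i}" for i in range(1030)]
samples = chunk_words(words)
train, test, valid = split_samples(samples)
print(len(samples), len(train), len(test), len(valid))
```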
24. Results from Testing Set
julian _ welcome . it's _ been _ reported _ that _ wikileaks _ your _ baby _ has _ in _
the _ last _ few _ years _ has _ released _ more _ classified _ documents _ than _ the
_ rest _ of _ the _ world's _ media _ combined . can _ that _ possibly _ be _ true ?
yeah , can _ it _ possibly _ be _ true ? it's _ a _ worry . isn't _ it _ that _ the _ rest _ of _
the _ world's _ media _ is _ doing _ such _ a _ bad _ job _ that _ a _ little _ group _ of
_ activists _ is _ able _ to _ release _ more _ of _ that _ type _ of _ information _ than _
the _ rest _ of _ the _ world _ press _ combined . how _ does _ it _ work ? how _ do _
people _ release _ the _ documents ?
who _ was _ the _ richest _ man ? still _ is _ the _ richest _ man _ in _ kenya .
when _ we _ released _ that _ report , we _ did _ so _ three _ days _ after _ the _ new
_ president _ kibaki _ had _ decided _ to _ pal _ up _ with _ the _ man _ that _ he _
was _ going _ to _ clean _ out , daniel _ arap _ moi .
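The samples above appear to alternate words with a punctuation slot, where "_" seems to stand for "no punctuation". Assuming that reading of the slide, the following sketch turns such output into ordinary text. The function names parse_punctuated and render are hypothetical and are not taken from the released code.

```python
from typing import List, Tuple

NO_PUNCT = "_"  # assumed placeholder for "no punctuation after this word"


def parse_punctuated(text: str) -> List[Tuple[str, str]]:
    """Split 'word mark word mark ...' into (word, mark) pairs."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating word / mark tokens")
    return list(zip(tokens[0::2], tokens[1::2]))


def render(pairs: List[Tuple[str, str]]) -> str:
    """Attach each predicted mark to its word and join with spaces."""
    out = []
    for word, mark in pairs:
        out.append(word if mark == NO_PUNCT else word + mark)
    return " ".join(out)


sample = "can _ that _ possibly _ be _ true ? yeah , can _ it _ possibly _ be _ true ?"
pairs = parse_punctuated(sample)
print(render(pairs))  # can that possibly be true? yeah, can it possibly be true?
```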
26. Contributions
❖ A study on the effect of various acoustic features on
punctuating spoken text.
❖ A model that is able to...
➢ process lexical/prosodic features in parallel
➢ integrate any aligned feature
❖ Training solely on spoken data
❖ Improvement compared to baseline (+9.1% in terms of
F1-score)
Source code available at:
https://github.com/alpoktem/punkProse
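For readers unfamiliar with the F1-score mentioned in the contributions, here is a minimal sketch of how precision, recall, and F1 can be computed for one punctuation class from aligned reference and predicted labels. The slides do not give the exact evaluation protocol, so this is the standard definition rather than the authors' scoring script; the function name prf1 and the toy label sequences are hypothetical.

```python
from typing import Dict, Sequence


def prf1(reference: Sequence[str], predicted: Sequence[str], label: str) -> Dict[str, float]:
    """Precision, recall and F1 for a single punctuation label."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == label and p == label)  # correctly placed marks
    fp = sum(1 for r, p in pairs if r != label and p == label)  # spurious marks
    fn = sum(1 for r, p in pairs if r == label and p != label)  # missed marks
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy labels, one per word slot; "_" marks "no punctuation" (made-up data).
reference = ["_", ",", "_", "_", ".", "_", "?", "_"]
predicted = ["_", ",", "_", ",", ".", "_", ".", "_"]

for mark in (",", ".", "?"):
    print(mark, prf1(reference, predicted, mark))
```

In this toy data the comma is predicted once correctly and once spuriously, so its precision is 0.5 and its recall is 1.0; the question mark is never predicted, so all three of its scores are 0.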