2. z
Syllabus
Natural Language Processing
Natural Language Models
Syntactic Analysis
Augmented Grammar
Semantic Interpretation
Machine Translation
Ambiguity and Disambiguation
Discourse understanding
Grammar Induction
3. z
Natural Language Processing (NLP)
NLP enables interaction between computers and humans in natural language
A field within Artificial Intelligence
Fig 1: NLP
Fig 2: Research Area in NLP
(Figure labels: Computer, Linguistics, NLP, AI, ML)
4. z
Trending Topics in Natural Language Processing
Natural Language Understanding
Natural Language Generation
Text Extraction
Language Translation
Parsing
Parts of Speech Tagging
Fig 3 : Projection of NLP projects
8. z
Research Topics in Natural Language Processing
Named Entity Recognition
Hamming Problem
Neural network-based Transition & parsing
Ontology
Dependency parsing
Query entity recognition & disambiguation
Sentiment analysis & mining
Text Categorization and Summarization
Online Browsing
Text Mining
Plagiarism Detection
Information retrieval
Machine translation
Speech recognition
Deep learning in NLP
Opinion analysis & mining
Text to 3D scene Generation
Sentence completion
9. z
Applications of Natural Language Processing
Biomedical
Forensic Science
Advertisement
Education
Politics
E-governance
Business Development
Marketing
10. z
Tools & Software : Purpose
Stanford NLP : provides model files for the analysis of English; written in Java
Apache OpenNLP : supports common NLP tasks such as tokenization and sentence segmentation
JGibbLDA : parameter estimation and inference for LDA; implemented in Java
ScalaNLP : umbrella project for several libraries, including Breeze and Epic
Apache Lucene Core : full-featured text search engine library; implemented in Java
GATE NLP : Java suite of tools, including an information extraction system, with support for various languages
NLTK : platform for building Python programs that work with human language
11. z NLP - Stages
Editors
Jupyter Notebook
Google Colab
PyCharm
Software Libraries
NLTK
TensorFlow
Keras
PyTorch
Stages (listed from the top of the figure down)
Pragmatic Analysis
Discourse Integration
Semantic Analysis
Syntax Analysis
Lexical Analysis
33. z
Parsing & Approaches – Syntax Analysis
Parsing - taking input text and assigning a structural representation to it
after checking its syntax against the rules of a formal grammar.
Top-down Parsing
The parser starts constructing the parse tree from the start
symbol and then tries to transform the start symbol into the
input.
Bottom-up Parsing
The parser starts with the input symbols and tries to
construct the parse tree up to the start symbol.
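The top-down strategy can be illustrated with a small recursive-descent parser. This is only a sketch: the toy grammar, the tuple-based tree format and the function names `parse`, `expand` and `parse_sentence` are assumptions made for the example, and the approach only works for grammars without left recursion.

```python
# Toy grammar (an assumption for illustration): each non-terminal maps to a
# list of alternative right-hand sides. Anything that is not a key is a word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N": [["flight"], ["meal"]],
    "V": [["includes"]],
}


def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way `symbol` derives a prefix of tokens[pos:]."""
    if symbol not in GRAMMAR:  # terminal: it must match the next input word
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each rule for this non-terminal in turn
        yield from expand(rhs, tokens, pos, (symbol,))


def expand(rhs, tokens, pos, partial):
    """Match the symbols of one right-hand side from left to right."""
    if not rhs:
        yield partial, pos
        return
    for subtree, nxt in parse(rhs[0], tokens, pos):
        yield from expand(rhs[1:], tokens, nxt, partial + (subtree,))


def parse_sentence(sentence):
    """Start from S and accept only derivations that consume the whole input."""
    tokens = sentence.lower().split()
    for tree, end in parse("S", tokens, 0):
        if end == len(tokens):
            return tree
    return None


tree = parse_sentence("The flight includes a meal")
print(tree)
```

The parser begins at the start symbol S and keeps rewriting non-terminals until it reaches words, which is the top-down direction; a bottom-up parser would instead begin with the words and combine them upward.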
35. z
How do we obtain this structure / the grammar of POS?
Parts of Speech (POS) ?
Subject , Object , Predicate !
Grammar ?
36. z
Grammar – Example
S -> NP VP
S -> VP NP
S -> NP VP NP
NP -> Det Noun | Noun | Nominal
VP -> Verb NP | V | Verb NP PP | V PP
V -> Verb
Det -> Det | Article | Aux
(Figure labels: LHS, RHS, Terminal, Non-Terminal)
37. z
Context-Free Grammar (CFG) / Backus-Naur
Form / Phrase Structure Grammar
A CFG has 4 components.
G = {V, T, P, S}
Set of Non-Terminals (denoted by V): syntactic variables that denote sets of
strings, such as verb phrase or noun phrase (the LHS of a grammar rule).
Set of Terminals (denoted by T): symbols that cannot be subdivided further, like noun,
verb, determiner, article, auxiliary (they appear only on the RHS of a grammar rule).
Set of Production Rules (denoted by P): rules that define how the terminals and
non-terminals can be combined. Every production consists of a non-terminal, an arrow,
and a sequence of terminals and/or non-terminals.
Start Symbol (denoted by S): derivations begin from the start symbol. The start
symbol is always a non-terminal.
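The four components can be written down directly as a data structure. The following is a minimal sketch, not a standard API: the `CFG` class, the `is_well_formed` helper and the tiny example grammar are all assumptions for illustration. In this sketch the terminals are taken to be the words themselves, which is one common convention.

```python
from typing import NamedTuple


class CFG(NamedTuple):
    nonterminals: frozenset  # V
    terminals: frozenset     # T
    productions: tuple       # P, as (lhs, rhs) pairs where rhs is a tuple of symbols
    start: str               # S


def is_well_formed(g):
    """Check the basic constraints implied by G = {V, T, P, S}."""
    if g.start not in g.nonterminals:   # the start symbol must be a non-terminal
        return False
    if g.nonterminals & g.terminals:    # V and T must not share symbols
        return False
    symbols = g.nonterminals | g.terminals
    return all(
        lhs in g.nonterminals and all(s in symbols for s in rhs)
        for lhs, rhs in g.productions
    )


grammar = CFG(
    nonterminals=frozenset({"S", "NP", "VP", "Det", "N", "V"}),
    terminals=frozenset({"the", "a", "flight", "meal", "includes"}),
    productions=(
        ("S", ("NP", "VP")),
        ("NP", ("Det", "N")),
        ("VP", ("V", "NP")),
        ("Det", ("the",)),
        ("Det", ("a",)),
        ("N", ("flight",)),
        ("N", ("meal",)),
        ("V", ("includes",)),
    ),
    start="S",
)
print(is_well_formed(grammar))
```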
38. z
CFG - Construct the parsing
Sentence to parse: The flight includes a meal
S -> NP VP NP
S -> NP VP
S -> VP NP
VP -> V NP
NP -> Det N
V -> Verb
Verb -> includes
Det -> the
Det -> a
N -> flight
N -> meal
Parse tree:
S
  NP
    Det  the
    N    flight
  VP
    V    includes
    NP
      Det  a
      N    meal
39. z
Some Difficult Examples
From the newspapers:
Squad helps dog bite victim.
Helicopter powered by human flies.
Levy won’t hurt the poor.
Once-sagging cloth diaper industry saved by full
dumps.
Ambiguities:
Lexical: meanings of ‘hot’, ‘back’.
Syntactic: I heard the music in my room.
Referential: The cat ate the mouse. It was ugly.
40. z
CKY uses Chomsky Normal Form (CNF)
When a sentence contains
Ambiguity
Recursive / repeated substructure,
how do we resolve it?
CNF
Allowed rules in CNF
S -> b (a single terminal)
S -> B C (exactly 2 non-terminals), e.g. NP -> N PP
Converting a rule to CNF
NP -> the N (not allowed: mixes a terminal with a non-terminal), so we introduce a dummy variable:
NP -> Det N
Det -> the (dummy variable)
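Once a grammar is in CNF, the CKY algorithm fills a table of spans bottom-up. The sketch below is a recognizer only (it answers yes or no and does not build trees); the rule tables `BINARY` and `LEXICAL` and the function name `cky_recognize` are assumptions chosen for the example.

```python
from collections import defaultdict

# CNF rules (illustrative): a pair of non-terminals maps to the parents that
# can produce it, and each word maps to its possible categories.
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICAL = {
    "the": {"Det"},
    "a": {"Det"},
    "flight": {"N"},
    "meal": {"N"},
    "includes": {"V"},
}


def cky_recognize(tokens, start="S"):
    """Return True if `start` can derive the whole token list."""
    n = len(tokens)
    if n == 0:
        return False
    table = defaultdict(set)  # table[(i, j)] holds non-terminals deriving tokens[i:j]
    for i, word in enumerate(tokens):
        table[(i, i + 1)] |= LEXICAL.get(word, set())
    for span in range(2, n + 1):           # widen the spans step by step
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):      # every split point inside the span
                for left in table[(i, k)]:
                    for right in table[(k, j)]:
                        table[(i, j)] |= BINARY.get((left, right), set())
    return start in table[(0, n)]


print(cky_recognize("the flight includes a meal".split()))
```

Because every rule has either one terminal or exactly two non-terminals on the right, each cell only needs to consider one split point at a time, which is why the CNF restriction matters.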
44. z
Augmented Grammar :
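When a given sentence calls for a rule that is
not yet in the grammar, the rule is added
to the rule table on the fly; this is
referred to as an augmented grammar (AG).
A -> B, B -> C, A -> C
e.g. Students like coffee.
Todd likes coffee.
Todd like coffee. (rejected: subject and verb disagree in number)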
Examples
S -> NP[number] VP[number]
NP[number] -> N[number]
N[number=singular] -> “Todd”
N[number=plural] -> “students”
VP[number] -> V[number] NP
V[number=singular] -> “likes”
V[number=plural] -> “like”
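The effect of the number feature can be sketched as a simple agreement check. This is a deliberately small illustration, not a feature-grammar parser: the `LEXICON` table, the `agrees` function and the entry for "coffee" (which the rules above do not list) are assumptions, and only three-word noun-verb-noun sentences are handled.

```python
# word -> (category, number); "coffee" is an assumed entry for the object NP
LEXICON = {
    "todd": ("N", "singular"),
    "students": ("N", "plural"),
    "likes": ("V", "singular"),
    "like": ("V", "plural"),
    "coffee": ("N", "singular"),
}


def agrees(sentence):
    """Apply S -> NP[number] VP[number]: subject and verb must share a number."""
    words = sentence.lower().rstrip(".").split()
    if len(words) != 3 or any(w not in LEXICON for w in words):
        return False
    (s_cat, s_num), (v_cat, v_num), (o_cat, _) = (LEXICON[w] for w in words)
    if (s_cat, v_cat, o_cat) != ("N", "V", "N"):
        return False
    # The object NP carries no agreement constraint in VP[number] -> V[number] NP
    return s_num == v_num


for s in ["Students like coffee.", "Todd likes coffee.", "Todd like coffee."]:
    print(s, agrees(s))
```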
46. z
WordNet
Contains a database of nouns, verbs, adjectives & adverbs
Ambiguity: a single word having multiple meanings (e.g. bank)
Synonyms: similar words (e.g. big/large, fare/price) - oddity
Antonyms: (big/small, good/bad, fast/slow)
Complementary pairs: (male/female, alive/dead, present/absent)
Relation pairs: (married - not single, single - not married)
Hyponymy: a specific kind of a general class (car is a kind of vehicle)
Meronymy: part of a whole (apple <- apple tree)
Holonymy: whole to a part (apple tree -> apple)
47. z
Semantic Analysis- Building blocks
Entities − represent individuals such as a particular person or location.
For example, Haryana, India and Ram are all entities.
Concepts − represent general categories of individuals, such as
person or city.
Relations − represent relationships between entities and concepts. For
example, Ram is a person.
Predicates − represent verb structures. For example, semantic roles and
case grammar are examples of predicates.
48. z
Semantic Analysis
Two approaches: First-Order Logic (FOL) / WordNet
First-Order Logic
Sentence : Rule
Flight 707 serves lunch : S -> NP VP (DCL(NP VP))
Serve lunch : S -> IMP(VP NP)
Does Flight 207 serve lunch : S -> Aux NP VP (YNQ(NP VP))
Which flight serves lunch : S -> (WHQ(NP VP))
Atlanta’s airport : S -> N VIP (GN(N))
I told Harry to go to the queen (infinitive verb phrase) : S -> NP VP NP (S -> NP λ VP NP)
50. z
Discourse Integration
Text coherence: let S0 and S1 represent the meanings of two related sentences.
Result: It infers that the state asserted by S0 could cause the state asserted by S1.
e.g. Ram was caught in the fire. His skin burned.
Explanation: It infers that the state asserted by S1 could cause the state asserted by S0.
e.g. Ram fought with Shyam’s friend. He was drunk.
Parallel: It infers p(a1,a2,…) from S0 and p(b1,b2,…) from S1. Here ai and bi are similar for all i.
e.g. Ram wanted a car. Shyam wanted money.
Elaboration: It infers the same proposition P from both assertions, S0 and S1.
e.g. Ram was from Chandigarh. Shyam was from Kerala.
Occasion: It happens when a change of state can be inferred from the assertion of S0, the final state of
which can be inferred from S1, and vice versa.
e.g. Ram picked up the book. He gave it to Shyam.
51. z
Example of Discourse Integration
S1 − Ram went to the bank to deposit money.
S2 − He then took a train to Shyam’s cloth shop.
S3 − He wanted to buy some clothes.
S4 − He did not have new clothes for the party.
S5 − He also wanted to talk to Shyam regarding his health.
52. z
Grammar Induction :
Unsupervised learning of a language’s syntax from a corpus of observed
sentences
– Ability to uncover an underlying grammar
– Ability to parse
– Ability to judge grammaticality
Solved either by parsing (1. CFG, 2. CKY) to generate a parse tree, or by
language models such as a Markov chain model
Demonstration: sentence completion using grammar induction with a neural network
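As a rough illustration of the language-model route, the sketch below learns a first-order Markov chain (bigram counts) from a few sentences and uses it for greedy sentence completion. It swaps in a Markov chain for the neural network named in the slide; the tiny `CORPUS` and the function names `train_bigrams` and `complete` are assumptions for the example.

```python
from collections import Counter, defaultdict

# Made-up training sentences, for illustration only
CORPUS = [
    "the flight includes a meal",
    "the flight includes a snack",
    "the flight includes a meal",
    "the meal includes a drink",
]


def train_bigrams(sentences):
    """Count how often each word follows another, with sentence boundary markers."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def complete(prefix, counts, max_words=10):
    """Greedily append the most frequent next word until the end marker."""
    words = prefix.split()
    current = words[-1] if words else "<s>"
    for _ in range(max_words):
        if current not in counts:
            break
        nxt = counts[current].most_common(1)[0][0]
        if nxt == "</s>":
            break
        words.append(nxt)
        current = nxt
    return " ".join(words)


model = train_bigrams(CORPUS)
print(complete("the flight", model))
```

The model is learned from raw sentences without any annotated trees, which is the unsupervised setting described above, although a bigram chain captures only local word order and not hierarchical structure.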
54. z
Speech Recognition
Human languages are limited to a set of about
40 to 50 distinct sounds called phones: e.g.,
[ey] bet
[ah] but
[oy] boy
[em] bottom
[en] button
These phones are characterized in terms of acoustic features, e.g.,
frequency and amplitude, that can be extracted from the sound
waves
55. z
Difficulties
Why isn't this easy?
just develop a dictionary of pronunciations
e.g., coat = [k] + [ow] + [t] = [kowt]
but: "recognize speech" sounds like "wreck a nice beach"
Problems:
homophones: different fragments sound the same
e.g., "rec" and "wreck"
segmentation: determining breaks between words
e.g., "nize speech" and "nice beach"
signal processing problems
56. 56
Speech Recognition Architecture
• Large vocabulary, continuous speech (words not separated), speaker-independent
Pipeline:
Speech Waveform
-> Feature Extraction (Signal Processing)
-> Spectral Feature Vectors
-> Phone Likelihood Estimation (Gaussians or Neural Networks)
-> Phone Likelihoods P(o|q)
-> Decoding (Viterbi or Stack Decoder)
-> Words
Other figure labels: Neural Net, N-gram Grammar, HMM Lexicon
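The Viterbi decoding step can be sketched on a toy hidden Markov model. Everything numeric here is made up for illustration: the three phone states, the two coarse acoustic labels "burst" and "vowel", and all probabilities are assumptions, and a real recognizer would work on continuous feature vectors rather than labels.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            p, back = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (p * emit_p[s][obs], back)
        best.append(column)
    # Follow the back-pointers from the best final state
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


STATES = ["k", "ow", "t"]
START_P = {"k": 0.9, "ow": 0.05, "t": 0.05}
TRANS_P = {
    "k": {"k": 0.3, "ow": 0.6, "t": 0.1},
    "ow": {"k": 0.1, "ow": 0.4, "t": 0.5},
    "t": {"k": 0.1, "ow": 0.1, "t": 0.8},
}
EMIT_P = {  # P(o|q): likelihood of an acoustic label given a phone
    "k": {"burst": 0.7, "vowel": 0.3},
    "ow": {"burst": 0.1, "vowel": 0.9},
    "t": {"burst": 0.7, "vowel": 0.3},
}

frames = ["burst", "vowel", "vowel", "burst"]
path = viterbi(frames, STATES, START_P, TRANS_P, EMIT_P)
print(path)
```

The emission table plays the role of the phone likelihoods P(o|q), and the transition table stands in for the pronunciation model; collapsing repeated states in the decoded path gives the phone string [k] [ow] [t].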
57. z
Signal Processing
Sound is an analog energy source resulting from pressure waves
striking an eardrum or microphone
A device called an analog-to-digital converter can be used to record
the speech sounds
sampling rate: the number of times per second that the sound level is
measured
quantization factor: the maximum number of bits of precision for the
sound level measurements
e.g., telephone: 3 kHz (3000 times per second)
e.g., speech recognizer: 8 kHz with 8-bit samples
so that 1 minute takes about 500K bytes
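The storage figure follows from multiplying sampling rate, sample size and duration. A small check, assuming uncompressed single-channel audio (the function name `audio_bytes` is made up for this sketch):

```python
def audio_bytes(sample_rate_hz, bits_per_sample, seconds):
    """Uncompressed, single-channel size in bytes."""
    return sample_rate_hz * bits_per_sample * seconds // 8


one_minute = audio_bytes(8000, 8, 60)
print(one_minute)  # 480000 bytes, i.e. roughly 500K
```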