This document provides an overview of natural language processing (NLP). It discusses how NLP analyzes human language input to build computational models of language. The key components of NLP are natural language understanding and natural language generation. Challenges in NLP include ambiguity, context dependence, and the creative nature of language. The document also outlines common NLP techniques like keyword analysis and syntactic parsing, as well as formal grammars and parsing approaches.
2. Natural Language Processing
• Language is a means of communication for humans. By studying language,
we come to understand more about the world.
• If we can succeed at building a computational model of language, we will
have a powerful tool for communicating about the world.
• We look at how we can exploit knowledge about the world, in combination
with linguistic facts, to build computational natural language systems.
• Natural Language Processing (NLP) is the process of computer analysis of
input provided in a human language (natural language), and conversion of
this input into a useful form of representation.
• NLP is a field of AI that processes and analyzes written or spoken
language.
• NLP involves the processing of speech, grammar, and meaning.
• NLP is composed of two parts: NLU (Natural Language Understanding) and
NLG (Natural Language Generation).
• Understanding any language requires detailed knowledge of that language.
3. Components of NLP
• Natural Language Understanding (ambiguity is major problem)
– Lexical ambiguity (word level):
A single word has more than one possible meaning.
– Syntactic ambiguity (sentence level/parsing):
Example: “Call me a cab.” The sentence has multiple possible parses for a machine.
– Referential ambiguity:
Example: “Meena went to Geeta. She says that she is hungry.” Here, “she”
can refer to either Meena or Geeta, and a machine cannot easily tell which.
• Natural Language Generation
– Text planning
Relevant content is retrieved from the knowledge base.
– Sentence planning
This includes choosing the required words, forming meaningful phrases,
setting the tone of the sentence, and arranging words in a proper,
meaningful order.
– Text realization
The sentence plan is mapped onto a concrete sentence structure and produced as output.
5. Input/Source
• The input of a NLP system can be written text
or speech.
• The quality of the input determines the possible errors in
language processing; that is, high-quality input
leads to correct language understanding.
6. Segmentation/Lexical analysis
• The text input is divided into segments
(chunks), and each segment is analyzed.
• Each such chunk is called a frame.
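The segmentation step can be sketched in Python; the sentence- and word-splitting regular expressions below are illustrative assumptions, not a prescribed method:

```python
import re

def segment(text):
    """Split raw text into sentence chunks, then each chunk into word tokens."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    return [re.findall(r"\w+", s) for s in sentences]

chunks = segment("Mary had a bat. The bat flew away!")
# Each inner list is one chunk of the input, ready for further analysis.
```

Calling `segment("Mary had a bat. The bat flew away!")` yields two chunks, one list of word tokens per sentence.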
7. Syntactic Analysis
• Syntactic analysis takes an input sentence and produces
a representation of its grammatical structure.
• A grammar describes the valid parts of speech of a
language and how to combine them into phrases.
• The grammar of English is nearly context free.
• Grammar: A computer grammar specifies which
sentences are in a language and their parse trees. A
parse tree is a hierarchical structure that shows how the
grammar applies to the input. Each level of the tree
corresponds to the application of one grammar rule.
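A parse tree can be represented directly as a nested structure, with each level of nesting corresponding to one grammar rule; the toy symbols below (S, NP, VP, Det, N, V) are standard illustrative choices, not taken from the slides:

```python
# A parse tree as nested tuples: (label, children...). Each level of nesting
# corresponds to the application of one grammar rule, e.g. S -> NP VP.
tree = ("S",
        ("NP", ("Det", "the"), ("N", "dog")),
        ("VP", ("V", "runs")))

def leaves(node):
    """Recover the input sentence by reading the tree's leaf tokens left to right."""
    if isinstance(node, str):
        return [node]
    _, *children = node
    return [tok for child in children for tok in leaves(child)]

# leaves(tree) -> ['the', 'dog', 'runs']
```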
8. Semantic Analysis
• Semantic analysis is the process of converting a syntactic representation into a
meaning representation.
• This involves the following tasks:
– Word sense determination
– Sentence level analysis
• Word sense: Words have different meanings in different contexts.
– Example: Mary had a bat in her office.
– bat = ‘a baseball bat’
– bat = ‘a flying mammal’
• Sentence Level Analysis:
– Once the words are understood, the sentence must be assigned some meaning
– Examples:
– She saw her duck.
– I saw a man with a telescope.
• Non-example: “Colorless green ideas sleep furiously.” -> This sentence is
syntactically well formed but would be rejected semantically, since
“colorless green” makes no sense.
9. Pragmatic Analysis
• Pragmatics comprises aspects of meaning that
depend upon the context or upon facts about the real
world.
• It deals with using and understanding sentences in
different situations and how the interpretation of the
sentence is affected.
• These aspects include:
– Pronouns and referring expressions
– Logical inferences that can be drawn from the meanings
of a set of propositions.
– Discourse structure: the meaning of a collection of
sentences taken together.
11. i. Keyword analysis or Pattern matching
• Accept the given sentence as input.
• Segment the sentence.
• Identify keywords in each segment.
• If a keyword is present:
– If only one keyword is present, give a suitable reply
for that keyword
– If more than one keyword is present, prioritize them
and reply according to the highest-priority keyword
• If no keyword is present in the segment, give a
random reply.
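The keyword-matching procedure above can be sketched as follows; the keyword table, priorities, and canned replies are invented for illustration, not part of any particular system:

```python
import random
import re

# Keyword -> reply table, priorities, and fallback replies are
# illustrative assumptions.
REPLIES = {"hello": "Hi there!", "weather": "It looks sunny today.", "bye": "Goodbye!"}
PRIORITY = {"hello": 1, "weather": 2, "bye": 3}
RANDOM_REPLIES = ["I see.", "Tell me more.", "Interesting."]

def respond(sentence):
    words = re.findall(r"\w+", sentence.lower())      # segment the sentence
    keywords = [w for w in words if w in REPLIES]     # identify keywords
    if not keywords:                                  # no keyword: random reply
        return random.choice(RANDOM_REPLIES)
    best = max(keywords, key=PRIORITY.get)            # several keywords: prioritize
    return REPLIES[best]                              # reply for the chosen keyword
```

For example, `respond("Hello, how is the weather?")` finds two keywords and answers for the higher-priority one, “weather”.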
12. ii. Syntactic driven parsing technique
• Words fit together into higher-level units such
as phrases, clauses, and sentences.
• Interpretations of larger groups of words are
built up out of the interpretations of their
syntactic constituent words or phrases.
• The interpretation of the input is done as a whole.
• It is obtained by applying a grammar that
determines which sentences are legal in the
language being parsed.
13. NLP Problems
1. The same expression means different things in different
context.
– Where’s the water? ( Chemistry lab? Must be pure)
– Where’s the water? ( Thirsty? Must be drinking water)
– Where’s the water? ( Leaky roof? It can be dirty)
2. No natural language program can be complete, because
new words, expressions, and meanings can be generated quite
freely.
– I’ll fax it to you
3. There are lots of ways to say the same thing.
– Ram was born on October 11.
– Ram’s birthday is October 11.
4. Sentence and phrases might have hidden meanings
– “Out of sight, out of mind” -> “invisible idiot”
– “The spirit was willing but the flesh was weak” -> “the vodka was
good, but the meat was bad”
14. 5. Problem due to syntax and semantics
6. Problem due to extensive use of pronouns.
(semantic issue)
– E.g. Ravi went to the supermarket. He found his
favorite brand of coffee on the rack. He paid for it and left.
– What does “it” refer to?
7. Use of grammatically incorrect sentences
– He rice eats. (syntax issue)
8. Use of conjunctions to avoid repetition of
phrases causes problems in NLP
– E.g. Ram and Hari went to a restaurant. While Ram had
a cup of coffee, Hari had tea.
– Here “Hari had tea” is understood as “Hari had a cup of tea.”
15. Formal Grammar
The most famous classification of grammars and
languages, introduced by Noam Chomsky, divides
them into four classes:
• Recursively enumerable grammars -
recognizable by a Turing machine.
• Context-sensitive grammars - recognizable by a
linear bounded automaton.
• Context-free grammars - recognizable by a
pushdown automaton.
• Regular grammars - recognizable by a finite
state automaton.
16. Type – 0 grammar
• Type-0 grammars (unrestricted grammars) include
all formal grammars.
• They generate exactly all languages that can be
recognized by a Turing machine.
• These languages are also known as the recursively
enumerable languages.
• Note that this is different from the recursive
languages which can be decided by an always-
halting Turing machine.
• Type-0 grammars are too general to describe the
syntax of programming languages and natural
languages.
17. Type – 1 Grammar
• Type-1 grammars generate the context-sensitive languages.
• These grammars have rules of the form αAβ→αγβ with A a
non-terminal and α, β and γ strings of terminals and non-
terminals.
• The strings α and β may be empty, but γ must be nonempty.
• The languages described by these grammars are exactly all
languages that can be recognized by a linear bounded
automaton.
• Example:
– AB → CDB
– AB → CdEB
– ABcd → abCDBcd
– B → b
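A classic language that is context-sensitive but not context-free is {aⁿbⁿcⁿ : n ≥ 1}; a direct membership check can be sketched in Python (the check itself is an illustration, not a linear bounded automaton):

```python
def in_anbncn(s):
    """Check membership in {a^n b^n c^n : n >= 1}, a classic
    context-sensitive language that no context-free grammar can generate."""
    n = len(s) // 3
    return n >= 1 and len(s) == 3 * n and s == "a" * n + "b" * n + "c" * n

# in_anbncn("aabbcc") -> True; in_anbncn("aabbc") -> False
```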
18. Type – 2 Grammar
• Type-2 grammars generate the context-free
languages.
• These are defined by rules of the form A → γ with A a
non-terminal and γ a string of terminals and non-
terminals.
• These languages are exactly all languages that can be
recognized by a non-deterministic pushdown
automaton.
• These languages are used in AI for natural language
processing and machine learning.
• Context-free languages are the theoretical basis for the
syntax of most programming languages.
• Example:
– A → aBc
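A classic context-free (but not regular) language is {aⁿbⁿ : n ≥ 0}, generated by the Type-2 rules S → aSb | ε; a recursive recognizer mirrors the two rules directly:

```python
def in_anbn(s):
    """Recognize {a^n b^n : n >= 0}, generated by the context-free
    rules S -> a S b | epsilon."""
    if s == "":
        return True                                            # S -> epsilon
    return s[0] == "a" and s[-1] == "b" and in_anbn(s[1:-1])   # S -> a S b

# in_anbn("aaabbb") -> True; in_anbn("aab") -> False
```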
19. Type – 3 Grammar
• Type-3 grammars generate the regular languages.
• Such a grammar restricts its rules to a single non-terminal on the left-hand
side and a right-hand side consisting of a single terminal, possibly
followed (or preceded, but not both in the same grammar) by a single non-
terminal.
• The rule S → ε is also allowed here if S does not appear on the right side
of any rule. These languages are exactly all languages that can be decided
by a finite state automaton.
• Additionally, this family of formal languages can be obtained by regular
expressions.
• Regular languages are commonly used to define search patterns and the
lexical structure of programming languages.
• Example:
• A → ε
• A → a
• A → abc
• A → B
• A → abcB
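For instance, the right-linear rules S → aS | b (of the Type-3 shape shown above) generate the regular language a*b, which a regular expression captures directly; the grammar choice here is an illustrative assumption:

```python
import re

# The right-linear rules S -> a S | b generate the regular language a*b,
# described directly by the regular expression below.
pattern = re.compile(r"a*b")

def in_language(s):
    """Membership test via the equivalent regular expression."""
    return pattern.fullmatch(s) is not None

# in_language("aaab") -> True; in_language("ba") -> False
```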
20. Parsing
• The term parsing comes from the Latin word “pars”,
meaning “part”.
• Parsing, or syntactic analysis, is the process of
analyzing a string of symbols, either in natural
language or in computer languages, according to
the rules of a formal grammar.
• So parsing means determining the syntactic
structure of an expression.
22. Types of Parser
• The task of the parser is essentially to
determine if and how the input can be derived
from the start symbol of the grammar.
• This can be done in essentially two ways:
1. Top-down parser
2. Bottom-up parser
23. Top-down parsers
• Top-down parsing expands a parse tree from
the start symbol to the leaves.
• Always expand the leftmost non-terminal.
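A top-down parser can be sketched as a recursive-descent recognizer; the toy grammar below is an illustrative assumption, and the sketch relies on trying productions in order rather than full backtracking across sub-parses:

```python
# Toy grammar (an assumption for illustration):
#   S -> NP VP ; NP -> Det N ; VP -> V NP | V
#   Det -> 'the' ; N -> 'dog' | 'cat' ; V -> 'sees' | 'runs'
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"]],
    "N":   [["dog"], ["cat"]],
    "V":   [["sees"], ["runs"]],
}
TERMINALS = {"the", "dog", "cat", "sees", "runs"}

def parse(symbol, tokens, i):
    """Top-down: try to derive tokens[i:] from `symbol`, always expanding
    the leftmost symbol of each production first; return the next index
    on success or None on failure."""
    if symbol in TERMINALS:
        return i + 1 if i < len(tokens) and tokens[i] == symbol else None
    for production in GRAMMAR[symbol]:
        j = i
        for sym in production:          # expand left to right
            j = parse(sym, tokens, j)
            if j is None:
                break
        else:
            return j                    # whole production matched
    return None

def accepts(sentence):
    tokens = sentence.split()
    return parse("S", tokens, 0) == len(tokens)

# accepts("the dog sees the cat") -> True; accepts("dog the runs") -> False
```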
29. Bottom-up parsing
• Start at the leaves and grow toward the root
– Just as efficient as top-down parsing
– Builds on ideas in top-down parsing
– Preferred method in practice
• Also called LR parsing
– L means that tokens are read left to right
– R means that it constructs a rightmost derivation (in reverse)
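Bottom-up parsing can be sketched as a naive shift-reduce recognizer over a toy grammar (an illustrative assumption): shift tokens onto a stack and reduce whenever a rule's right-hand side appears on top. Real LR parsers use precomputed tables to decide between shifting and reducing; this sketch simply reduces greedily.

```python
# Toy grammar (an assumption for illustration):
#   S -> NP VP ; NP -> Det N ; VP -> V NP
#   Det -> 'the' ; N -> 'dog' ; V -> 'sees'
RULES = [
    ("Det", ["the"]), ("N", ["dog"]), ("V", ["sees"]),
    ("NP", ["Det", "N"]), ("VP", ["V", "NP"]), ("S", ["NP", "VP"]),
]

def shift_reduce(tokens):
    """Bottom-up: build the tree from the leaves, folding right-hand
    sides on the stack back into their left-hand sides."""
    stack = []
    queue = list(tokens)
    while True:
        for lhs, rhs in RULES:                     # reduce: fold an RHS on the stack
            if stack[-len(rhs):] == rhs:
                stack[len(stack) - len(rhs):] = [lhs]
                break
        else:
            if not queue:                          # nothing to reduce or shift
                break
            stack.append(queue.pop(0))             # shift the next token
    return stack == ["S"]

# shift_reduce("the dog sees the dog".split()) -> True
```

The reduction sequence it performs, read bottom to top, is the rightmost derivation in reverse that the slide describes.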