SlideShare a Scribd company logo
1 of 34
Introduction to Natural
Language Processing

Pranav Gupta
Rajat Khanduja
What is NLP ?

”Natural language processing (NLP) is a field of computer science,
 artificial intelligence (also called machine learning), and linguistics
 concerned with the interactions between computers and human
 (natural) languages. Specifically, the process of a computer
 extracting meaningful information from natural language input
 and/or producing natural language output ”

 - Wikipedia
Scope of discussion
• Language of focus :- English

• Domain of Natural Language Processing to be discussed.
   Text linguistics


• Focus on statistical methods.
Why NLP ?
Answering Questions


    ”What time is the next bus from the city after the 5:00 pm bus ?”


    ”I am a 3rd year CSE student, which classes do I have today ?”


    ”Which gene is associated with Diabetes ?”


    ”Who is Donald Knuth ?”
Information extraction
•
    Extraction of meaning from email :-

    We have decided to meet tomorrow at 10:00am in the lab.
Information extraction
•
    Extraction of meaning from email :-

    We have decided to meet tomorrow at 10:00am in the lab.



               To do : meeting
               Time : 10:00 am, 22/3/2012
               Venue : Lab
Machine Translation

मेरा नाम रजत है | => My name is Rajat.
Machine Translation

मेरा नाम रजत है | => My name is Rajat.




Grass is greener on the other side.
Machine Translation

मेरा नाम रजत है | => My name is Rajat.




Grass is greener on the other side. => दूर के ढोल सुहावने |



     Google’s Translation :            घास दूसरी तरफ हिरयाली है |
Other applications
 Text summarization
   • Extract keywords or key-phrases from a large piece of text.
   • Creating an abstract of an entire article.


 Context analysis
   • Social networking sites can ‘fairly’ understand the topic of discussion

   “ 4 of your friends posted about Indian Institute of Technology,
     Guwahati”.


 Sentiment analysis
   • Help companies analyze large number of reviews on a product
   • Help customers process the reviews provided on a product.
Tasks in NLP
• Tokenization / Segmentation

• Disambiguation

• Stemming

• Part of Speech (POS) tagging

• Contextual Analysis

• Sentiment Analysis
Segmentation
• Segmenting text into words

  “The meeting has been scheduled for this Saturday.”

  “He has agreed to co-operate with me.”

  “Indian Airlines introduces another flight on the New Delhi–Mumbai
  route.”

  “We are leaving for the U.S.A. on 26th May.”

    “Vineet is playing the role of Duke of Athens in A Midsummer Night’s
  Dream in a theatre in New Delhi.”

  •Named Entity Recognition
Stemming
• Stemming is the process for reducing inflected (or sometimes
  derived) words to their stem, base or root form.

• car, cars         -> car
• run, ran, running -> run
• stemmer, stemming, stemmed -> stem
POS tagging
• Part of speech (POS) recognition

      “ Today is a beautiful day. “

  Today        is       a       beautiful   day
  Noun         Verb     Article Adjective   Noun
POS tagging
• Part of speech (POS) recognition

      “ Today is a beautiful day. “

  Today        is       a       beautiful           day
  Noun         Verb     Article Adjective           Noun




    “Interest rates interest economists for the interest of the nation.“
 (word sense disambiguation)
Word Sense Disambiguation

•Same word different meanings.

      “He approached many banks for the loan.”
                       vs
      “IIT Guwahati is on the banks of Bhramaputra.”



      “Free lunch.”   vs   “Free speech.”
Contextual Analysis

• “The teacher pointed out that ‘Mark is the smartest person on
  Earth’ has two proper nouns.”

• “Violinist linked with JAL Crash Blossoms.”
Sentiment Analysis
• Reviews about a restaurant :-

  “Best roast chicken in New Delhi.”

  “Service was very disappointing.”
Sentiment Analysis
• Reviews about a restaurant :-

  “Best roast chicken in New Delhi.”

  “Service was very disappointing.”



• Another set of reviews

     “iPhone 4S is over-hyped.”
Sentiment Analysis
• Reviews about a restaurant :-

  “Best roast chicken in New Delhi.”

  “Service was very disappointing.”



• Another set of reviews

     “iPhone 4S is over-hyped.”

  “The hype about iPhone 4S is justified.”
Ambiguous statements
           (Crash blossoms)

• Red Tape Holds Up New Bridges

• Hospitals Are Sued by 7 Foot Doctors

• Juvenile Court to Try Shooting Defendant

• Fed raises interest rates.
Supervised vs. Unsupervised

• Supervised
  • Use of large training data to generalize patterns and rules
  • Example: Hidden Markov Models


• Unsupervised
  • Don’t require training; use in-built rules or a general algorithm;
    can work straightaway on any unknown situations or problem
  • The algorithm may be developed as a result of linguistic analysis
  • Example: ‘Text Rank’ Algorithm for text summarization
General tasks and techniques in NLP
• NLP uses machine learning as well as other AI systems in general.
  More specifically NLP Techniques fall mainly into 3 categories:
1. Symbolic
  Deep analysis of linguistic phenomena
  human verification of facts and rules
  use of inferred data – knowledge generation
2. Statistical
       Mathematical models without much use of linguistic phenomena
       use of large corpora
       Use of only observable knowledge
3. Connectionist
  Use of large corpora
  allows inferencing from the examples
Part of Speech (POS) Tagging
• Given a sentence automatically give the correct part of speech
  for each word.

• Parts of Speech – not the limited set of Nouns, Verbs,
  Adjectives, Pronouns etc. but further subdivisions – Noun-
  Singular, Noun-Plural, Noun-Proper, Verb-Supporting –
  depends on implementation

• Example:
  given : I can can a can.
  output : I_NNP can_VBS can_VB a_DT can_NP
Penn TreeBank Tagset
1. CC Coordinating conjunction                   19. RB Adverb
2. CD Cardinal number                            20. RBR Adverb, comparative
3. DT Determiner                                 21. RBS Adverb, superlative
4. EX Existential there                          22. RP Particle
5. FW Foreign word                               23. SYM Symbol
6. IN Preposition or subordinating conjunction   24. TO to
7. JJ Adjective 8. JJR Adjective, comparative    25. UH Interjection
8. JJS Adjective, superlative                    26. VB Verb, base form
9. LS List item marker                           27. VBD Verb, past tense
10. MD Modal                                     28. VBG Verb, gerund or present participle
11. NN Noun, singular or mass                    29. VBN Verb, past participle
12. NNS Noun, plural                             30. VBP Verb, non-3rd person singular present
13. NP Proper noun, singular                     31. VBZ Verb, 3rd person singular present
14. NPS Proper noun, plural                      32. WDT Wh-determiner
15. PDT Predeterminer                            33. WP Wh-pronoun
16. POS Possessive ending                        34. WP$ Possessive wh-pronoun
17. PP Personal pronoun                          35. WRB Wh-adverb
18. PP$ Possessive pronoun
Hidden Markov Models
An HMM is defined by:
•Set of states ‘S’
•Set of output symbols ‘W’
•Starting Probability Set (A) P(S = si)
•Emission Probability Set (E) P(W = wj / S = si)
•Transition probability Set (T) P(Sk / Sk-1 Sk-2 Sk-3 … S1)


Now one can use the HMM to estimate the most likely sequence of
states given the set of output symbols. (using Viterbi Algorithm)
PoS Tagging and First Order HMM
Our HMM Model of the PoS Tagging Problem

•Set of states (S) = set of PoS tags
•Set of output symbols (W) = set of words in our language
•Initial probability (A) = P(S = si) = probability of the occurrence of the
PoS Tag si in the corpus.
•Emission Probability (E) = P(W = wi / S = si) = probability of occurrence
of the word wi with the PoS Tag si.
•Transition Probability (T) = P(Sk / Sk-1 Sk-2 .. S1) = P(Sk = si/ Sk-1 = sj) =
probability of the occurrence of the PoS Tag si next to the tag sj in the
corpus.
Text Summarization

• Given a piece of text, automatically make a summary
  satisfying required constraints.

• Examples of constraints:
  • Summary should have all the information of the document
  • Summary should have only correct information of the document.
  • Summary should have information only from the document

  and so on, depending on the user’s needs!
Abstraction vs. extraction
  "The Army Corps of Engineers, in their rush to protect New
Orleans by the start of the 2006 hurricane season, installed
defective flood-control pumps despite warnings from its own
expert about the defects”

•Extractive
  "Army Corps of Engineers", "New Orleans", and "defective
flood-control pumps“

•Abstractive
 "political negligence" , "inadequate protection from floods"
Text Rank – Key phrase
Extraction
Questions
Thank You!
Resources
Links
•acl.ldc.upenn.edu/acl2004/emnlp/pdf/Mihalcea.pdf
•ilpubs.stanford.edu:8090/422/1/1999-66.pdf
•en.wikipedia.org/wiki/Automatic_summarization
•en.wikipedia.org/wiki/Viterbi_algorithm
•http://
ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=01450960
•http://nlp-class.org

Books:
•Artificial Intelligence and Intelligence Systems, N.P. Padhy

More Related Content

What's hot

Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingVeenaSKumar2
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)Kuppusamy P
 
natural language processing help at myassignmenthelp.net
natural language processing  help at myassignmenthelp.netnatural language processing  help at myassignmenthelp.net
natural language processing help at myassignmenthelp.netwww.myassignmenthelp.net
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processingrohitnayak
 
Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processinggulshan kumar
 
Natural language processing
Natural language processingNatural language processing
Natural language processingprashantdahake
 
Natural Language Processing
Natural Language Processing Natural Language Processing
Natural Language Processing Adarsh Saxena
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processingMinh Pham
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingRishikese MR
 
Natural language processing
Natural language processingNatural language processing
Natural language processingKarenVacca
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)VenkateshMurugadas
 
Natural language processing (NLP)
Natural language processing (NLP) Natural language processing (NLP)
Natural language processing (NLP) ASWINKP11
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingIla Group
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyMarina Santini
 
Natural Language Processing seminar review
Natural Language Processing seminar review Natural Language Processing seminar review
Natural Language Processing seminar review Jayneel Vora
 

What's hot (20)

NLP
NLPNLP
NLP
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
natural language processing help at myassignmenthelp.net
natural language processing  help at myassignmenthelp.netnatural language processing  help at myassignmenthelp.net
natural language processing help at myassignmenthelp.net
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
NLP
NLPNLP
NLP
 
Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural Language Processing
Natural Language Processing Natural Language Processing
Natural Language Processing
 
Nlp
NlpNlp
Nlp
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 
Natural language processing (NLP)
Natural language processing (NLP) Natural language processing (NLP)
Natural language processing (NLP)
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language Technology
 
Intro to nlp
Intro to nlpIntro to nlp
Intro to nlp
 
NAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITIONNAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITION
 
Natural Language Processing seminar review
Natural Language Processing seminar review Natural Language Processing seminar review
Natural Language Processing seminar review
 

Viewers also liked

Analog to digital conversion
Analog to digital conversionAnalog to digital conversion
Analog to digital conversionEngr Ahmad Khan
 
Selling technique - With NLP
Selling technique - With NLPSelling technique - With NLP
Selling technique - With NLPPratibha Mishra
 
Building effective communication skills using NLP
Building effective communication skills using NLPBuilding effective communication skills using NLP
Building effective communication skills using NLPIIBA UK Chapter
 
How People Really Hold and Touch (their Phones)
How People Really Hold and Touch (their Phones)How People Really Hold and Touch (their Phones)
How People Really Hold and Touch (their Phones)Steven Hoober
 
What 33 Successful Entrepreneurs Learned From Failure
What 33 Successful Entrepreneurs Learned From FailureWhat 33 Successful Entrepreneurs Learned From Failure
What 33 Successful Entrepreneurs Learned From FailureReferralCandy
 
Upworthy: 10 Ways To Win The Internets
Upworthy: 10 Ways To Win The InternetsUpworthy: 10 Ways To Win The Internets
Upworthy: 10 Ways To Win The InternetsUpworthy
 
Five Killer Ways to Design The Same Slide
Five Killer Ways to Design The Same SlideFive Killer Ways to Design The Same Slide
Five Killer Ways to Design The Same SlideCrispy Presentations
 
A-Z Culture Glossary 2017
A-Z Culture Glossary 2017A-Z Culture Glossary 2017
A-Z Culture Glossary 2017sparks & honey
 
Digital Strategy 101
Digital Strategy 101Digital Strategy 101
Digital Strategy 101Bud Caddell
 
How I got 2.5 Million views on Slideshare (by @nickdemey - Board of Innovation)
How I got 2.5 Million views on Slideshare (by @nickdemey - Board of Innovation)How I got 2.5 Million views on Slideshare (by @nickdemey - Board of Innovation)
How I got 2.5 Million views on Slideshare (by @nickdemey - Board of Innovation)Board of Innovation
 
The What If Technique presented by Motivate Design
The What If Technique presented by Motivate DesignThe What If Technique presented by Motivate Design
The What If Technique presented by Motivate DesignMotivate Design
 
The Seven Deadly Social Media Sins
The Seven Deadly Social Media SinsThe Seven Deadly Social Media Sins
The Seven Deadly Social Media SinsXPLAIN
 
The History of SEO
The History of SEOThe History of SEO
The History of SEOHubSpot
 
What Would Steve Do? 10 Lessons from the World's Most Captivating Presenters
What Would Steve Do? 10 Lessons from the World's Most Captivating PresentersWhat Would Steve Do? 10 Lessons from the World's Most Captivating Presenters
What Would Steve Do? 10 Lessons from the World's Most Captivating PresentersHubSpot
 
10 Powerful Body Language Tips for your next Presentation
10 Powerful Body Language Tips for your next Presentation10 Powerful Body Language Tips for your next Presentation
10 Powerful Body Language Tips for your next PresentationSOAP Presentations
 

Viewers also liked (17)

Analog to digital conversion
Analog to digital conversionAnalog to digital conversion
Analog to digital conversion
 
Selling technique - With NLP
Selling technique - With NLPSelling technique - With NLP
Selling technique - With NLP
 
Building effective communication skills using NLP
Building effective communication skills using NLPBuilding effective communication skills using NLP
Building effective communication skills using NLP
 
How People Really Hold and Touch (their Phones)
How People Really Hold and Touch (their Phones)How People Really Hold and Touch (their Phones)
How People Really Hold and Touch (their Phones)
 
What 33 Successful Entrepreneurs Learned From Failure
What 33 Successful Entrepreneurs Learned From FailureWhat 33 Successful Entrepreneurs Learned From Failure
What 33 Successful Entrepreneurs Learned From Failure
 
Upworthy: 10 Ways To Win The Internets
Upworthy: 10 Ways To Win The InternetsUpworthy: 10 Ways To Win The Internets
Upworthy: 10 Ways To Win The Internets
 
Five Killer Ways to Design The Same Slide
Five Killer Ways to Design The Same SlideFive Killer Ways to Design The Same Slide
Five Killer Ways to Design The Same Slide
 
A-Z Culture Glossary 2017
A-Z Culture Glossary 2017A-Z Culture Glossary 2017
A-Z Culture Glossary 2017
 
Digital Strategy 101
Digital Strategy 101Digital Strategy 101
Digital Strategy 101
 
How I got 2.5 Million views on Slideshare (by @nickdemey - Board of Innovation)
How I got 2.5 Million views on Slideshare (by @nickdemey - Board of Innovation)How I got 2.5 Million views on Slideshare (by @nickdemey - Board of Innovation)
How I got 2.5 Million views on Slideshare (by @nickdemey - Board of Innovation)
 
The What If Technique presented by Motivate Design
The What If Technique presented by Motivate DesignThe What If Technique presented by Motivate Design
The What If Technique presented by Motivate Design
 
The Seven Deadly Social Media Sins
The Seven Deadly Social Media SinsThe Seven Deadly Social Media Sins
The Seven Deadly Social Media Sins
 
The History of SEO
The History of SEOThe History of SEO
The History of SEO
 
Displaying Data
Displaying DataDisplaying Data
Displaying Data
 
What Would Steve Do? 10 Lessons from the World's Most Captivating Presenters
What Would Steve Do? 10 Lessons from the World's Most Captivating PresentersWhat Would Steve Do? 10 Lessons from the World's Most Captivating Presenters
What Would Steve Do? 10 Lessons from the World's Most Captivating Presenters
 
How Google Works
How Google WorksHow Google Works
How Google Works
 
10 Powerful Body Language Tips for your next Presentation
10 Powerful Body Language Tips for your next Presentation10 Powerful Body Language Tips for your next Presentation
10 Powerful Body Language Tips for your next Presentation
 

Similar to Introduction to Natural Language Processing

Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingToine Bogers
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Mustafa Jarrar
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsRoelof Pieters
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingTed Xiao
 
NLP_guest_lecture.pdf
NLP_guest_lecture.pdfNLP_guest_lecture.pdf
NLP_guest_lecture.pdfSoha82
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...RajkiranVeluri
 
NLP pipeline in machine translation
NLP pipeline in machine translationNLP pipeline in machine translation
NLP pipeline in machine translationMarcis Pinnis
 
Natural Language parsing.pptx
Natural Language parsing.pptxNatural Language parsing.pptx
Natural Language parsing.pptxsiddhantroy13
 
Why Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveWhy Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveJames Hendler
 
Natural language processing
Natural language processingNatural language processing
Natural language processingBasha Chand
 
NLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptNLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptOlusolaTop
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Saurabh Kaushik
 
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Seth Grimes
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...Seth Grimes
 
Using and learning phrases
Using and learning phrasesUsing and learning phrases
Using and learning phrasesCassandra Jacobs
 
Visual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on LanguageVisual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on LanguageRoelof Pieters
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processingpunedevscom
 
Chi-Un Lei "Text Mining and Educational Discourse"
Chi-Un Lei "Text Mining and Educational Discourse"Chi-Un Lei "Text Mining and Educational Discourse"
Chi-Un Lei "Text Mining and Educational Discourse"CITE
 

Similar to Introduction to Natural Language Processing (20)

Nlp app
Nlp appNlp app
Nlp app
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
 
NLP_guest_lecture.pdf
NLP_guest_lecture.pdfNLP_guest_lecture.pdf
NLP_guest_lecture.pdf
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...
 
NLP pipeline in machine translation
NLP pipeline in machine translationNLP pipeline in machine translation
NLP pipeline in machine translation
 
intro.ppt
intro.pptintro.ppt
intro.ppt
 
Natural Language parsing.pptx
Natural Language parsing.pptxNatural Language parsing.pptx
Natural Language parsing.pptx
 
Why Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveWhy Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspective
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
NLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptNLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.ppt
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
 
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
 
Using and learning phrases
Using and learning phrasesUsing and learning phrases
Using and learning phrases
 
Visual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on LanguageVisual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on Language
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Chi-Un Lei "Text Mining and Educational Discourse"
Chi-Un Lei "Text Mining and Educational Discourse"Chi-Un Lei "Text Mining and Educational Discourse"
Chi-Un Lei "Text Mining and Educational Discourse"
 

Recently uploaded

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 

Recently uploaded (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

Introduction to Natural Language Processing

  • 1. Introduction to Natural Language Processing Pranav Gupta Rajat Khanduja
  • 2. What is NLP ? ”Natural language processing (NLP) is a field of computer science, artificial intelligence (also called machine learning), and linguistics concerned with the interactions between computers and human (natural) languages. Specifically, the process of a computer extracting meaningful information from natural language input and/or producing natural language output ” - Wikipedia
  • 3. Scope of discussion • Language of focus :- English • Domain of Natural Language Processing to be discussed. Text linguistics • Focus on statistical methods.
  • 5. Answering Questions  ”What time is the next bus from the city after the 5:00 pm bus ?”  ”I am a 3rd year CSE student, which classes do I have today ?”  ”Which gene is associated with Diabetes ?”  ”Who is Donald Knuth ?”
  • 6. Information extraction • Extraction of meaning from email :- We have decided to meet tomorrow at 10:00am in the lab.
  • 7. Information extraction • Extraction of meaning from email :- We have decided to meet tomorrow at 10:00am in the lab. To do : meeting Time : 10:00 am, 22/3/2012 Venue : Lab
  • 8. Machine Translation मेरा नाम रजत है | => My name is Rajat.
  • 9. Machine Translation मेरा नाम रजत है | => My name is Rajat. Grass is greener on the other side.
  • 10. Machine Translation मेरा नाम रजत है | => My name is Rajat. Grass is greener on the other side. => दूर के ढोल सुहावने | Google’s Translation : घास दूसरी तरफ हिरयाली है |
  • 11. Other applications  Text summarization • Extract keywords or key-phrases from a large piece of text. • Creating an abstract of an entire article.  Context analysis • Social networking sites can ‘fairly’ understand the topic of discussion “ 4 of your friends posted about Indian Institute of Technology, Guwahati”.  Sentiment analysis • Help companies analyze large number of reviews on a product • Help customers process the reviews provided on a product.
  • 12. Tasks in NLP • Tokenization / Segmentation • Disambiguation • Stemming • Part of Speech (POS) tagging • Contextual Analysis • Sentiment Analysis
  • 13. Segmentation • Segmenting text into words “The meeting has been scheduled for this Saturday.” “He has agreed to co-operate with me.” “Indian Airlines introduces another flight on the New Delhi–Mumbai route.” “We are leaving for the U.S.A. on 26th May.” “Vineet is playing the role of Duke of Athens in A Midsummer Night’s Dream in a theatre in New Delhi.” •Named Entity Recognition
  • 14. Stemming • Stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form. • car, cars -> car • run, ran, running -> run • stemmer, stemming, stemmed -> stem
  • 15. POS tagging • Part of speech (POS) recognition “ Today is a beautiful day. “ Today is a beautiful day Noun Verb Article Adjective Noun
  • 16. POS tagging • Part of speech (POS) recognition “ Today is a beautiful day. “ Today is a beautiful day Noun Verb Article Adjective Noun “Interest rates interest economists for the interest of the nation.“ (word sense disambiguation)
  • 17. Word Sense Disambiguation •Same word different meanings. “He approached many banks for the loan.” vs “IIT Guwahati is on the banks of Bhramaputra.” “Free lunch.” vs “Free speech.”
  • 18. Contextual Analysis • “The teacher pointed out that ‘Mark is the smartest person on Earth’ has two proper nouns.” • “Violinist linked with JAL Crash Blossoms.”
  • 19. Sentiment Analysis • Reviews about a restaurant :- “Best roast chicken in New Delhi.” “Service was very disappointing.”
  • 20. Sentiment Analysis • Reviews about a restaurant :- “Best roast chicken in New Delhi.” “Service was very disappointing.” • Another set of reviews “iPhone 4S is over-hyped.”
  • 21. Sentiment Analysis • Reviews about a restaurant :- “Best roast chicken in New Delhi.” “Service was very disappointing.” • Another set of reviews “iPhone 4S is over-hyped.” “The hype about iPhone 4S is justified.”
  • 22. Ambiguous statements (Crash blossoms) • Red Tape Holds Up New Bridges • Hospitals Are Sued by 7 Foot Doctors • Juvenile Court to Try Shooting Defendant • Fed raises interest rates.
  • 23. Supervised vs. Unsupervised • Supervised • Use of large training data to generalize patterns and rules • Example: Hidden Markov Models • Unsupervised • Don’t require training; use in-built rules or a general algorithm; can work straightaway on any unknown situations or problem • The algorithm may be developed as a result of linguistic analysis • Example: ‘Text Rank’ Algorithm for text summarization
  • 24. General tasks and techniques in NLP • NLP uses machine learning as well as other AI systems in general. More specifically NLP Techniques fall mainly into 3 categories: 1. Symbolic Deep analysis of linguistic phenomena human verification of facts and rules use of inferred data – knowledge generation 2. Statistical Mathematical models without much use of linguistic phenomena use of large corpora Use of only observable knowledge 3. Connectionist Use of large corpora allows inferencing from the examples
  • 25. Part of Speech (POS) Tagging • Given a sentence automatically give the correct part of speech for each word. • Parts of Speech – not the limited set of Nouns, Verbs, Adjectives, Pronouns etc. but further subdivisions – Noun- Singular, Noun-Plural, Noun-Proper, Verb-Supporting – depends on implementation • Example: given : I can can a can. output : I_NNP can_VBS can_VB a_DT can_NP
  • 26. Penn TreeBank Tagset 1. CC Coordinating conjunction 19. RB Adverb 2. CD Cardinal number 20. RBR Adverb, comparative 3. DT Determiner 21. RBS Adverb, superlative 4. EX Existential there 22. RP Particle 5. FW Foreign word 23. SYM Symbol 6. IN Preposition or subordinating conjunction 24. TO to 7. JJ Adjective 8. JJR Adjective, comparative 25. UH Interjection 8. JJS Adjective, superlative 26. VB Verb, base form 9. LS List item marker 27. VBD Verb, past tense 10. MD Modal 28. VBG Verb, gerund or present participle 11. NN Noun, singular or mass 29. VBN Verb, past participle 12. NNS Noun, plural 30. VBP Verb, non-3rd person singular present 13. NP Proper noun, singular 31. VBZ Verb, 3rd person singular present 14. NPS Proper noun, plural 32. WDT Wh-determiner 15. PDT Predeterminer 33. WP Wh-pronoun 16. POS Possessive ending 34. WP$ Possessive wh-pronoun 17. PP Personal pronoun 35. WRB Wh-adverb 18. PP$ Possessive pronoun
  • 27. Hidden Markov Models An HMM is defined by: •Set of states ‘S’ •Set of output symbols ‘W’ •Starting Probability Set (A) P(S = si) •Emission Probability Set (E) P(W = wj / S = si) •Transition probability Set (T) P(Sk / Sk-1 Sk-2 Sk-3 … S1) Now one can use the HMM to estimate the most likely sequence of states given the set of output symbols. (using Viterbi Algorithm)
  • 28. PoS Tagging and First Order HMM Our HMM Model of the PoS Tagging Problem •Set of states (S) = set of PoS tags •Set of output symbols (W) = set of words in our language •Initial probability (A) = P(S = si) = probability of the occurrence of the PoS Tag si in the corpus. •Emission Probability (E) = P(W = wi / S = si) = probability of occurrence of the word wi with the PoS Tag si. •Transition Probability (T) = P(Sk / Sk-1 Sk-2 .. S1) = P(Sk = si/ Sk-1 = sj) = probability of the occurrence of the PoS Tag si next to the tag sj in the corpus.
  • 29. Text Summarization • Given a piece of text, automatically make a summary satisfying required constraints. • Examples of constraints: • Summary should have all the information of the document • Summary should have only correct information of the document. • Summary should have information only from the document and so on, depending on the user’s needs!
  • 30. Abstraction vs. extraction "The Army Corps of Engineers, in their rush to protect New Orleans by the start of the 2006 hurricane season, installed defective flood-control pumps despite warnings from its own expert about the defects” •Extractive "Army Corps of Engineers", "New Orleans", and "defective flood-control pumps“ •Abstractive "political negligence" , "inadequate protection from floods"
  • 31. Text Rank – Key phrase Extraction