Natural Language Processing
Unit 1 – Introduction
Anantharaman Narayana Iyer
narayana dot anantharaman at gmail dot com
7th Aug 2015
Topics
• Motivation: Why NLP?
• Course Outline
• Grading Policy
What are the opportunities for NLP?
NLP is a hugely important topic for both industry and academia
Trends that accelerate NLP research
• Availability of web and social data
• Mobile devices as a source of data
• Need for natural-language-based I/O for new devices
• ML techniques, e.g., deep learning
• Increasing availability of open datasets on the web, e.g., Freebase, DBpedia
Motivation
• Google Search Engine
- Intelligently responding to queries, e.g., "Where is India Gate?"
- Predicting the next word for autocompletion
- Correcting spelling
- Segmenting words that are joined without a space
- Ranking the search results
• Google Translate
• Gmail
- E.g., understanding the contents of an e-mail through NLP and alerting the user
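Next-word prediction for autocompletion can be illustrated with a bigram model: count which word follows each word in a log of past queries and suggest the most frequent followers. A minimal sketch, assuming a hypothetical toy query log and hypothetical helper names (`train_bigrams`, `suggest_next`); real search engines use far richer models and signals.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, how often each other word directly follows it."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            followers[prev][nxt] += 1
    return followers


def suggest_next(followers, word, k=3):
    """Return up to k of the most frequent words seen right after `word`."""
    counts = followers.get(word.lower(), Counter())  # .get avoids adding unseen words
    return [w for w, _ in counts.most_common(k)]


# Hypothetical log of past queries.
queries = [
    "where is india gate",
    "where is taj mahal",
    "where is india gate located",
    "what is nlp",
]
model = train_bigrams(queries)
print(suggest_next(model, "is"))  # ['india', 'taj', 'nlp']
```

Ties are broken by first appearance, because `Counter.most_common` keeps the order in which elements were first encountered when counts are equal.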
Speech/NLP
• What technologies are involved here?
- Continuous speech recognition
- Keyword spotting
- Text to speech
- Speech-in, speech-out systems
- Speaker identification
- Novel applications (to be explained on the board)
Disambiguation
• Consider the example below.
• We would like to collect tweets on a subject (say, Rahul Gandhi) and analyse their sentiment.
• We can search Twitter through its Search API with the keywords "Rahul Gandhi".
• This might miss tweets that contain only the term Rahul and not Gandhi.
• If we instead search for the terms ["Rahul", "Gandhi"] separately, we may get results that match any Rahul (e.g., Rahul Dravid or KL Rahul).
• We can do a more intelligent tweet search using NLP techniques.
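The difference between the two naive searches, and one crude way to improve on them, can be sketched in a few lines. Assumptions: the tweets are invented, the Twitter Search API call itself is left out, and `POLITICS_CUES` is a hypothetical hand-picked list of context words; a real system would learn such cues (or use entity linking) rather than hard-code them.

```python
import re

# Invented sample tweets (the Twitter Search API call is not shown).
tweets = [
    "Rahul Gandhi addresses a rally in Amethi",
    "Rahul speaks about Congress plans for the election",
    "Rahul Dravid scores a century at Lord's",
    "KL Rahul opens the batting for India",
]

# Hypothetical cue words that suggest the politician rather than a cricketer.
POLITICS_CUES = {"congress", "rally", "election", "amethi", "parliament"}


def tokens(text):
    """Set of lower-cased word tokens in a tweet."""
    return set(re.findall(r"[a-z']+", text.lower()))


def phrase_match(tweet):
    """Exact phrase search: misses tweets that say only 'Rahul'."""
    return "rahul gandhi" in tweet.lower()


def any_term_match(tweet):
    """Match either term: also returns every other Rahul."""
    return bool({"rahul", "gandhi"} & tokens(tweet))


def context_match(tweet):
    """Phrase match, or 'rahul' plus at least one political cue word."""
    toks = tokens(tweet)
    return phrase_match(tweet) or ("rahul" in toks and bool(POLITICS_CUES & toks))


for name, search in [("phrase", phrase_match), ("any term", any_term_match), ("context", context_match)]:
    print(name, [t for t in tweets if search(t)])
```

The phrase search finds only the first tweet, the any-term search returns all four (including both cricketers), and the context heuristic keeps the two tweets about the politician.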
Summarization
• The challenge we face is not a lack of information but an overload of it.
• Summarization is a core technology that can help address information overload.
• Related problems:
- How do we validate the quality and correctness of information?
- How do we summarize multimedia?
- How do we summarize social data, where the data may:
  - have less signal and more noise
  - be biased
  - not be factual
  - be repetitive
• Can we autogenerate a (set of) tweet(s) from a news article?
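A very simple form of extractive summarization scores each sentence by how frequent its content words are in the whole text and keeps the top-scoring sentences. This is a minimal sketch of that frequency-based baseline, not a state-of-the-art summarizer; the stop-word list, the function name `summarize`, and the sample text are hypothetical.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real systems use much larger ones.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "on", "it", "for", "that", "this"}


def content_words(text):
    """Lower-cased alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def summarize(text, n_sentences=1):
    """Keep the n sentences whose content words are most frequent in the whole text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        toks = content_words(sentence)
        # Average (not total) frequency, so long sentences are not automatically favoured.
        return sum(freq[w] for w in toks) / max(len(toks), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore the original sentence order
    return " ".join(sentences[i] for i in keep)


doc = ("NLP helps machines read text. "
       "Summarization condenses long text into short text. "
       "Cats sleep a lot.")
print(summarize(doc))  # Summarization condenses long text into short text.
```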
Answer Evaluation
• Answer evaluation is a core challenge for online education systems.
• Wouldn't it be nice if questions could be both descriptive and objective?
• Can there be an automated answer evaluation system that doesn't require peer evaluation?
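One naive baseline for automated answer evaluation compares a student's answer with a reference answer using bag-of-words cosine similarity. This is a sketch under that assumption (the reference, the answers, and the helper names are made up), and it mainly shows the limitation: word overlap is not understanding.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Term-count vector of a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0 = no shared words)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


reference = "Photosynthesis converts light energy into chemical energy"
answers = {
    "close": "Photosynthesis converts light energy to chemical energy",
    "off_topic": "The mitochondria is the powerhouse of the cell",
}
for name, answer in answers.items():
    print(name, round(cosine_similarity(bag_of_words(reference), bag_of_words(answer)), 2))
```

A negated answer such as "Photosynthesis does not convert light energy into chemical energy" would still score high, which is why answer evaluation needs more than surface matching.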
Sentiment Analysis
• Measuring the pulse of people from social media
• Can measure sentiment towards a brand, product, or event
• A crowded space, but not a fully solved problem due to inherent challenges in Natural Language Processing
• Can we build a sentiment analyser using RNNs and evaluate its performance?
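Building an RNN-based analyser, as the slide asks, is beyond a short example; a lexicon-based scorer is a common baseline to compare it against. A minimal sketch, assuming a hypothetical six-word lexicon and a one-word negation rule (the names `LEXICON`, `NEGATORS`, and `sentiment_score` are illustrative):

```python
import re

# Hypothetical mini-lexicon; real lexicons contain thousands of scored words.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Sum lexicon scores, flipping the sign of a word that directly follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


for review in ["I love this phone", "The battery is not good", "Great camera, terrible battery"]:
    print(review, "->", sentiment_score(review))
```

Mixed reviews cancel out to zero, and sarcasm or contractions such as "isn't" slip past the rules, which illustrates why the problem is not fully solved.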
Plagiarism Detection
Dialog Systems
• Dialog systems that can be deployed commercially?
• Natural Language Processing
• Natural language generation
Can we build an NLG library and make it open source?
Demo
• http://www.manifestation.com/neurotoys/eliza.php3
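ELIZA-style programs like the one in the demo work by matching the user's input against keyword patterns and filling a response template with a fragment of the input, swapping pronouns along the way. The sketch below follows that idea with a handful of hypothetical rules and helper names; it is not Weizenbaum's original script.

```python
import re

# Pronoun "reflection" so the echoed fragment reads from the program's point of view.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "you": "me", "your": "my"}

# (pattern, response template) pairs, tried in order; these rules are hypothetical.
RULES = [
    (re.compile(r"i need (.*)", re.I), "Why do you need {0}?"),
    (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"because (.*)", re.I), "Is that the real reason?"),
]


def reflect(fragment):
    """Swap first- and second-person words in a captured fragment."""
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())


def respond(utterance):
    """Return the reply of the first rule whose pattern matches the whole utterance."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.fullmatch(text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return "Please tell me more."  # default when no rule matches


print(respond("I need my coffee."))  # Why do you need your coffee?
```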
Course Structure
• Foundational
• Emerging
• Applications
Course Positioning
• Classical NLP techniques (such as language models, MaxEnt classifiers, HMMs, and CRFs) have proven effective in addressing problems like part-of-speech tagging, text classification, and information retrieval. However, they are inadequate for problems that involve deeper semantics.
• Modern approaches (such as deep learning) hold a lot of promise for problems involving semantics. They have also been shown to produce results better than or equal to those of classical techniques on typical NLP tasks.
• Internationally acclaimed courses, like those offered by Dan Jurafsky, Christopher Manning, and Michael Collins on Coursera and those offered at Stanford, are strong in the traditional topics but somewhat light on emerging topics.
• The recent course by Socher at Stanford is heavy on recurrent-network-based approaches but assumes that students are already reasonably familiar with traditional NLP.
• Our course takes the best of both worlds and backs it up with intensive hands-on work.
Key Topics
• Foundational
- Words and sentences: tokenization, regular expressions, challenges of ambiguity, edit distance, spelling correction, string similarity, tf, tf-idf
- Stemming, lemmatization
- Language models, smoothing, applications to speech, metrics
- Tagging problems: Viterbi algorithm (HMM), POS tagging, NER tagging, SRL
- Parsing: PCFG, CKY algorithm
- Information retrieval, information extraction, word sense disambiguation, summarization, Q&A systems, dialogue systems
- Natural language generation
• Emerging Approaches
- Deep learning and vector space approaches to word representation, sentence and text compositionality, LM, parsing, Q&A systems
• Applications
- Modern approaches to many exciting applications, including speech
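Edit distance and spelling correction, listed under the foundational topics, can be sketched with the standard dynamic-programming (Levenshtein) recurrence, where insertions, deletions, and substitutions each cost 1. The `correct` helper and its vocabulary are hypothetical; it simply picks the closest vocabulary word and ignores word frequency.

```python
def edit_distance(source, target):
    """Levenshtein distance: minimum insertions, deletions and substitutions (each cost 1)."""
    m, n = len(source), len(target)
    # dp[i][j] = distance between source[:i] and target[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all i characters
    for j in range(n + 1):
        dp[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub_cost = 0 if source[i - 1] == target[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,             # delete source[i-1]
                dp[i][j - 1] + 1,             # insert target[j-1]
                dp[i - 1][j - 1] + sub_cost,  # substitute, or match for free
            )
    return dp[m][n]


def correct(word, vocabulary):
    """Naive spelling correction: the vocabulary word with the smallest edit distance."""
    return min(vocabulary, key=lambda v: edit_distance(word, v))


print(edit_distance("kitten", "sitting"))                  # 3
print(correct("speling", ["sapling", "spring", "spelling"]))  # spelling
```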
Course Grading Policy
• Unit Evaluations (3 out of 5): 30%
• Lab sessions (2 out of 5): 10%
• T1: 15%
• Final Exam: 3 days of product development, 6 to 8 hours per day, run like a hackathon with a 90-minute objective-type written test on day 1: 15% (test) + 25% (hands-on)
• Attendance: 5%
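The weights above add up to 100%. The following sketch computes a final score, assuming (as an interpretation, not stated on the slide) that "3 out of 5" and "2 out of 5" mean the best three unit evaluations and the best two lab sessions count, and that every component is scored out of 100. All names are hypothetical.

```python
# Weights from the grading policy, in percent.
WEIGHTS = {"units": 30, "labs": 10, "t1": 15, "final_test": 15, "final_hands_on": 25, "attendance": 5}


def best_k_average(scores, k):
    """Average of the k highest scores (assumed reading of '3 out of 5')."""
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / k


def final_grade(unit_scores, lab_scores, t1, final_test, final_hands_on, attendance):
    """All inputs are percentages in [0, 100]; returns the weighted total in [0, 100]."""
    components = {
        "units": best_k_average(unit_scores, 3),
        "labs": best_k_average(lab_scores, 2),
        "t1": t1,
        "final_test": final_test,
        "final_hands_on": final_hands_on,
        "attendance": attendance,
    }
    return sum(WEIGHTS[name] * score / 100 for name, score in components.items())


print(final_grade([80, 60, 90, 70, 50], [100, 40, 80, 90, 70], 70, 60, 90, 100))  # 80.5
```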
Challenges: Why is NLP hard?
The central challenge of Natural Language Processing is ambiguity, which exists at every level or stage of NLP.
Poets and writers thrive on semantic ambiguity in language, while most of us abhor it!
Can an NLP system understand poetry or, better still, generate it? That seems to be the ultimate goal!
Another challenge is representation: how do we represent words? Sentences? Large texts? How do we model real-world knowledge?
One prayer, 25 interpretations! (Ref: Raghuvamsha by Kalidasa)
Vagarthaviva sampriktau vagarthapratipattaye | Jagatah pitarau vande parvathiparameshwarau || – Raghuvamsha 1.1
• Common meaning: To gain knowledge of speech and its meaning, I pray to the parents of the world, Lord Shiva and Mother Parvathi, who are as inseparable as speech and its meaning.
Ambiguity – some examples
• Homophones: words with the same pronunciation but different meanings
- Peace, piece: a spoken sentence like “The PM attended the peace summit” is ambiguous at the word “peace”, since a speech-to-text system might transcribe it as “piece”
- Knew, new
- Weak, week
• Word boundary
- It’s all ready, looking great!
- It’s already looking great!
• Syntactic ambiguity: arises when the same input has more than one parse tree
- Phrase boundary: in “Ananth created the presentation with video from the web”, the phrase “with video” can attach to the verb (“Ananth created the presentation, ‘with video’”) or to the noun (“Ananth created the ‘presentation with video’”)
• Semantic ambiguity: a sentence can be interpreted in many ways
- John and Susan are married (to each other? separately?)
- Ram had smooth sailing.
- Prices have gone through the roof.
- India says it can’t accept the proposal.
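Word boundary ambiguity also appears in written text whenever words are joined without spaces (the segmentation problem mentioned under Motivation). The sketch below enumerates every segmentation of a string over a small hypothetical lexicon, using memoized recursion; it shows that one input can have several valid readings. The lexicon and the function name `segmentations` are illustrative assumptions.

```python
from functools import lru_cache

# Hypothetical mini-lexicon.
LEXICON = {"god", "is", "now", "here", "nowhere", "no", "where"}


def segmentations(text):
    """Return every way to split `text` into a sequence of LEXICON words."""

    @lru_cache(maxsize=None)
    def split_from(start):
        # All segmentations of text[start:], as lists of words.
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in LEXICON:
                for rest in split_from(end):
                    results.append([word] + rest)
        return results

    return [" ".join(words) for words in split_from(0)]


print(segmentations("godisnowhere"))
# ['god is no where', 'god is now here', 'god is nowhere']
```

Choosing among the readings requires context, for example a language model that scores each candidate.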
Representation: Text, Images, Audio, Video
• What are the distinguishing characteristics of text data and what are the unique challenges?
• Text is made of words, images of pixels, audio of a sampled and digitized signal, and video of image frames in motion
• How do we represent a piece of text in the computer?
• Let’s do a simple exercise: what thoughts and emotions cross your mind when you hear the following words?
• Kalam
• Brilliant
• Pleasant
• Destruction
• Perfume
• Code
• Test
• Run
• Signal
• Words can be used in different contexts, and context is key to interpreting a word's meaning
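One way to make "context is key" concrete is to represent each word by counts of the words that appear near it (a distributional representation). A minimal sketch, assuming a hypothetical three-sentence corpus, a hypothetical helper `context_vectors`, and a window of two words on each side:

```python
from collections import Counter


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]  # neighbours, excluding the word itself
            vectors.setdefault(word, Counter()).update(context)
    return vectors


corpus = [
    "run the code and test the code",
    "run the test before you ship the code",
    "run a marathon in the park",
]
vectors = context_vectors(corpus)
print(vectors["run"].most_common(3))  # [('the', 2), ('code', 1), ('test', 1)]
```

Here "run" collects neighbours from both its programming sense ("code", "test") and its athletic sense ("marathon"), which is exactly the kind of ambiguity a good representation must handle.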

Natural Language Processing: L01 introduction

  • 1. Natural Language Processing Unit 1 – Introduction Anantharaman Narayana Iyer narayana dot anantharaman at gmail dot com 7th Aug 2015
  • 2. Topics • Motivation: Why NLP? • Course Outline • Grading Policy
  • 3. What are the opportunities for NLP?
  • 4. NLP is a hugely important topic for both industry and academia
  • 5. Trends that accelerate NLP research • Availability of web and social data • Mobile devices as a source of data • Need for natural language based I/O for new devices • ML techniques: eg deep learning • Increasing availability of datasets in open web e.g. Freebase, dbpedia
  • 6. Motivation • Google Search Engine • Intelligently responding to the query: eg, Where is India Gate? • Predicting next word for autocompletion • Ability to do spelling corrections • Segmenting words that may be joined without space • Ranking the search results • Google translate • Gmail • Eg, Understand contents of an e- mail through NLP and alert the user
  • 7. Speech/NLP • What technologies are involved here? - Continuous Speech Recognition - Keyword Spotting - Text to speech - Speech in Speech out systems - Speaker identification - Novel applications (to be explained on the board)
  • 8. Disambiguation • Consider an example below. • We would like to collect tweets on a subject (Say Rahul Gandhi) and analyse the sentiment • We can do a search on Twitter with the Search API with key words: “Rahul Gandhi” • This might miss tweets that have only the term Rahul and not Gandhi. • If we just search for the search terms: [“Rahul”, “Gandhi”], we may get results that match any Rahul (e.g Rahul Dravid or KL Rahul) • We can do an intelligent tweet search using NLP techniques
  • 9. Summarization • The challenge we face is not the lack of information but the overload. • Summarization is a core technology that can help address information overload • Related Problems: • How to validate the quality, correctness of information? • Summarizing multimedia • How do we summarize social data, where: • Data may have less signal, more noise! • Data may be biased • Data may not be factual • Repetitive • Can we autogenerate a (set of) Tweet(s) from a news article?
  • 10. Answer Evaluation • Answer evaluation is a core challenge for online education systems. • Wouldn’t it be nice if questions can be both descriptive as well as objective? • Can there be an automated answer evaluation system that doesn’t require peer evaluation?
  • 11. Sentiment Analysis • Measurement of pulse of people from social media • Can measure sentiments against a brand or product or events. • Crowded space but not a fully solved problem due to inherent challenges in Natural Language Processing • Can we build a sentiment analyser using RNNs and evaluate the performance?
  • 13. Dialog Systems • Dialog systems that can be deployed commercially? • Natural Language Processing • Natural language generation Can we build a NLG library and make it open source?
  • 15. Course Structure • Foundational • Emerging • Applications
  • 16. Course Positioning • Classical NLP techniques (such as Language Models, MaxEnt classifiers, HMM, CRF etc) have proven to be effective in addressing problems like Part of Speech tagging, Text classification, Information Retrieval etc. However they are inadequate when dealing with problems that involve more semantics • Modern approaches (such as deep learning) hold lot of promise in addressing problems involving semantics. They were also shown to produce results better than or equal to classical techniques for typical NLP tasks. • Internationally acclaimed courses like those offered by Dan Jurafsky, Christopher Manning, Michael Collins on Coursera and also those offered at Stanford are strong in the traditional topics and somewhat light when discussing emerging topics. • The recent course by Socher at Stanford is heavy on Recurrent network based approaches but assumes that the student is familiar to a good extent with the traditional NLP • Our course takes the best of both worlds and backs it up with intense hands on work.
  • 17. Key Topics • Foundational • Words, sentences: Tokenization, regular expressions, challenges of ambiguity, edit distance, spelling correction, string similarity, tf, tf-idf • Stemming, Lemmatization • Language models, smoothing, applications to speech, metrics • Tagging problems: Viterbi Algorithm (HMM), POS tagging, NER tagging, SRL • Parsing: PCFG, CKY algorithm • Information Retrieval, Information Extraction, Word Sense Disambiguation, Summarization, Q&A systems, Dialogue Systems • Natural Language Generation • Emerging Approaches: • Deep Learning and Vector Space approaches to: Word representation, Sentence and text compositionality, LM, Parsing, Q&A Systems • Applications: • Modern approaches to many exciting applications, including speech
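Edit distance, listed above among the foundational topics, underlies simple spelling correction. The sketch below is a minimal dynamic-programming version, assuming the plain Levenshtein variant in which insertion, deletion and substitution each cost 1 (some textbook treatments charge 2 for a substitution). `edit_distance` and `suggest` are hypothetical helper names, and the three-word vocabulary in the usage lines is illustrative only.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance, assuming unit cost for insert, delete and substitute."""
    m, n = len(source), len(target)
    # dp[i][j] holds the distance between source[:i] and target[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all i characters of the source prefix
    for j in range(n + 1):
        dp[0][j] = j  # insert all j characters of the target prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub_cost = 0 if source[i - 1] == target[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,             # deletion
                dp[i][j - 1] + 1,             # insertion
                dp[i - 1][j - 1] + sub_cost,  # substitution (or free match)
            )
    return dp[m][n]


def suggest(word: str, vocabulary) -> str:
    """Return the closest vocabulary word; ties go to the alphabetically first."""
    return min(sorted(vocabulary), key=lambda v: edit_distance(word, v))


if __name__ == "__main__":
    print(edit_distance("peace", "piece"))                          # → 2
    print(suggest("langauge", ["language", "luggage", "lineage"]))  # → language
```

The table costs O(mn) time and memory. A practical spelling corrector would also weight candidates by how likely they are (for example with a language model), which this sketch ignores.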
  • 18. Course Grading Policy • Unit Evaluations (3 out of 5): 30% • Lab sessions (2 out of 5): 10% • T1: 15% • Final Exam: 3 days of product development, 6 to 8 hours per day (run like a hackathon, with a 90-minute objective-type written test on day 1): 15% (test) + 25% (hands-on) • Attendance: 5%
  • 19. Challenges: Why is NLP hard? The central challenge of Natural Language Processing is ambiguity, and it exists at every level or stage of NLP. Poets and writers thrive on ambiguity in the semantics of language, while most of us abhor ambiguity! Can NLP understand poetry or, better still, generate it? That seems to be the ultimate goal! Another challenge is representation: How do we represent words? Sentences? Large text? How do we model real-world knowledge?
  • 20. One prayer, 25 interpretations! (Ref: Raghuvamsa by Kalidasa) Vagarthaviva sampriktau vagarthah pratipattaye | Jagatah pitarau vande parvathiparameshwarau || – Raghuvamsha 1.1 • Common Meaning: I pray to the parents of the world, Lord Shiva and Mother Parvathi, who are as inseparable as speech and its meaning, to gain knowledge of speech and its meaning.
  • 21. Ambiguity – some examples • Homophones: Words with the same pronunciation but different meanings • Peace, piece: A spoken sentence like “The PM attended the peace summit” is ambiguous at the word “peace”, as a speech-to-text system might transcribe it as “piece” • Knew, new • Weak, week • Word boundary • It’s all ready, looking great! • It’s already looking great! • Syntactic Ambiguity: Arises when the same input has more than one valid parse tree • Phrase boundary • Ananth created the presentation with video from the web: ‘with video’ can attach to the verb (“Ananth created the presentation, using video”) or to the noun (“Ananth created the ‘presentation with video’”) • Semantic level ambiguity: Many ways to interpret a sentence • John and Susan are married (to each other? Separately?) • Ram had smooth sailing. • Prices have gone through the roof • India says it can’t accept the proposal
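The word-boundary examples above come from speech, but the same ambiguity shows up in text written without spaces, such as hashtags or URLs. The sketch below enumerates every split of a string into dictionary words, memoizing on the start position so each suffix is segmented only once. The function name, the vocabulary and the input string `godisnowhere` are hypothetical, chosen only to show that one character sequence can yield several valid readings.

```python
from functools import lru_cache


def segmentations(text, vocabulary):
    """Return every way to split `text` into words drawn from `vocabulary`."""
    words = frozenset(vocabulary)

    @lru_cache(maxsize=None)
    def split_from(start):
        # Reaching the end of the string completes exactly one segmentation.
        if start == len(text):
            return ((),)
        results = []
        for end in range(start + 1, len(text) + 1):
            prefix = text[start:end]
            if prefix in words:
                # Prepend this word to every segmentation of the remainder.
                for rest in split_from(end):
                    results.append((prefix,) + rest)
        return tuple(results)

    return [" ".join(parts) for parts in split_from(0)]


if __name__ == "__main__":
    vocab = {"god", "is", "no", "now", "here", "where", "nowhere"}
    for reading in segmentations("godisnowhere", vocab):
        print(reading)
```

With this vocabulary the script prints three readings: `god is no where`, `god is now here` and `god is nowhere`. The dictionary alone cannot choose among them; ranking the candidates needs something like a language model.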
  • 22. Representation: Text, Images, Audio, Video • What are the distinguishing characteristics of text data, and what unique challenges does it pose? • Text is made of words, images of pixels, audio of a sampled and digitized signal, and video of image frames in motion • How do we represent a piece of text in the computer? • Let’s do a simple exercise: What thoughts and emotions cross your mind when you hear the following words? • Kalam • Brilliant • Pleasant • Destruction • Perfume • Code • Test • Run • Signal • Words can be used in different contexts, and the context is key to interpreting the meaning of a word.
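One of the simplest answers to “how do we represent a piece of text in the computer?” is a bag-of-words count vector. The sketch below assumes lower-cased, whitespace-split tokens; the helper names and the three example sentences are hypothetical and exist only to show how a word such as “Run” is treated.

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign a column index to each distinct lower-cased, whitespace-split token."""
    vocab = {}
    for doc in documents:
        for token in doc.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bag_of_words(document, vocab):
    """Return a count vector; tokens absent from the vocabulary are ignored."""
    vector = [0] * len(vocab)
    for token, count in Counter(document.lower().split()).items():
        if token in vocab:
            vector[vocab[token]] = count
    return vector


if __name__ == "__main__":
    docs = ["run the code", "run the test", "run a marathon"]
    vocab = build_vocabulary(docs)
    for doc in docs:
        print(doc, bag_of_words(doc, vocab))
```

Here “run” gets the same column in every vector, whether it means executing code or running a race, and word order is discarded entirely. This is exactly the context problem raised above, and it motivates the vector space approaches listed under the emerging topics.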