Natural Language Processing
                                   Using Python




                                    Presented by:-
                                    Sumit Kumar Raj
                                    1DS09IS082

ISE,DSCE-2013
Table of Contents




        • Introduction
        • History
        • Methods in NLP
        • Natural Language Toolkit
        • Sample Codes
        • Feeling Lonely?
        • Building a Spam Filter
        • Applications
        • References


What is Natural Language Processing?




    • Computer-aided analysis of human language text.

    • The goal is to enable machines to understand human
          language and extract meaning from text.

    • It is a field of study that falls under artificial intelligence
          and, more specifically, computational linguistics, and it
          draws heavily on machine learning.




History


  • 1948: first NLP application
       – a dictionary look-up system
       – developed at Birkbeck College, London

  • 1949: American interest
       – Warren Weaver, inspired by WWII code breaking
       – He viewed Russian as English in code.

  • 1966: over-promised, under-delivered
       – Machine translation worked only word by word.
       – NLP brought the first cuts in research funding.
       – NLP gave AI a bad name before AI had a name.
Natural language processing is widely used across web technologies, for example in:

  • Search engines
  • Site recommendations
  • Consumer behavior analysis
  • Sentiment analysis
  • Spam filtering
  • Automated customer support systems
  • Knowledge bases and expert systems


Context


   Little sister: What’s your name?

   Me: Uhh….Sumit..?

   Sister: Can you spell it?

   Me: yes. S-U-M-I-T…..
   Sister: WRONG! It’s spelled “I-T”.



Ambiguity

   “I shot the man with ice cream.”
   - A man with ice cream was shot
   - A man had ice cream shot at him




Methods :-

       1) POS Tagging :-

      • In corpus linguistics, part-of-speech (POS) tagging, also called
          grammatical tagging or word-category disambiguation, is the process
          of marking up each word in a text as corresponding to a particular POS.
      • POS tagging is harder than just having a list of words and their
          parts of speech, because many words can take more than one POS.
      • Consider the example:
               The sailor dogs the barmaid.
          Here “dogs” is a verb, not the plural of the noun “dog”.
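To see why a word list alone is not enough, here is a minimal sketch in Python. The four-word lexicon and the single contextual rule are hypothetical, made up for this example; real taggers learn such patterns from annotated corpora. Tags use the Penn Treebank names that also appear in the NLTK output later in this deck.

```python
# Hypothetical mini-lexicon: every tag each word can take, most common first.
TAG_LEXICON = {
    "the": ["DT"],
    "sailor": ["NN"],
    "dogs": ["NNS", "VBZ"],  # plural noun or 3rd-person singular verb
    "barmaid": ["NN"],
}


def lookup_tag(words):
    """Naive tagging: give every word its first (most common) tag."""
    return [(w, TAG_LEXICON[w][0]) for w in words]


def context_tag(words):
    """Lookup tagging plus one hand-written rule (an assumption for illustration):
    a word that can be VBZ, preceded by NN and followed by DT, is tagged VBZ."""
    tagged = lookup_tag(words)
    for i in range(1, len(words) - 1):
        w = words[i]
        if "VBZ" in TAG_LEXICON[w] and tagged[i - 1][1] == "NN" and tagged[i + 1][1] == "DT":
            tagged[i] = (w, "VBZ")
    return tagged


sent = "the sailor dogs the barmaid".split()
print(lookup_tag(sent))
# [('the', 'DT'), ('sailor', 'NN'), ('dogs', 'NNS'), ('the', 'DT'), ('barmaid', 'NN')]
print(context_tag(sent))
# [('the', 'DT'), ('sailor', 'NN'), ('dogs', 'VBZ'), ('the', 'DT'), ('barmaid', 'NN')]
```

The lookup tagger gets “dogs” wrong; looking at the neighbouring tags fixes it, which is the kind of context a statistical tagger uses.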



2) Parsing :-


  • In the context of NLP, parsing may be defined as the process of
    assigning structural descriptions to sequences of words in
    a natural language.
  • Applications of parsing include:
     - simple phrase finding, e.g. for proper name recognition
     - full semantic analysis of text, e.g. information extraction or
       machine translation
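A parser makes the earlier ambiguity example concrete. Below is a minimal sketch of a CYK chart parser that counts how many parse trees a toy grammar allows. The grammar and lexicon are hypothetical, written in Chomsky normal form just for this sentence, with “ice cream” treated as a noun-noun compound. The two parses it finds correspond to the two readings of “I shot the man with ice cream”: the prepositional phrase attaches either to “the man” or to the verb phrase.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form.
GRAMMAR_LEXICON = {
    "i": {"NP"}, "shot": {"V"}, "the": {"Det"}, "man": {"N"},
    "with": {"P"}, "ice": {"N"}, "cream": {"N"},
}
BINARY_RULES = [  # (parent, left child, right child)
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attached to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "N", "N"),     # noun-noun compound, e.g. "ice cream"
    ("NP", "NP", "PP"),   # PP attached to the noun phrase
    ("PP", "P", "NP"),
]


def count_parses(sentence, start="S"):
    """CYK chart parsing that counts parse trees instead of only recognizing."""
    words = sentence.lower().split()
    n = len(words)
    # chart[i][j][X] = number of ways symbol X can derive words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for sym in GRAMMAR_LEXICON.get(w, ()):
            chart[i][i + 1][sym] += 1
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length
            for k in range(i + 1, j):       # split point
                for parent, left, right in BINARY_RULES:
                    ways = chart[i][k].get(left, 0) * chart[k][j].get(right, 0)
                    if ways:
                        chart[i][j][parent] += ways
    return chart[0][n].get(start, 0)


print(count_parses("I shot the man with ice cream"))  # 2
print(count_parses("I shot the man"))                 # 1
```

Because counts for sub-spans are multiplied rather than trees being enumerated, the chart stays polynomial in sentence length even when the number of trees grows quickly.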




3) Speech Recognition:-



  • It is concerned with mapping a continuous speech signal
    into a sequence of recognized words.
  • Problems include variation in pronunciation and homophones
    (different words that sound alike).
  • In the sentence “the boy eats”, a bigram model is sufficient to
        model the relationship between “boy” and “eats”.
  • In “The boy on the hill by the lake in our town…eats”, however,
        many words separate “boy” from “eats”, so a bigram model
        cannot capture the dependency.
  • Still, bigram and trigram models have proven extremely effective
        at capturing local dependencies.
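A bigram model estimates the probability of each word given only the previous word, P(w_i | w_{i-1}) = C(w_{i-1}, w_i) / C(w_{i-1}). The sketch below computes these maximum-likelihood estimates from a made-up three-sentence corpus, without smoothing. Scoring two candidate transcriptions shows how such a model can help a recognizer choose between homophones such as “boy” and “buoy”.

```python
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams, padding each sentence with <s> and </s>."""
    history, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.lower().split() + ["</s>"]
        history.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return history, bigrams


def bigram_prob(prev, word, history, bigrams):
    """Maximum-likelihood estimate P(word | prev) = C(prev, word) / C(prev)."""
    if history[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / history[prev]


def sentence_prob(sent, history, bigrams):
    """Probability of a whole sentence as a product of bigram probabilities."""
    tokens = ["<s>"] + sent.lower().split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, word, history, bigrams)
    return p


corpus = ["the boy eats", "the boy runs", "the girl eats"]
history, bigrams = train_bigram(corpus)
print(bigram_prob("boy", "eats", history, bigrams))      # 0.5
print(sentence_prob("the boy eats", history, bigrams))   # 0.3333333333333333
print(sentence_prob("the buoy eats", history, bigrams))  # 0.0
```

The hard zero for an unseen word is why practical models add smoothing; the point here is only how the counts turn into probabilities.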




4) Machine Translation:-



 • It involves translating text from one natural language to another.
 • Approaches:-
        - simple word substitution, with some changes in ordering to
          account for grammatical differences
        - translating the source language into an underlying meaning
          representation, or interlingua, from which the target text is generated
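The first approach, word substitution with some reordering, can be sketched in a few lines. The tiny English-to-Spanish dictionary and the single reordering rule (English adjective + noun becomes Spanish noun + adjective) are hypothetical and only cover this example; a real system would also need agreement, idioms and much more.

```python
# Hypothetical toy English -> Spanish dictionary (not a real lexicon).
BILINGUAL_DICT = {"the": "el", "red": "rojo", "car": "coche", "is": "es", "fast": "rápido"}
ADJECTIVES = {"red"}
NOUNS = {"car"}


def translate(sentence):
    """Word-by-word substitution after swapping English ADJ+NOUN into NOUN+ADJ."""
    words = sentence.lower().split()
    reordered = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            reordered += [words[i + 1], words[i]]  # Spanish usually puts the adjective after the noun
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    # Unknown words are passed through unchanged
    return " ".join(BILINGUAL_DICT.get(w, w) for w in reordered)


print(translate("the red car is fast"))  # el coche rojo es rápido
```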




5) Stemming:-




  • In linguistic morphology and information retrieval, stemming is
        the process of reducing inflected words to their stem.
  • The stem need not be identical to the morphological root of the
        word.
  • Many search engines treat words with the same stem as synonyms,
        as a kind of query broadening; this process is called conflation.
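Suffix stripping is the simplest way to stem, and it illustrates both points above. The sketch below uses a short, hypothetical suffix list; it is a crude illustration, not the Porter stemmer or any NLTK stemmer. It conflates several forms of “fish” into one group and turns “argued” into “argu”, a stem that differs from the root “argue”.

```python
# Hypothetical suffix list, checked longest first. This is NOT the Porter
# algorithm, just a crude illustration of suffix stripping.
SUFFIXES = ("ness", "ing", "ed", "es", "ly", "s")


def crude_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


def conflate(words):
    """Group words by stem, as a search engine might when broadening a query."""
    groups = {}
    for w in words:
        groups.setdefault(crude_stem(w), []).append(w)
    return {stem: sorted(ws) for stem, ws in groups.items()}


print(conflate(["fishing", "fished", "fish", "fishes"]))
# {'fish': ['fish', 'fished', 'fishes', 'fishing']}
print(crude_stem("argued"))  # argu
```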




Natural Language Toolkit

    • NLTK is a leading platform for building Python programs to
      work with human language data.
    • It provides a suite of text processing libraries for
      classification, tokenization, stemming, tagging, parsing,
      and semantic reasoning.

    • The stable release at the time of this talk supported only
      Python 2; current NLTK releases run on Python 3.
    http://www.nltk.org/download
    • Install with: pip install nltk
      (at the time: easy_install nltk)
    • Optional dependencies for some modules
       – NumPy
       – SciPy

Let’s dive into some code!




Part of Speech Tagging

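from nltk import pos_tag, word_tokenize

# Needs NLTK's tokenizer and tagger models, downloaded once with
# nltk.download() (resource names differ between NLTK versions)
sentence1 = ('this is a demo that will show you how '
             'to detects parts of speech with little effort '
             'using NLTK!')

tokenized_sent = word_tokenize(sentence1)
print(pos_tag(tokenized_sent))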


[('this', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('demo', 'NN'), ('that', 'WDT'),
('will', 'MD'), ('show', 'VB'), ('you', 'PRP'), ('how', 'WRB'), ('to', 'TO'),
('detects', 'NNS'), ('parts', 'NNS'), ('of', 'IN'), ('speech', 'NN'), ('with',
'IN'), ('little', 'JJ'), ('effort', 'NN'), ('using', 'VBG'), ('NLTK', 'NNP'),('!',
'.')]
Fun things to Try




Feeling lonely?

  Eliza is there to talk to you all day! What human could ever do that
  for you?
    from nltk.chat import eliza
    eliza.eliza_chat()
  This starts the chatbot:
   Therapist
   ---------
   Talk to the program by typing in plain English, using normal upper-
   and lower-case letters and punctuation. Enter "quit" when done.
   ========================================================================
   Hello. How are you feeling today?


Let’s build something even cooler




Let's write a spam filter!

   A program that analyzes legitimate emails (“ham”) as well as
   “spam” and learns the features that are associated with
   each.

   Once trained, we should be able to run this program on
   incoming mail and have it reliably label each one with the
   appropriate category.




“Spambot.py” (continued)



  1.   Extract one of the archives from the site into your working directory.

  2.   Create a Python script; let's call it “spambot.py”.

  3.   Your working directory should contain the “spambot.py” script and the
       folders “spam” and “ham”.


from nltk import (word_tokenize, WordNetLemmatizer, NaiveBayesClassifier,
                  classify, MaxentClassifier)
from nltk.corpus import stopwords
import random
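The slides do not show how `spamtexts` and `hamtexts` are filled. One possible loader is sketched below; it assumes each email is stored as its own file inside the “spam” and “ham” folders, which is an assumption about the extracted archive, and `load_texts` is a hypothetical helper name.

```python
from pathlib import Path


def load_texts(folder):
    """Return the contents of every regular file in `folder`, one string per email.

    Assumes one email per file (hypothetical layout). Files are read in sorted
    order, and undecodable bytes, common in raw mail dumps, are skipped.
    """
    texts = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            texts.append(path.read_text(encoding="utf-8", errors="ignore"))
    return texts
```

With that helper, `spamtexts = load_texts("spam")` and `hamtexts = load_texts("ham")` produce the two lists used when labeling the emails.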
“Spambot.py” (continued)

Label each item with the appropriate label and store them as a list of tuples:


mixedemails = [(email, 'spam') for email in spamtexts]
mixedemails += [(email, 'ham') for email in hamtexts]

Let's give them a nice shuffle:

random.shuffle(mixedemails)

From this list of random but labeled emails, we will define a “feature
extractor” which outputs a feature set that our program can use to statistically
compare spam and ham.




“Spambot.py” (continued)


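# Lemmatizer and stop-word list used below (needs NLTK's tokenizer, 'wordnet'
# and 'stopwords' data, installed once via nltk.download())
wordlemmatizer = WordNetLemmatizer()
commonwords = set(stopwords.words('english'))

def email_features(sent):
    features = {}
    # Normalize words: lowercase and lemmatize every token
    wordtokens = [wordlemmatizer.lemmatize(word.lower())
                  for word in word_tokenize(sent)]
    for word in wordtokens:
        # If the word is not a stop-word, then let's consider it a "feature"
        if word not in commonwords:
            features[word] = True
    return features


featuresets = [(email_features(n), g) for (n, g) in mixedemails]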

“Spambot.py” (continued)



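# Train the classifier on the labeled feature sets
classifier = NaiveBayesClassifier.train(featuresets)

while True:
    featset = email_features(input("Enter text to classify: "))
    print(classifier.classify(featset))



We can now directly input a new email and have it classified as either spam or
ham.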
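What the classifier does with these feature sets can be sketched without NLTK. The following is a simplified Bernoulli-style naive Bayes over the same word-to-True feature dictionaries, with Laplace smoothing; it is an illustration under those assumptions, not NLTK's actual `NaiveBayesClassifier` implementation, and the four training examples are made up.

```python
import math
from collections import Counter, defaultdict


def train_nb(featuresets):
    """Count label priors and, per label, how many documents contain each word."""
    label_counts = Counter(label for _, label in featuresets)
    doc_freq = defaultdict(Counter)  # label -> word -> number of documents containing it
    vocab = set()
    for feats, label in featuresets:
        for word in feats:
            doc_freq[label][word] += 1
            vocab.add(word)
    return label_counts, doc_freq, vocab


def classify_nb(model, feats):
    """Return the label with the highest log posterior for a feature dict."""
    label_counts, doc_freq, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total)  # log prior P(label)
        for word in feats:
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Laplace-smoothed P(word present | label)
            p = (doc_freq[label][word] + 1) / (n_docs + 2)
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, labeled):
    """Fraction of (feats, label) pairs classified correctly."""
    correct = sum(classify_nb(model, f) == label for f, label in labeled)
    return correct / len(labeled)


# Made-up toy data in the same shape as `featuresets`
train = [
    ({"win": True, "cash": True}, "spam"),
    ({"free": True, "prize": True, "win": True}, "spam"),
    ({"meeting": True, "tomorrow": True}, "ham"),
    ({"project": True, "meeting": True, "notes": True}, "ham"),
]
model = train_nb(train)
print(classify_nb(model, {"win": True, "prize": True}))      # spam
print(classify_nb(model, {"meeting": True, "notes": True}))  # ham
```

Summing log probabilities instead of multiplying raw probabilities avoids numeric underflow when an email contains many words.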




Applications :-



  • Conversion from natural language to computer language
      and vice versa
  • Translation from one human language to another
  • Automatic checking of grammar and writing technique
  • Spam filtering
  • Sentiment analysis




Conclusion:-



 NLP plays a very important role in new human-machine interfaces. Products built on
 NLP technologies are already very advanced and very useful.

 But there are many limitations. For example, the language we speak is highly ambiguous,
 which makes it very difficult to understand and analyze. Also, with so many languages
 spoken all over the world, it is very difficult to design a system that is 100% accurate.

 These problems get more complicated when we consider different people speaking the
 same language in different styles.

 Intelligent systems are being experimented with right now, and we will be able to
 see improved applications of NLP in the near future.


References :-


• http://en.wikipedia.org/wiki/Natural_language_processing
• An Overview of Empirical Natural Language Processing,
      by Eric Brill and Raymond J. Mooney
• Investigating Classification for Natural Language Processing Tasks,
      by Ben W. Medlock, University of Cambridge
• Natural Language Processing and Machine Learning using Python,
      by Shankar Ambady
• http://www.slideshare.net
• http://www.doc.ic.ac.uk/~nd/surprise_97/journal/vol1/hks/index.html
• http://googlesystem.blogspot.in/2012/10/google-improves-results-for-natural/
• Code from: https://github.com/shanbady/NLTK-Boston-Python-Meetup




Any Questions?




Thank You...

                Reach me @:
                facebook.com/sumit12dec

                sumit786raj@gmail.com

                9590 285 524

Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 

Natural language processing (Python)

  • 1. Natural Language Processing Using Python. Presented by: Sumit Kumar Raj (1DS09IS082), ISE, DSCE-2013
  • 2. Table of Contents • Introduction • History • Methods in NLP • Natural Language Toolkit • Sample Codes • Feeling Lonely? • Building a Spam Filter • Applications • References
  • 3. What is Natural Language Processing? • Computer-aided text analysis of human language. • The goal is to enable machines to understand human language and extract meaning from text. • It is a field of study closely tied to computational linguistics, and it draws heavily on machine learning.
  • 4. History • 1948: the first NLP application, a dictionary look-up system developed at Birkbeck College, London. • 1949: American interest, led by WWII code breaker Warren Weaver, who viewed Russian as English in code. • 1966: over-promised and under-delivered. Machine translation worked only word by word; NLP triggered the first backlash against research funding; NLP gave AI a bad name before AI had a name.
  • 5. Natural language processing is widely used across web technologies: • Search engines • Consumer behavior analysis • Site recommendations • Sentiment analysis • Spam filtering • Automated customer support systems • Knowledge bases and expert systems
  • 6. Context: Little sister: What’s your name? Me: Uhh… Sumit? Sister: Can you spell it? Me: Yes. S-U-M-I-T…
  • 7. Sister: WRONG! It’s spelled “I-T”.
  • 8. Ambiguity: “I shot the man with ice cream.” can mean: • A man with ice cream was shot • A man had ice cream shot at him
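The two readings correspond to two different parse trees, depending on whether “with ice cream” attaches to “the man” or to the verb phrase “shot the man”. A minimal sketch using NLTK's `CFG` and `ChartParser`; the toy grammar is a hypothetical one written only to cover this sentence, not a general English grammar.

```python
import nltk

# Hypothetical toy grammar for this one sentence. The rules NP -> NP PP and
# VP -> VP PP give the prepositional phrase two possible attachment points.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'I' | Det N | N N | NP PP
VP -> V NP | VP PP
PP -> P NP
Det -> 'the'
N -> 'man' | 'ice' | 'cream'
V -> 'shot'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
tokens = "I shot the man with ice cream".split()
trees = list(parser.parse(tokens))

# Each tree is one reading of the sentence.
for tree in trees:
    print(tree)
```

The parser returns two trees for the same word sequence, which is exactly the ambiguity described above; choosing between them requires knowledge the grammar alone does not have.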
  • 9. Methods: 1) POS Tagging: • In corpus linguistics, part-of-speech (POS) tagging, also called grammatical tagging or word-category disambiguation, is the process of marking up each word in a text as corresponding to a particular part of speech. • POS tagging is harder than just having a list of words and their parts of speech, because many words can take more than one part of speech depending on context. • Consider the example: The sailor dogs the barmaid. Here “dogs” is a verb, not a plural noun.
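A sketch of why a fixed word list is not enough, assuming NLTK's `UnigramTagger` built from a hypothetical hand-written lexicon that gives each word exactly one tag. Because the tagger never looks at context, “dogs” always receives the plural-noun tag `NNS`, even in this sentence where it is a verb.

```python
from nltk.tag import UnigramTagger

# Hypothetical one-tag-per-word lexicon: the "list of words and their parts
# of speech" approach. No training data or model download is involved.
lexicon = {"the": "DT", "sailor": "NN", "dogs": "NNS", "barmaid": "NN", ".": "."}
tagger = UnigramTagger(model=lexicon)

tokens = "the sailor dogs the barmaid .".split()
tagged = tagger.tag(tokens)
print(tagged)
# [('the', 'DT'), ('sailor', 'NN'), ('dogs', 'NNS'), ('the', 'DT'), ('barmaid', 'NN'), ('.', '.')]
```

A context-aware tagger (for example one that also looks at the previous tag) is needed to see that a word following a noun subject here is more likely a verb.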
  • 10. 2) Parsing: • In the context of NLP, parsing may be defined as the process of assigning structural descriptions to sequences of words in a natural language. • Applications of parsing range from simple phrase finding, e.g. for proper-name recognition, to full semantic analysis of text, e.g. information extraction or machine translation.
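The “simple phrase finding” end of the spectrum can be sketched with NLTK's `RegexpParser`, which groups tagged words into chunks. This is a minimal sketch: the input sentence is hypothetical and pre-tagged by hand so no tagger model is needed, and the single rule `NAME: {<NNP>+}` (a run of proper nouns) is an assumption chosen for illustration.

```python
from nltk import RegexpParser

# Hypothetical, hand-tagged input sentence.
tagged = [("Alice", "NNP"), ("Smith", "NNP"), ("moved", "VBD"),
          ("to", "TO"), ("New", "NNP"), ("York", "NNP"), (".", ".")]

# One chunk rule: one or more consecutive proper nouns form a candidate name.
chunker = RegexpParser("NAME: {<NNP>+}")
tree = chunker.parse(tagged)

# Collect the words of every NAME chunk.
names = [" ".join(word for word, tag in subtree.leaves())
         for subtree in tree.subtrees(lambda t: t.label() == "NAME")]
print(names)  # ['Alice Smith', 'New York']
```

This shallow chunking only finds phrases; it assigns no meaning to them, which is what separates it from the full semantic analysis mentioned above.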
  • 11. 3) Speech Recognition: • It is concerned with mapping a continuous speech signal to a sequence of recognized words. • The main problems are variation in pronunciation and homonyms. • In the sentence “the boy eats”, a bigram model is sufficient to model the relationship between “boy” and “eats”; in “The boy on the hill by the lake in our town… eats”, however, the two words are too far apart for a bigram to capture it. • Bigram and trigram models have proven extremely effective at capturing obvious (local) dependencies.
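A bigram model estimates the probability of each word given only the word right before it. A minimal sketch using NLTK's `bigrams` and `ConditionalFreqDist` on a hypothetical four-sentence corpus, with plain maximum-likelihood estimates (no smoothing):

```python
from nltk import bigrams, ConditionalFreqDist

# Tiny hypothetical training corpus, lower-cased and split on whitespace.
corpus = [
    "the boy eats",
    "the girl eats",
    "the boy sleeps",
    "the boy on the hill by the lake in our town eats",
]

# Count how often each word follows each previous word.
cfd = ConditionalFreqDist(
    (prev, word) for sent in corpus for prev, word in bigrams(sent.split())
)

def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev) from the counts."""
    return cfd[prev].freq(word)

print(bigram_prob("boy", "eats"))
print(bigram_prob("town", "eats"))
```

On this corpus the two estimates are 1/3 and 1.0. In the long sentence the model only ever sees the pair (“town”, “eats”); the dependency between “boy” and “eats” is invisible to it, which is the limitation the slide describes.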
  • 12. 4) Machine Translation: • It involves translating text from one natural language to another. • Approaches: simple word substitution, with some changes in ordering to account for grammatical differences; or translating the source language into an underlying meaning representation, or interlingua.
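The first approach (word substitution plus some reordering) can be sketched in a few lines. This is a toy illustration only: the Spanish-to-English lexicon and the single reordering rule are hypothetical, and real systems need far more than this.

```python
# Hypothetical toy lexicon: Spanish word -> (English word, coarse category).
LEXICON = {
    "el": ("the", "DET"),
    "perro": ("dog", "NOUN"),
    "gato": ("cat", "NOUN"),
    "negro": ("black", "ADJ"),
    "grande": ("big", "ADJ"),
}

def translate(sentence):
    """Word-by-word substitution followed by one reordering rule."""
    words = [LEXICON[w] for w in sentence.lower().split()]
    # Spanish usually places adjectives after the noun and English before it,
    # so swap every NOUN ADJ pair into ADJ NOUN order.
    i = 0
    while i < len(words) - 1:
        if words[i][1] == "NOUN" and words[i + 1][1] == "ADJ":
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    return " ".join(word for word, _ in words)

print(translate("el perro negro"))  # the black dog
```

Without the reordering step the output would be “the dog black”, which is the word-by-word behavior the History slide criticizes.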
  • 13. 5) Stemming: • In linguistic morphology and information retrieval, stemming is the process of reducing inflected words to their stem. • The stem need not be identical to the morphological root of the word. • Many search engines treat words with the same stem as synonyms as a kind of query broadening, a process called conflation.
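A short sketch using NLTK's `PorterStemmer` (one stemming algorithm among several) shows both points above: inflected forms collapse to a shared stem, and that stem need not be a real word.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Inflected forms of one word collapse to one stem, which is what lets a
# search engine treat them as equivalent (conflation).
connect_forms = ["connect", "connected", "connecting", "connection", "connections"]
print([stemmer.stem(w) for w in connect_forms])
# ['connect', 'connect', 'connect', 'connect', 'connect']

# The stem need not be a dictionary word: these all reduce to "argu".
argue_forms = ["argue", "argued", "argues", "arguing"]
print([stemmer.stem(w) for w in argue_forms])
# ['argu', 'argu', 'argu', 'argu']
```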
  • 14. Natural Language Toolkit • NLTK is a leading platform for building Python programs to work with human language data. • It provides a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning. • At the time (2013), it was available only for Python 2.5–2.6: http://www.nltk.org/download (current NLTK 3 releases support Python 3 and install with pip install nltk). • easy_install nltk • Prerequisites: NumPy, SciPy
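Tokenization, the first step in most of the examples that follow, can be tried without downloading any NLTK data by using `wordpunct_tokenize`, which is purely regular-expression based. A minimal sketch (the example sentences are arbitrary):

```python
from nltk.tokenize import wordpunct_tokenize

# wordpunct_tokenize splits text into runs of word characters and runs of
# punctuation, so it needs no downloaded models.
print(wordpunct_tokenize("Hello, world! NLTK is fun."))
# ['Hello', ',', 'world', '!', 'NLTK', 'is', 'fun', '.']

# Even a simple contraction shows that tokenizing is more than splitting on spaces.
print(wordpunct_tokenize("Don't panic."))
# ['Don', "'", 't', 'panic', '.']
```

`word_tokenize`, used in the code below, handles cases like contractions more carefully but relies on a trained model that has to be fetched once with `nltk.download()`.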
  • 15. Let’s dive into some code!
  • 16. Part of Speech Tagging (the tokenizer and tagger models must first be fetched with nltk.download())
      from nltk import pos_tag, word_tokenize
      sentence1 = 'this is a demo that will show you how to detects parts of speech with little effort using NLTK!'
      tokenized_sent = word_tokenize(sentence1)
      print(pos_tag(tokenized_sent))
    Output: [('this', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('demo', 'NN'), ('that', 'WDT'), ('will', 'MD'), ('show', 'VB'), ('you', 'PRP'), ('how', 'WRB'), ('to', 'TO'), ('detects', 'NNS'), ('parts', 'NNS'), ('of', 'IN'), ('speech', 'NN'), ('with', 'IN'), ('little', 'JJ'), ('effort', 'NN'), ('using', 'VBG'), ('NLTK', 'NNP'), ('!', '.')]
  • 17. Fun things to try
  • 18. Feeling lonely? Eliza is there to talk to you all day! What human could ever do that for you?
      from nltk.chat import eliza
      eliza.eliza_chat()
    This starts the chatbot:
      Therapist
      ---------
      Talk to the program by typing in plain English, using normal upper- and lower-case letters and punctuation. Enter "quit" when done.
      ========================================================================
      Hello. How are you feeling today?
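Eliza works by matching the user's input against regular-expression patterns and echoing parts of it back with pronouns “reflected” (my becomes your, I becomes you). A minimal sketch of that idea using NLTK's `Chat` helper class and its built-in `reflections` table; the two rules below are hypothetical and are not Eliza's real rule set.

```python
from nltk.chat.util import Chat, reflections

# Each pair is (regular expression, list of possible responses). "%1" is
# replaced by the first captured group after pronoun reflection.
pairs = [
    (r"I need (.*)", ["Why do you need %1?"]),
    (r"(.*)", ["Please tell me more."]),  # catch-all fallback
]

bot = Chat(pairs, reflections)
print(bot.respond("I need my computer"))  # Why do you need your computer?
print(bot.respond("Hello there"))         # Please tell me more.
```

Rules are tried in order, so the catch-all pattern goes last; with several responses per pattern, `Chat` picks one at random, which is why a real Eliza session does not repeat itself exactly.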
  • 19. Let’s build something even cooler
  • 20. Let’s write a spam filter! A program that analyzes legitimate emails (“ham”) as well as “spam” and learns the features that are associated with each. Once trained, we should be able to run this program on incoming mail and have it reliably label each one with the appropriate category.
  • 21. “spambot.py”
    1. Extract one of the archives from the site into your working directory.
    2. Create a Python script; let’s call it “spambot.py”. Your working directory should contain the “spambot.py” script and the folders “spam” and “ham”.
      from nltk import word_tokenize, WordNetLemmatizer, NaiveBayesClassifier, classify, MaxentClassifier
      from nltk.corpus import stopwords
      import random
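The slides use two lists, `spamtexts` and `hamtexts`, without showing how they are filled. A minimal sketch, assuming each email is stored as a separate file inside the “spam” and “ham” folders; `read_folder` is a hypothetical helper, and the decision to skip undecodable bytes is an assumption because the encoding of the archive files is not stated.

```python
import os

def read_folder(folder):
    """Return the text of every regular file in `folder`, in file-name order."""
    texts = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            # Assumption: the file encoding is unknown, so undecodable bytes
            # are dropped instead of raising UnicodeDecodeError.
            with open(path, encoding="utf-8", errors="ignore") as f:
                texts.append(f.read())
    return texts
```

Run from the working directory described above: `spamtexts = read_folder("spam")` and `hamtexts = read_folder("ham")`.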
  • 22. “spambot.py” (continued)
    Label each item with the appropriate label and store them as a list of tuples:
      mixedemails = [(email, 'spam') for email in spamtexts]
      mixedemails += [(email, 'ham') for email in hamtexts]
    Let’s give them a nice shuffle:
      random.shuffle(mixedemails)
    From this list of randomly ordered but labeled emails, we will define a “feature extractor”, which outputs a feature set that our program can use to statistically compare spam and ham.
  • 23. “spambot.py” (continued)
      wordlemmatizer = WordNetLemmatizer()
      commonwords = set(stopwords.words('english'))  # stop-words, as a set for fast look-up

      def email_features(sent):
          features = {}
          # Normalize words: lower-case and lemmatize every token
          wordtokens = [wordlemmatizer.lemmatize(word.lower()) for word in word_tokenize(sent)]
          for word in wordtokens:
              # If the word is not a stop-word, then let's consider it a "feature"
              if word not in commonwords:
                  features[word] = True
          return features

      featuresets = [(email_features(n), g) for (n, g) in mixedemails]
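The slides go from feature extraction straight to classifying new input without showing how `classifier` is trained. A minimal sketch of that step with NLTK's `NaiveBayesClassifier` and `classify.accuracy` (both imported on the earlier slide). To keep it self-contained, `simple_features` is a hypothetical stand-in for `email_features` that needs no lemmatizer or stop-word data, and the six training and two test emails are made up for illustration.

```python
from nltk import NaiveBayesClassifier, classify

def simple_features(text):
    """Hypothetical stand-in for email_features: lower-cased words only."""
    return {word: True for word in text.lower().split()}

# Made-up labeled emails, for illustration only.
train_emails = [
    ("win a free prize now", "spam"),
    ("cheap pills free offer", "spam"),
    ("claim your free money", "spam"),
    ("meeting moved to monday", "ham"),
    ("please review the attached report", "ham"),
    ("lunch with the team on friday", "ham"),
]
test_emails = [
    ("free prize offer", "spam"),
    ("report for the monday meeting", "ham"),
]

train_set = [(simple_features(text), label) for text, label in train_emails]
test_set = [(simple_features(text), label) for text, label in test_emails]

# Learn per-label word probabilities, then score the held-out emails.
classifier = NaiveBayesClassifier.train(train_set)
print("accuracy:", classify.accuracy(classifier, test_set))
classifier.show_most_informative_features(5)
```

With the real data, the same two calls would run on a split of `featuresets`, for example training on one half and measuring accuracy on the other, so that the reported accuracy reflects emails the classifier has not seen.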
  • 24. “spambot.py” (continued)
      while True:
          featset = email_features(input("Enter text to classify: "))
          print(classifier.classify(featset))
    We can now directly input new email text and have it classified as either spam or ham.
  • 25. Applications: • Conversion from natural language to computer language and vice versa. • Translation from one human language to another. • Automatic checking of grammar and writing technique. • Spam filtering • Sentiment analysis
  • 26. Conclusion: NLP plays a very important role in new human-machine interfaces. Products based on NLP technologies are already very advanced and very useful, but there are many limitations. For example, the language we speak is highly ambiguous, which makes it very difficult to understand and analyze. Also, with so many languages spoken all over the world, it is very difficult to design a system that is 100% accurate. These problems get more complicated when we consider different people speaking the same language in different styles. Intelligent systems are being experimented with right now, and we should see improved applications of NLP in the near future.
  • 27. References: • http://en.wikipedia.org/wiki/Natural_language_processing • Eric Brill and Raymond J. Mooney, An Overview of Empirical Natural Language Processing • Ben W. Medlock, Investigating Classification for Natural Language Processing Tasks, University of Cambridge • Shankar Ambady, Natural Language Processing and Machine Learning using Python • http://www.slideshare.net • http://www.doc.ic.ac.uk/~nd/surprise_97/journal/vol1/hks/index.html • http://googlesystem.blogspot.in/2012/10/google-improves-results-for-natural/ • Code from: https://github.com/shanbady/NLTK-Boston-Python-Meetup
  • 29. Thank you! Reach me at: facebook.com/sumit12dec • sumit786raj@gmail.com • 9590 285 524