SlideShare una empresa de Scribd logo
1 de 65
Descargar para leer sin conexión
Natural Language Processing
Natural Language Processing
Natural Language Processing
● Natural-language processing (NLP) is an area of computer science and
artificial intelligence concerned with the interactions between
computers and human languages
● In particular how to program computers to fruitfully process large
amounts of natural language data
Natural Language Processing
Natural Language Processing
In 1950, Alan Turing published an article titled "Computing Machinery and
Intelligence" which proposed what is now called the Turing test as a
criterion of intelligence.
Natural Language Processing
Basic Structure of a NLP application( chatbot considered below )
Natural Language Processing
Natural Language Processing
Natural Language Processing
Knowledge Base – It contains the database
of information that is used to equip chatbots
with the information needed to respond to
queries of customers request.
Data Store – It contains interaction history
of chatbot with users.
Natural Language Processing
Natural Language Processing
NLP Layer – It translates
users queries (free form)
into information that can
be used for appropriate
responses.
Application Layer – It is
the application interface
that is used to interact with
the user
Natural Language Processing
Natural Language Processing - Applications
● Speech Recognition - The task of speech recognition is to map an
acoustic signal containing a spoken natural language utterance into the
corresponding sequence of words intended by the speaker.
Natural Language Processing
Natural Language Processing - Applications
● Text Classification - Given an example of text, predict a predefined
class label
Natural Language Processing
Natural Language Processing - Applications
● Caption Generation - It is the problem of describing the contents of
an image.
Natural Language Processing
● Machine Translation - Machine translation is the problem of
converting a source text in one language to another language.
Natural Language Processing - Applications
Natural Language Processing
Natural Language Processing - Applications
● Question Answering - It is the problem where given a
subject, such as a document of text, answer a specific question
about the subject.
Natural Language Processing
The most popular Natural Language Processing Tools are:
● Stanford's Core NLP Suite
● Natural Language Toolkit
● Apache Lucene and Solr
● Apache OpenNLP
● Text Blob which is a wrapper over the NLTK library
Natural Language Processing - Tools
Natural Language Processing
Natural Language Processing - Tools
Let us use the TextBlob library of Python to Build a program that
makes a quiz out of a provided text.
It is basically a usage of NER - Named-entity recognition
Natural Language Processing
Natural Language Processing - TextBlob
Let us begin by importing TextBlob and then selecting a text.
>>> from textblob import TextBlob
Now you can either load a text from a file as
>>> f = open(‘filename.txt’)
>>> text = f.read()
Or assign the text to a variable as
>>> text = “World War II (often abbreviated to
WWII or WW2), also known as the Second World War,
was a ….. “
Natural Language Processing
Natural Language Processing - TextBlob
Next we’ll convert our text to a TextBlob object.
>>> text = TextBlob(text)
Now we are ready to apply different methods on our text.
Natural Language Processing
Natural Language Processing - TextBlob
Let us understand a few things about the TextBlob api -
text.sentences - gives the sentences in a text
sentences.tags - gives the tags for each of the word in sentence.
It returns a list of tuples with the word being the first element of
the tuple and the tag being the second.
Natural Language Processing
Natural Language Processing - TextBlob
Now to generate our quiz
● We will extract each sentence
● We will replace all the nouns and proper nouns
with a blank from each sentences.
● To make it easy we will remove only after the
fourth word in the sentence.
Natural Language Processing
Natural Language Processing - TextBlob
>>> ww2b = TextBlob(ww2)
>>> for sentence in ww2b.sentences:
new_sentence = sentence
for index, tag in enumerate(sentence.tags):
if tag[1] in ('NN', 'NNP') and index > 3:
new_sentence =
new_sentence.replace(tag[0], "____")
print(new_sentence)
print("n==================n")
Run it on Notebook
Natural Language Processing
Natural Language Processing - Tools
Let’s write a program to find Related Posts using Python’s
Scikit Learn
Natural Language Processing
Natural Language Processing - Tools
We are given the task of finding the most related posts from a bunch
of posts.
The tricky thing that we have to tackle first is how to turn
text into something on which we can calculate similarity ??
Natural Language Processing
Natural Language Processing - Tools
How to do it ??
Bag of Words Approach -
It totally ignores the order of words and simply uses word counts as
their basis.
In this model, a text, such as a sentence or a document is represented
as the bag (multiset) of its words, disregarding grammar and even
word order but keeping multiplicity.
Natural Language Processing
Natural Language Processing - Tools
Vectorization
● For each word in the post, its occurrence is counted and noted in a
vector.
● This step is also called vectorization.
● The vector is typically huge as it contains as many elements as
words occur in the whole dataset.
Natural Language Processing
Natural Language Processing - Tools
Vectorization - Example
For the two statements - "How to format my hard disk" and " Hard
disk format problems " the vectors are shown below
aka Term Document Matrix
Natural Language Processing
Natural Language Processing - Tools
Vectorization - Using Scikit learn
>>> from sklearn.feature_extraction.text import
CountVectorizer
>>> vectorizer = CountVectorizer(min_df=1)
The min_df parameter determines how CountVectorizer treats seldom words
● If it is set to an integer, all words occurring less than that value will be
dropped
● If it is a fraction, all words that occur in less than that fraction of the overall
dataset will be dropped.
Natural Language Processing
Natural Language Processing - Tools
Vectorization - Using Scikit learn
>>> content = ["How to format my hard disk", " Hard
disk format problems "]
>>> X = vectorizer.fit_transform(content)
>>> vectorizer.get_feature_names()
['disk', 'format', 'hard', 'how', 'my', 'problems',
'to']
Run it on Notebook
Natural Language Processing
Natural Language Processing - Tools
Vectorization - Using Scikit learn
>>> print(X.toarray().transpose())
[[1 1]
[1 1]
[1 1]
[1 0]
[1 0]
[0 1]
[1 0]]
This means that the first sentence contains all the
words except "problems",
while the second contains all but "how", "my", and
"to".
Natural Language Processing
Natural Language Processing - Tools
Finding Distance
We can measure distance between two vectors using the Euclidean
Distance.
But first we will normalize each vectors.
The scipy.linalg module provides a function called norm.
The norm() function calculates the Euclidean norm (shortest
distance)
Natural Language Processing
Natural Language Processing - Tools
Finding Distance
>>> def dist_norm(v1, v2):
v1_normalized = v1/sp.linalg.norm(v1.toarray())
v2_normalized = v2/sp.linalg.norm(v2.toarray())
delta = v1_normalized - v2_normalized
return sp.linalg.norm(delta.toarray())
Natural Language Processing
Natural Language Processing - Tools
Applying Everything we learnt on a toy dataset
Now we will consider 5 toy posts and find the similarity with a given
post.
>>> post1 = "This is a toy post about machine learning.
Actually, it contains not much interesting stuff."
>>> post2 = "Imaging databases can get huge."
>>> post3 = "Most imaging databases save images
permanently."
>>> post4 = "Imaging databases store images."
>>>post5 = "Imaging databases store images. Imaging
databases store images. Imaging databases store images."
Natural Language Processing
Natural Language Processing - Tools
Applying Everything we learnt on a toy dataset
Now we will build our vectorizer
>>> posts = [post1, post2, post3, post4, post5]
>>> X_train = vectorizer.fit_transform(posts)
>>> num_samples, num_features = X_train.shape
>>> print("#samples: %d, #features: %d" %
(num_samples, num_features))
#samples: 5, #features: 24
As we provided 5 different posts and there are 24 different words
in them.
Natural Language Processing
Natural Language Processing - Tools
Applying Everything we learnt on a toy dataset
Finally we will iterate through all the vectors of
the posts and find their distance with the new post.
Perform on Notebook
Natural Language Processing
Natural Language Processing
Now let us analyse a collection of text documents using Scikit
Learn
Natural Language Processing
Natural Language Processing
In this section we will see how to:
● Load the file contents and the categories
● Extract feature vectors suitable for machine learning
● Train a linear model to perform categorization
● Use a grid search strategy to find a good configuration of both the
feature extraction components and the classifier
Natural Language Processing
Natural Language Processing
Loading the 20 newsgroups dataset
To load the dataset use the code-
>>> categories = ['alt.atheism',
'soc.religion.christian', 'comp.graphics', 'sci.med']
>>> from sklearn.datasets import fetch_20newsgroups
>>> twenty_train = fetch_20newsgroups(subset='train',
categories=categories, shuffle=True, random_state=42)
Natural Language Processing
Natural Language Processing
Loading the 20 newsgroups dataset
To load the dataset use the code-
>>> categories = ['alt.atheism',
'soc.religion.christian', 'comp.graphics', 'sci.med']
>>> from sklearn.datasets import fetch_20newsgroups
>>> twenty_train = fetch_20newsgroups(subset='train',
categories=categories, shuffle=True, random_state=42)
Natural Language Processing
Natural Language Processing
Analysing our dataset
The target_names holds the list of the requested category names:
>>> twenty_train.target_names
['alt.atheism', 'comp.graphics', 'sci.med',
'soc.religion.christian']
The files themselves are loaded in memory in the data attribute
>>> len(twenty_train.data)
2257
>>> len(twenty_train.filenames)
2257
Natural Language Processing
Natural Language Processing
Analysing our dataset
Content of the first lines of the first loaded file
>>> print("n".join(twenty_train.data[0].split("n")[:3]))
From: sd345@city.ac.uk (Michael Collier)
Subject: Converting images to HP LaserJet III?
Nntp-Posting-Host: hampton
The category integer id of each sample is stored in the target attribute
>>> twenty_train.target[:10]
array([1, 1, 3, 3, 3, 3, 3, 2, 2, 2])
Natural Language Processing
Natural Language Processing
Now we will apply the bag of word approach
Tokenizing text with scikit-learn
>>> from sklearn.feature_extraction.text import
CountVectorizer
>>> count_vect = CountVectorizer()
>>> X_train_counts =
count_vect.fit_transform(twenty_train.data)
>>> X_train_counts.shape
(2257, 35788)
Natural Language Processing
Natural Language Processing
Occurrence count is a good start but there is an issue -
● longer documents will have higher average count values than shorter
documents
To avoid these potential discrepancies we
● Divide the number of occurrences of each word in a document by the total
number of words in the document: these new features are called tf for
Term Frequencies.
Natural Language Processing
Natural Language Processing
How can we improve tf ?
Downscale weights for words that occur in many documents in the corpus and
are therefore less informative than those that occur only in a smaller portion of
the corpus.
This downscaling is called tf–idf for “Term Frequency times Inverse
Document Frequency”.
Natural Language Processing
Natural Language Processing - Tf-idf
● Tf-idf stands for term frequency-inverse document frequency
● Tf-idf weight is often used in
○ Information retrieval and
○ Text mining
● This weight is a used to evaluate
○ How important a word is to a
○ Document in a collection or corpus
Natural Language Processing
Natural Language Processing - Tf
Term Frequency
● Measures how frequently a term occurs in a document
● It is possible that a term would appear
○ Much more times in long documents than shorter ones
○ This is why we normalize TF
TF(t) = (Number of times term t appears in a document) / (Total
number of terms in the document)
Natural Language Processing
Natural Language Processing - IDF
Inverse Document Frequency
● Measures how important a term is
● In TF, all terms are considered equally important
● How ever some words and stop words appears lot of time
○ But have least importance
● In IDF we weight down frequent terms
○ And scale up rare terms
IDF(t) = log_e(Total number of documents / Number of
documents with term t in it)
Natural Language Processing
Natural Language Processing - Tf-idf
Example
● Consider a document containing 100 words and the word cat appears 3
times
● The term frequency(tf) for cat is
○ (3 / 100) = 0.03
Natural Language Processing
Natural Language Processing - Tf-idf
Example
● Now, assume we have 10 million documents and
○ The word cat appears in 1,000 of these
● The inverse document frequency(idf) is
○ log(10,000,000 / 1,000) = 4
● Tf-idf weight is the product of tf and idf
○ 0.03 * 4 = 0.12
Natural Language Processing
Natural Language Processing
Now let us apply tf-idf to our example
tfidf_transformer = TfidfTransformer()
>>> X_train_tfidf =
tfidf_transformer.fit_transform(X_train_counts)
>>> X_train_tfidf.shape
(2257, 35788)
Natural Language Processing
Natural Language Processing
Training a classifier
We’ll start with a naïve Bayes classifier, which provides a nice baseline
for this task.
>>> from sklearn.naive_bayes import MultinomialNB
>>> clf = MultinomialNB().fit(X_train_tfidf,
twenty_train.target)
The multinomial variant of Naive Bayes is one the most suitable for word
counts tasks.
Natural Language Processing
Natural Language Processing
Now let us make a prediction on a new document
To try to predict the outcome on a new document we need to extract the
features using almost the same feature extracting chain as before
● We will first transform the new document to count vectors
● Then we’ll transform it with the tfidf_transformer.
● Finally we’ll call the predict method of the classifier
Natural Language Processing
Natural Language Processing
>>> docs_new = ['God is love', 'OpenGL on the GPU is fast']
>>> X_new_counts = count_vect.transform(docs_new)
>>> X_new_tfidf = tfidf_transformer.transform(X_new_counts)
>>> predicted = clf.predict(X_new_tfidf)
>>> for doc, category in zip(docs_new, predicted):
... print('%r => %s' % (doc, twenty_train.target_names[category]))
...
'God is love' => soc.religion.christian
'OpenGL on the GPU is fast' => comp.graphics
Run it on Notebook
Natural Language Processing
Natural Language Processing
Now let us combine all the steps in the form of a pipeline
>>> from sklearn.pipeline import Pipeline
>>> text_clf = Pipeline([('vect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('clf', MultinomialNB()),])
We can now train the model with a single command
>>> text_clf.fit(twenty_train.data, twenty_train.target)
Pipeline(...)
Natural Language Processing
Natural Language Processing
Now let us perform performance evaluation on the test set
>>> import numpy as np
>>> twenty_test = fetch_20newsgroups(subset='test',
... categories=categories, shuffle=True,
random_state=42)
>>> docs_test = twenty_test.data
>>> predicted = text_clf.predict(docs_test)
>>> np.mean(predicted == twenty_test.target)
0.834…
I.e., we achieved 83.4% accuracy
Natural Language Processing
Natural Language Processing
Now we’ll use a different classifier and compute the performance
metrics
>>> from sklearn.linear_model import SGDClassifier
>>> text_clf = Pipeline([('vect', CountVectorizer()),
... ('tfidf', TfidfTransformer()),
... ('clf', SGDClassifier(loss='hinge', penalty='l2',
... alpha=1e-3, random_state=42,
... max_iter=5, tol=None)),
... ])
Natural Language Processing
Natural Language Processing
Now we’ll use a different classifier and compute the performance
metrics
>>> text_clf.fit(twenty_train.data, twenty_train.target)
Pipeline(...)
>>> predicted = text_clf.predict(docs_test)
>>> np.mean(predicted == twenty_test.target)
0.912...
Natural Language Processing
Natural Language Processing
Parameter tuning using grid search
Since there are different parameters which we can choose, we’ll apply grid
search to find the best parameters
>>> parameters = {'vect__ngram_range': [(1, 1), (1, 2)],
... 'tfidf__use_idf': (True, False),
... 'clf__alpha': (1e-2, 1e-3),
... }
Here we’ll be applying grid search for the parameters - ngram_range,
use_idf and alpha.
Natural Language Processing
Natural Language Processing
Parameter tuning using grid search
If we have multiple CPU cores at our disposal, we can tell the grid searcher
to try these eight parameter combinations in parallel with the n_jobs
parameter.
>>> gs_clf = GridSearchCV(text_clf, parameters,
n_jobs=-1)
>>> gs_clf = gs_clf.fit(twenty_train.data[:400],
twenty_train.target[:400])
Natural Language Processing
Natural Language Processing
Predicting and finding best score
>>> twenty_train.target_names[gs_clf.predict(['God is
love'])[0]]
'soc.religion.christian'
>>> gs_clf.best_score_
0.900...
Natural Language Processing
Natural Language Processing
Predicting and finding best score
>>> for param_name in sorted(parameters.keys()):
... print("%s: %r" % (param_name,
gs_clf.best_params_[param_name]))
...
clf__alpha: 0.001
tfidf__use_idf: True
vect__ngram_range: (1, 1)
Natural Language Processing
Natural Language Processing - Tools
Overview of Stanford Core NLP
Natural Language Processing
Natural Language Processing - Stanford NLP
● Stanford CoreNLP provides a set of human language technology
tools.
● It can give
○ The base forms of words,
○ Their parts of speech,
○ Whether they are names of companies, people, etc.,
○ Mark up the structure of sentences in terms of phrases and
syntactic dependencies,
○ Indicate which noun phrases refer to the same entities, indicate
sentiment
Natural Language Processing
Natural Language Processing - Stanford NLP
Choose Stanford CoreNLP if you need:
● An integrated NLP toolkit with a broad range of grammatical
analysis tools
● A fast, robust annotator for arbitrary texts, widely used in
production
● A modern, regularly updated package, with the overall highest
quality text analytics
Natural Language Processing
Natural Language Processing - Stanford NLP
Choose Stanford CoreNLP if you need:
● Support for a number of major (human) languages
● Available APIs for most major modern programming languages
● Ability to run as a simple web service
Natural Language Processing
Natural Language Processing - Stanford NLP
Programming languages and operating systems
Stanford CoreNLP is written in Java; recent releases require Java
1.8+.
You can interact with CoreNLP via the command-line or its
web service using languages like Javascript, Python etc.
Natural Language Processing
Natural Language Processing - Stanford NLP
Programming languages and operating systems
You can use Stanford CoreNLP from
● The command-line, via its original Java programmatic API,
● Via the object-oriented simple API,
● Via third party APIs for most major modern programming languages,
Or via a web service.
It works on Linux, macOS, and Windows
Natural Language Processing
Natural Language Processing
More coming up on CloudxLab
● Word2vec - Vector Representations of Words
● Deep Learning - LSTM - Long Short-Term Memory
● GloVe - Global Vectors for Word Representation
● spaCY - Industrial-Strength Natural Language Processing in Python
● Hands-on using Stanford CoreNLP
● List of APIs available for chatbots etc
Thank You
https://discuss.cloudxlab.com
reachus@cloudxlab.com

Más contenido relacionado

La actualidad más candente

Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)Kuppusamy P
 
Natural language processing
Natural language processingNatural language processing
Natural language processingYogendra Tamang
 
natural language processing help at myassignmenthelp.net
natural language processing  help at myassignmenthelp.netnatural language processing  help at myassignmenthelp.net
natural language processing help at myassignmenthelp.netwww.myassignmenthelp.net
 
Natural Language Processing in AI
Natural Language Processing in AINatural Language Processing in AI
Natural Language Processing in AISaurav Shrestha
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingRishikese MR
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingYasir Khan
 
Natural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - IntroductionNatural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - IntroductionAritra Mukherjee
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Alia Hamwi
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsRoelof Pieters
 
Natural Language Processing seminar review
Natural Language Processing seminar review Natural Language Processing seminar review
Natural Language Processing seminar review Jayneel Vora
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and originIntroduction to natural language processing, history and origin
Introduction to natural language processing, history and originShubhankar Mohan
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processingMinh Pham
 

La actualidad más candente (20)

Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
natural language processing help at myassignmenthelp.net
natural language processing  help at myassignmenthelp.netnatural language processing  help at myassignmenthelp.net
natural language processing help at myassignmenthelp.net
 
NLP
NLPNLP
NLP
 
Natural Language Processing in AI
Natural Language Processing in AINatural Language Processing in AI
Natural Language Processing in AI
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
NLP
NLPNLP
NLP
 
Word embedding
Word embedding Word embedding
Word embedding
 
Natural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - IntroductionNatural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - Introduction
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)
 
Nlp
NlpNlp
Nlp
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
 
Natural Language Processing seminar review
Natural Language Processing seminar review Natural Language Processing seminar review
Natural Language Processing seminar review
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and originIntroduction to natural language processing, history and origin
Introduction to natural language processing, history and origin
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 

Similar a Natural Language Processing

Natural Language Processing: Comparing NLTK and OpenNLP
Natural Language Processing: Comparing NLTK and OpenNLPNatural Language Processing: Comparing NLTK and OpenNLP
Natural Language Processing: Comparing NLTK and OpenNLPCodeOps Technologies LLP
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationssChandan Deb
 
Designing A Project Using Java Programming
Designing A Project Using Java ProgrammingDesigning A Project Using Java Programming
Designing A Project Using Java ProgrammingKaty Allen
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to pythonmckennadglyn
 
Text classification-php-v4
Text classification-php-v4Text classification-php-v4
Text classification-php-v4Glenn De Backer
 
Nltk - Boston Text Analytics
Nltk - Boston Text AnalyticsNltk - Boston Text Analytics
Nltk - Boston Text Analyticsshanbady
 
Mining the social web ch8 - 1
Mining the social web ch8 - 1Mining the social web ch8 - 1
Mining the social web ch8 - 1scor7910
 
The Ring programming language version 1.7 book - Part 43 of 196
The Ring programming language version 1.7 book - Part 43 of 196The Ring programming language version 1.7 book - Part 43 of 196
The Ring programming language version 1.7 book - Part 43 of 196Mahmoud Samir Fayed
 
Nltk:a tool for_nlp - py_con-dhaka-2014
Nltk:a tool for_nlp - py_con-dhaka-2014Nltk:a tool for_nlp - py_con-dhaka-2014
Nltk:a tool for_nlp - py_con-dhaka-2014Fasihul Kabir
 
Hack Like It's 2013 (The Workshop)
Hack Like It's 2013 (The Workshop)Hack Like It's 2013 (The Workshop)
Hack Like It's 2013 (The Workshop)Itzik Kotler
 
summer training report on python
summer training report on pythonsummer training report on python
summer training report on pythonShubham Yadav
 
pythonlibrariesandmodules-210530042906.docx
pythonlibrariesandmodules-210530042906.docxpythonlibrariesandmodules-210530042906.docx
pythonlibrariesandmodules-210530042906.docxRameshMishra84
 
Data analysis in R
Data analysis in RData analysis in R
Data analysis in RAndrew Lowe
 
Introduction to Python for Bioinformatics
Introduction to Python for BioinformaticsIntroduction to Python for Bioinformatics
Introduction to Python for BioinformaticsJosé Héctor Gálvez
 

Similar a Natural Language Processing (20)

Natural Language Processing: Comparing NLTK and OpenNLP
Natural Language Processing: Comparing NLTK and OpenNLPNatural Language Processing: Comparing NLTK and OpenNLP
Natural Language Processing: Comparing NLTK and OpenNLP
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationss
 
Designing A Project Using Java Programming
Designing A Project Using Java ProgrammingDesigning A Project Using Java Programming
Designing A Project Using Java Programming
 
PYTHON PPT.pptx
PYTHON PPT.pptxPYTHON PPT.pptx
PYTHON PPT.pptx
 
Nltk
NltkNltk
Nltk
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to python
 
Text classification-php-v4
Text classification-php-v4Text classification-php-v4
Text classification-php-v4
 
Nltk - Boston Text Analytics
Nltk - Boston Text AnalyticsNltk - Boston Text Analytics
Nltk - Boston Text Analytics
 
Mining the social web ch8 - 1
Mining the social web ch8 - 1Mining the social web ch8 - 1
Mining the social web ch8 - 1
 
The Ring programming language version 1.7 book - Part 43 of 196
The Ring programming language version 1.7 book - Part 43 of 196The Ring programming language version 1.7 book - Part 43 of 196
The Ring programming language version 1.7 book - Part 43 of 196
 
Nltk:a tool for_nlp - py_con-dhaka-2014
Nltk:a tool for_nlp - py_con-dhaka-2014Nltk:a tool for_nlp - py_con-dhaka-2014
Nltk:a tool for_nlp - py_con-dhaka-2014
 
Hack Like It's 2013 (The Workshop)
Hack Like It's 2013 (The Workshop)Hack Like It's 2013 (The Workshop)
Hack Like It's 2013 (The Workshop)
 
CPPDS Slide.pdf
CPPDS Slide.pdfCPPDS Slide.pdf
CPPDS Slide.pdf
 
summer training report on python
summer training report on pythonsummer training report on python
summer training report on python
 
NLP CHEAT SHEET.pdf
NLP CHEAT SHEET.pdfNLP CHEAT SHEET.pdf
NLP CHEAT SHEET.pdf
 
Dynamic Python
Dynamic PythonDynamic Python
Dynamic Python
 
Python
PythonPython
Python
 
pythonlibrariesandmodules-210530042906.docx
pythonlibrariesandmodules-210530042906.docxpythonlibrariesandmodules-210530042906.docx
pythonlibrariesandmodules-210530042906.docx
 
Data analysis in R
Data analysis in RData analysis in R
Data analysis in R
 
Introduction to Python for Bioinformatics
Introduction to Python for BioinformaticsIntroduction to Python for Bioinformatics
Introduction to Python for Bioinformatics
 

Más de CloudxLab

Understanding computer vision with Deep Learning
Understanding computer vision with Deep LearningUnderstanding computer vision with Deep Learning
Understanding computer vision with Deep LearningCloudxLab
 
Deep Learning Overview
Deep Learning OverviewDeep Learning Overview
Deep Learning OverviewCloudxLab
 
Recurrent Neural Networks
Recurrent Neural NetworksRecurrent Neural Networks
Recurrent Neural NetworksCloudxLab
 
Autoencoders
AutoencodersAutoencoders
AutoencodersCloudxLab
 
Training Deep Neural Nets
Training Deep Neural NetsTraining Deep Neural Nets
Training Deep Neural NetsCloudxLab
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningCloudxLab
 
Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...
Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...
Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...CloudxLab
 
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLabAdvanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...CloudxLab
 
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...CloudxLab
 
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...CloudxLab
 
Introduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLab
Introduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLabIntroduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLab
Introduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLabCloudxLab
 
Introduction to Deep Learning | CloudxLab
Introduction to Deep Learning | CloudxLabIntroduction to Deep Learning | CloudxLab
Introduction to Deep Learning | CloudxLabCloudxLab
 
Dimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLabDimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLabCloudxLab
 
Ensemble Learning and Random Forests
Ensemble Learning and Random ForestsEnsemble Learning and Random Forests
Ensemble Learning and Random ForestsCloudxLab
 
Decision Trees
Decision TreesDecision Trees
Decision TreesCloudxLab
 

Más de CloudxLab (20)

Understanding computer vision with Deep Learning
Understanding computer vision with Deep LearningUnderstanding computer vision with Deep Learning
Understanding computer vision with Deep Learning
 
Deep Learning Overview
Deep Learning OverviewDeep Learning Overview
Deep Learning Overview
 
Recurrent Neural Networks
Recurrent Neural NetworksRecurrent Neural Networks
Recurrent Neural Networks
 
Naive Bayes
Naive BayesNaive Bayes
Naive Bayes
 
Autoencoders
AutoencodersAutoencoders
Autoencoders
 
Training Deep Neural Nets
Training Deep Neural NetsTraining Deep Neural Nets
Training Deep Neural Nets
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...
Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...
Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...
 
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLabAdvanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
 
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
 
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
 
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
 
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
 
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
 
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
 
Introduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLab
Introduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLabIntroduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLab
Introduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLab
 
Introduction to Deep Learning | CloudxLab
Introduction to Deep Learning | CloudxLabIntroduction to Deep Learning | CloudxLab
Introduction to Deep Learning | CloudxLab
 
Dimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLabDimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLab
 
Ensemble Learning and Random Forests
Ensemble Learning and Random ForestsEnsemble Learning and Random Forests
Ensemble Learning and Random Forests
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 

Último

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Último (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Natural Language Processing

  • 2. Natural Language Processing Natural Language Processing ● Natural-language processing (NLP) is an area of computer science and artificial intelligence concerned with the interactions between computers and human languages ● In particular how to program computers to fruitfully process large amounts of natural language data
  • 3. Natural Language Processing Natural Language Processing In 1950, Alan Turing published an article titled "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence.
  • 4. Natural Language Processing Basic Structure of a NLP application( chatbot considered below ) Natural Language Processing
  • 5. Natural Language Processing Natural Language Processing Knowledge Base – It contains the database of information that is used to equip chatbots with the information needed to respond to queries of customers request. Data Store – It contains interaction history of chatbot with users.
  • 6. Natural Language Processing Natural Language Processing NLP Layer – It translates users queries (free form) into information that can be used for appropriate responses. Application Layer – It is the application interface that is used to interact with the user
  • 7. Natural Language Processing Natural Language Processing - Applications ● Speech Recognition - The task of speech recognition is to map an acoustic signal containing a spoken natural language utterance into the corresponding sequence of words intended by the speaker.
  • 8. Natural Language Processing Natural Language Processing - Applications ● Text Classification - Given an example of text, predict a predefined class label
  • 9. Natural Language Processing Natural Language Processing - Applications ● Caption Generation - It is the problem of describing the contents of an image.
  • 10. Natural Language Processing ● Machine Translation - Machine translation is the problem of converting a source text in one language to another language. Natural Language Processing - Applications
  • 11. Natural Language Processing Natural Language Processing - Applications ● Question Answering - It is the problem where given a subject, such as a document of text, answer a specific question about the subject.
  • 12. Natural Language Processing The most popular Natural Language Processing Tools are: ● Stanford's Core NLP Suite ● Natural Language Toolkit ● Apache Lucene and Solr ● Apache OpenNLP ● Text Blob which is a wrapper over the NLTK library Natural Language Processing - Tools
  • 13. Natural Language Processing Natural Language Processing - Tools Let us use the TextBlob library of Python to Build a program that makes a quiz out of a provided text. It is basically a usage of NER - Named-entity recognition
  • 14. Natural Language Processing Natural Language Processing - TextBlob Let us begin by importing TextBlob and then selecting a text. >>> from textblob import TextBlob Now you can either load a text from a file as >>> f = open(‘filename.txt’) >>> text = f.read() Or assign the text to a variable as >>> text = “World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a ….. “
  • 15. Natural Language Processing Natural Language Processing - TextBlob Next we’ll convert our text to a TextBlob object. >>> text = TextBlob(text) Now we are ready to apply different methods on our text.
  • 16. Natural Language Processing Natural Language Processing - TextBlob Let us understand a few things about the TextBlob api - text.sentences - gives the sentences in a text sentences.tags - gives the tags for each of the word in sentence. It returns a list of tuples with the word being the first element of the tuple and the tag being the second.
  • 17. Natural Language Processing Natural Language Processing - TextBlob Now to generate our quiz ● We will extract each sentence ● We will replace all the nouns and proper nouns with a blank from each sentences. ● To make it easy we will remove only after the fourth word in the sentence.
  • 18. Natural Language Processing Natural Language Processing - TextBlob >>> ww2b = TextBlob(ww2) >>> for sentence in ww2b.sentences: new_sentence = sentence for index, tag in enumerate(sentence.tags): if tag[1] in ('NN', 'NNP') and index > 3: new_sentence = new_sentence.replace(tag[0], "____") print(new_sentence) print("n==================n") Run it on Notebook
  • 19. Natural Language Processing Natural Language Processing - Tools Let’s write a program to find Related Posts using Python’s Scikit Learn
  • 20. Natural Language Processing Natural Language Processing - Tools We are given the task of finding the most related posts from a bunch of posts. The tricky thing that we have to tackle first is how to turn text into something on which we can calculate similarity ??
  • 21. Natural Language Processing Natural Language Processing - Tools How to do it ?? Bag of Words Approach - It totally ignores the order of words and simply uses word counts as their basis. In this model, a text, such as a sentence or a document is represented as the bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity.
  • 22. Natural Language Processing Natural Language Processing - Tools Vectorization ● For each word in the post, its occurrence is counted and noted in a vector. ● This step is also called vectorization. ● The vector is typically huge as it contains as many elements as words occur in the whole dataset.
  • 23. Natural Language Processing Natural Language Processing - Tools Vectorization - Example For the two statements - "How to format my hard disk" and " Hard disk format problems " the vectors are shown below aka Term Document Matrix
  • 24. Natural Language Processing Natural Language Processing - Tools Vectorization - Using Scikit learn >>> from sklearn.feature_extraction.text import CountVectorizer >>> vectorizer = CountVectorizer(min_df=1) The min_df parameter determines how CountVectorizer treats seldom words ● If it is set to an integer, all words occurring less than that value will be dropped ● If it is a fraction, all words that occur in less than that fraction of the overall dataset will be dropped.
  • 25. Natural Language Processing Natural Language Processing - Tools Vectorization - Using Scikit learn >>> content = ["How to format my hard disk", " Hard disk format problems "] >>> X = vectorizer.fit_transform(content) >>> vectorizer.get_feature_names() ['disk', 'format', 'hard', 'how', 'my', 'problems', 'to'] Run it on Notebook
  • 26. Natural Language Processing Natural Language Processing - Tools Vectorization - Using Scikit learn >>> print(X.toarray().transpose()) [[1 1] [1 1] [1 1] [1 0] [1 0] [0 1] [1 0]] This means that the first sentence contains all the words except "problems", while the second contains all but "how", "my", and "to".
  • 27. Natural Language Processing Natural Language Processing - Tools Finding Distance We can measure distance between two vectors using the Euclidean Distance. But first we will normalize each vectors. The scipy.linalg module provides a function called norm. The norm() function calculates the Euclidean norm (shortest distance)
  • 28. Natural Language Processing Natural Language Processing - Tools Finding Distance >>> def dist_norm(v1, v2): v1_normalized = v1/sp.linalg.norm(v1.toarray()) v2_normalized = v2/sp.linalg.norm(v2.toarray()) delta = v1_normalized - v2_normalized return sp.linalg.norm(delta.toarray())
  • 29. Natural Language Processing Natural Language Processing - Tools Applying Everything we learnt on a toy dataset Now we will consider 5 toy posts and find the similarity with a given post. >>> post1 = "This is a toy post about machine learning. Actually, it contains not much interesting stuff." >>> post2 = "Imaging databases can get huge." >>> post3 = "Most imaging databases save images permanently." >>> post4 = "Imaging databases store images." >>>post5 = "Imaging databases store images. Imaging databases store images. Imaging databases store images."
  • 30. Natural Language Processing Natural Language Processing - Tools Applying Everything we learnt on a toy dataset Now we will build our vectorizer >>> posts = [post1, post2, post3, post4, post5] >>> X_train = vectorizer.fit_transform(posts) >>> num_samples, num_features = X_train.shape >>> print("#samples: %d, #features: %d" % (num_samples, num_features)) #samples: 5, #features: 24 As we provided 5 different posts and there are 24 different words in them.
  • 31. Natural Language Processing Natural Language Processing - Tools Applying Everything we learnt on a toy dataset Finally we will iterate through all the vectors of the posts and find their distance with the new post. Perform on Notebook
  • 32. Natural Language Processing Natural Language Processing Now let us analyse a collection of text documents using Scikit Learn
  • 33. Natural Language Processing Natural Language Processing In this section we will see how to: ● Load the file contents and the categories ● Extract feature vectors suitable for machine learning ● Train a linear model to perform categorization ● Use a grid search strategy to find a good configuration of both the feature extraction components and the classifier
  • 34. Natural Language Processing Natural Language Processing Loading the 20 newsgroups dataset To load the dataset use the code- >>> categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med'] >>> from sklearn.datasets import fetch_20newsgroups >>> twenty_train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=42)
  • 35. Natural Language Processing Natural Language Processing Loading the 20 newsgroups dataset To load the dataset use the code- >>> categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med'] >>> from sklearn.datasets import fetch_20newsgroups >>> twenty_train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=42)
  • 36. Natural Language Processing Natural Language Processing Analysing our dataset The target_names holds the list of the requested category names: >>> twenty_train.target_names ['alt.atheism', 'comp.graphics', 'sci.med', 'soc.religion.christian'] The files themselves are loaded in memory in the data attribute >>> len(twenty_train.data) 2257 >>> len(twenty_train.filenames) 2257
  • 37. Natural Language Processing Natural Language Processing Analysing our dataset Content of the first lines of the first loaded file >>> print("n".join(twenty_train.data[0].split("n")[:3])) From: sd345@city.ac.uk (Michael Collier) Subject: Converting images to HP LaserJet III? Nntp-Posting-Host: hampton The category integer id of each sample is stored in the target attribute >>> twenty_train.target[:10] array([1, 1, 3, 3, 3, 3, 3, 2, 2, 2])
  • 38. Natural Language Processing Natural Language Processing Now we will apply the bag of word approach Tokenizing text with scikit-learn >>> from sklearn.feature_extraction.text import CountVectorizer >>> count_vect = CountVectorizer() >>> X_train_counts = count_vect.fit_transform(twenty_train.data) >>> X_train_counts.shape (2257, 35788)
  • 39. Natural Language Processing Natural Language Processing Occurrence count is a good start but there is an issue - ● longer documents will have higher average count values than shorter documents To avoid these potential discrepancies we ● Divide the number of occurrences of each word in a document by the total number of words in the document: these new features are called tf for Term Frequencies.
  • 40. Natural Language Processing Natural Language Processing How can we improve tf ? Downscale weights for words that occur in many documents in the corpus and are therefore less informative than those that occur only in a smaller portion of the corpus. This downscaling is called tf–idf for “Term Frequency times Inverse Document Frequency”.
  • 41. Natural Language Processing Natural Language Processing - Tf-idf ● Tf-idf stands for term frequency-inverse document frequency ● Tf-idf weight is often used in ○ Information retrieval and ○ Text mining ● This weight is a used to evaluate ○ How important a word is to a ○ Document in a collection or corpus
  • 42. Natural Language Processing Natural Language Processing - Tf Term Frequency ● Measures how frequently a term occurs in a document ● It is possible that a term would appear ○ Much more times in long documents than shorter ones ○ This is why we normalize TF TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document)
  • 43. Natural Language Processing Natural Language Processing - IDF Inverse Document Frequency ● Measures how important a term is ● In TF, all terms are considered equally important ● How ever some words and stop words appears lot of time ○ But have least importance ● In IDF we weight down frequent terms ○ And scale up rare terms IDF(t) = log_e(Total number of documents / Number of documents with term t in it)
  • 44. Natural Language Processing Natural Language Processing - Tf-idf Example ● Consider a document containing 100 words and the word cat appears 3 times ● The term frequency(tf) for cat is ○ (3 / 100) = 0.03
  • 45. Natural Language Processing Natural Language Processing - Tf-idf Example ● Now, assume we have 10 million documents and ○ The word cat appears in 1,000 of these ● The inverse document frequency(idf) is ○ log(10,000,000 / 1,000) = 4 ● Tf-idf weight is the product of tf and idf ○ 0.03 * 4 = 0.12
  • 46. Natural Language Processing Natural Language Processing Now let us apply tf-idf to our example tfidf_transformer = TfidfTransformer() >>> X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts) >>> X_train_tfidf.shape (2257, 35788)
  • 47. Natural Language Processing Natural Language Processing Training a classifier We’ll start with a naïve Bayes classifier, which provides a nice baseline for this task. >>> from sklearn.naive_bayes import MultinomialNB >>> clf = MultinomialNB().fit(X_train_tfidf, twenty_train.target) The multinomial variant of Naive Bayes is one the most suitable for word counts tasks.
  • 48. Natural Language Processing Natural Language Processing Now let us make a prediction on a new document To try to predict the outcome on a new document we need to extract the features using almost the same feature extracting chain as before ● We will first transform the new document to count vectors ● Then we’ll transform it with the tfidf_transformer. ● Finally we’ll call the predict method of the classifier
  • 49. Natural Language Processing Natural Language Processing >>> docs_new = ['God is love', 'OpenGL on the GPU is fast'] >>> X_new_counts = count_vect.transform(docs_new) >>> X_new_tfidf = tfidf_transformer.transform(X_new_counts) >>> predicted = clf.predict(X_new_tfidf) >>> for doc, category in zip(docs_new, predicted): ... print('%r => %s' % (doc, twenty_train.target_names[category])) ... 'God is love' => soc.religion.christian 'OpenGL on the GPU is fast' => comp.graphics Run it on Notebook
  • 50. Natural Language Processing Natural Language Processing Now let us combine all the steps in the form of a pipeline >>> from sklearn.pipeline import Pipeline >>> text_clf = Pipeline([('vect', CountVectorizer()), ('tfidf', TfidfTransformer()), ('clf', MultinomialNB()),]) We can now train the model with a single command >>> text_clf.fit(twenty_train.data, twenty_train.target) Pipeline(...)
  • 51. Natural Language Processing Natural Language Processing Now let us perform performance evaluation on the test set >>> import numpy as np >>> twenty_test = fetch_20newsgroups(subset='test', ... categories=categories, shuffle=True, random_state=42) >>> docs_test = twenty_test.data >>> predicted = text_clf.predict(docs_test) >>> np.mean(predicted == twenty_test.target) 0.834… I.e., we achieved 83.4% accuracy
  • 52. Natural Language Processing Natural Language Processing Now we’ll use a different classifier and compute the performance metrics >>> from sklearn.linear_model import SGDClassifier >>> text_clf = Pipeline([('vect', CountVectorizer()), ... ('tfidf', TfidfTransformer()), ... ('clf', SGDClassifier(loss='hinge', penalty='l2', ... alpha=1e-3, random_state=42, ... max_iter=5, tol=None)), ... ])
  • 53. Natural Language Processing Natural Language Processing Now we’ll use a different classifier and compute the performance metrics >>> text_clf.fit(twenty_train.data, twenty_train.target) Pipeline(...) >>> predicted = text_clf.predict(docs_test) >>> np.mean(predicted == twenty_test.target) 0.912...
  • 54. Natural Language Processing Natural Language Processing Parameter tuning using grid search Since there are different parameters which we can choose, we’ll apply grid search to find the best parameters >>> parameters = {'vect__ngram_range': [(1, 1), (1, 2)], ... 'tfidf__use_idf': (True, False), ... 'clf__alpha': (1e-2, 1e-3), ... } Here we’ll be applying grid search for the parameters - ngram_range, use_idf and alpha.
  • 55. Natural Language Processing Natural Language Processing Parameter tuning using grid search If we have multiple CPU cores at our disposal, we can tell the grid searcher to try these eight parameter combinations in parallel with the n_jobs parameter. >>> gs_clf = GridSearchCV(text_clf, parameters, n_jobs=-1) >>> gs_clf = gs_clf.fit(twenty_train.data[:400], twenty_train.target[:400])
  • 56. Natural Language Processing Natural Language Processing Predicting and finding best score >>> twenty_train.target_names[gs_clf.predict(['God is love'])[0]] 'soc.religion.christian' >>> gs_clf.best_score_ 0.900...
  • 57. Natural Language Processing Natural Language Processing Predicting and finding best score >>> for param_name in sorted(parameters.keys()): ... print("%s: %r" % (param_name, gs_clf.best_params_[param_name])) ... clf__alpha: 0.001 tfidf__use_idf: True vect__ngram_range: (1, 1)
  • 58. Natural Language Processing Natural Language Processing - Tools Overview of Stanford Core NLP
  • 59. Natural Language Processing Natural Language Processing - Stanford NLP ● Stanford CoreNLP provides a set of human language technology tools. ● It can give ○ The base forms of words, ○ Their parts of speech, ○ Whether they are names of companies, people, etc., ○ Mark up the structure of sentences in terms of phrases and syntactic dependencies, ○ Indicate which noun phrases refer to the same entities, indicate sentiment
  • 60. Natural Language Processing Natural Language Processing - Stanford NLP Choose Stanford CoreNLP if you need: ● An integrated NLP toolkit with a broad range of grammatical analysis tools ● A fast, robust annotator for arbitrary texts, widely used in production ● A modern, regularly updated package, with the overall highest quality text analytics
  • 61. Natural Language Processing Natural Language Processing - Stanford NLP Choose Stanford CoreNLP if you need: ● Support for a number of major (human) languages ● Available APIs for most major modern programming languages ● Ability to run as a simple web service
  • 62. Natural Language Processing Natural Language Processing - Stanford NLP Programming languages and operating systems Stanford CoreNLP is written in Java; recent releases require Java 1.8+. You can interact with CoreNLP via the command-line or its web service using languages like Javascript, Python etc.
  • 63. Natural Language Processing Natural Language Processing - Stanford NLP Programming languages and operating systems You can use Stanford CoreNLP from ● The command-line, via its original Java programmatic API, ● Via the object-oriented simple API, ● Via third party APIs for most major modern programming languages, Or via a web service. It works on Linux, macOS, and Windows
  • 64. Natural Language Processing Natural Language Processing More coming up on CloudxLab ● Word2vec - Vector Representations of Words ● Deep Learning - LSTM - Long Short-Term Memory ● GloVe - Global Vectors for Word Representation ● spaCY - Industrial-Strength Natural Language Processing in Python ● Hands-on using Stanford CoreNLP ● List of APIs available for chatbots etc