February 14, 2019
[NATURAL LANGUAGE PROCESSING (NLP) | ARTIFICIAL INTELLIGENCE]
DEPT OF ECE | HARISH P M
TABLE OF CONTENTS
1. Introduction to NLP
1.1 Natural Language Processing
1.2 Spam Filter
1.3 Sentiment analysis
1.4 Different levels of linguistic analysis
2. Working of NLP
2.1 Natural Language Understanding (NLU)
2.2 Natural Language Generation (NLG)
3. Understanding NLTK
3.1 NLTK Structure
3.2 NLTK: Example Modules
3.3 Tokenization
3.4 NLP for log analysis and log mining
3.5 Techniques used for log analysis and log mining
3.6 Role of NLP in log analysis and log mining
3.7 Big data
3.8 NLP for big data is the next big thing
4. Deep Learning in NLP
4.1 Why is deep learning needed in NLP?
5. Applications of NLP
6. References
LIST OF FIGURES
Fig 1.1 : The ultimate goal of NLP
Fig 1.2 : NLP
Fig 1.3 : Google Assistant
Fig 1.4 : Cortana
Fig 1.5 : Spam Filter
Fig 1.6 : Sentiment Analysis
Fig 2.1 : NLU
Fig 4.1 : Flow graph of DL in NLP
Fig 4.2 : Tasks of DL in NLP
Fig 5.1 : Deeper applications of NLP
1. INTRODUCTION TO NATURAL LANGUAGE PROCESSING (NLP)
1.1 Natural Language Processing (NLP):
Natural Language Processing (NLP) is the “ability of machines to understand and
interpret human language the way it is written or spoken”. The objective of NLP is
to make computers/machines as intelligent as human beings in understanding
language.
Fig 1.1 : The ultimate goal of NLP
The ultimate goal of NLP is to fill the gap between how humans
communicate (natural language) and what the computer understands (machine
language).
Natural language processing is a field concerned with the ability of a
computer to understand, analyse, manipulate, and potentially generate
human language. By human language, we’re simply referring to any
language used for everyday communication, that is, the way we
communicate with each other using speech and text.
Fig 1.2 : NLP
You probably experience natural language processing every day, often
without realizing it. Here are a few examples you may see on a daily
basis. The first is a spam filter, where your email server determines
whether an incoming email is spam based on the content of the body, the
subject, and perhaps the sender's email domain. The second is
auto-complete, where Google predicts what you are interested in
searching for based on what you have already entered and what others
commonly search for with the same phrases.
Other familiar examples include Google Assistant, Cortana, spam filters,
sentiment analysis, Alexa, Google Translate, and text summarization.
Fig 1.3 : Google Assistant
Fig 1.4 : Cortana
1.2 Spam Filter:
A spam filter is a service designed to block spam (unwanted emails). It
looks for certain criteria, decides on that basis whether a message is
spam, and blocks it if so.
Fig 1.5 : Spam Filter
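As a rough illustration of the criteria-based decision described above, here is a minimal keyword-scoring filter. It is a sketch under stated assumptions: the word weights, the suspicious domain, the threshold, and the names `spam_score` and `is_spam` are all hypothetical, and real spam filters usually learn such criteria from labelled emails rather than using a hand-written list.

```python
import re

# Hypothetical criteria: suspicious words and their weights (assumed for illustration).
SPAM_WEIGHTS = {"free": 2.0, "winner": 3.0, "prize": 2.5, "urgent": 1.5, "click": 1.0}
# Hypothetical sender domain treated as suspicious.
SUSPICIOUS_DOMAINS = {"cheap-offers.example"}
THRESHOLD = 4.0  # assumed cut-off score


def spam_score(subject: str, body: str, sender: str) -> float:
    """Score an email using the words in its subject and body and its sender domain."""
    words = re.findall(r"[a-z]+", (subject + " " + body).lower())
    score = sum(SPAM_WEIGHTS.get(word, 0.0) for word in words)
    domain = sender.rsplit("@", 1)[-1].lower()
    if domain in SUSPICIOUS_DOMAINS:
        score += 3.0
    return score


def is_spam(subject: str, body: str, sender: str) -> bool:
    """Block the email when its score reaches the threshold."""
    return spam_score(subject, body, sender) >= THRESHOLD
```

For example, `is_spam("You are a WINNER", "Click to claim your free prize", "x@y.com")` returns `True` (score 8.5), while an ordinary email such as `("Meeting notes", "Agenda for tomorrow", ...)` scores 0 and passes.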
7. February14, 2019
[NATURAL LANGUAGE PROCESSING(NLP)|ARTIFICIAL
INTELLIGENCE]
7 | D E P T O F E C E | H A R I S H P M
1.3 Sentiment Analysis:
Sentiment analysis, also known as opinion mining, is related to data
mining: it uses NLP to measure people’s opinions and to extract
subjective information from the web.
Fig 1.6 : Sentiment Analysis
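A minimal lexicon-based sketch of sentiment analysis is shown below. The tiny word list, its scores, and the one-word negation rule are assumptions made for illustration; practical systems use much larger lexicons or trained classifiers.

```python
import re

# Tiny hypothetical sentiment lexicon: word -> polarity score.
LEXICON = {"good": 1, "great": 2, "love": 2, "excellent": 2,
           "bad": -1, "poor": -1, "terrible": -2, "hate": -2}
NEGATORS = {"not", "never", "no"}


def sentiment(text: str) -> str:
    """Classify text as positive, negative or neutral by summing word polarities."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token, 0)
        # Flip the polarity when the previous word is a negator ("not good").
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With this lexicon, "I love this phone, the camera is great" is classified as positive, "The service was not good" as negative, and "It arrived on Tuesday" as neutral.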
1.4 Different levels of linguistic analysis done before
performing NLP:
Syntax – Which parts of the given text are grammatically correct?
Semantics – What is the meaning of the given text?
Pragmatics – What is the purpose of the text?
2. WORKING OF NLP
If the input is speech, speech-to-text conversion is performed first. The
mechanism of Natural Language Processing involves two processes:
Natural Language Understanding and Natural Language Generation.
NLU – Natural Language Understanding
NLG – Natural Language Generation
The two parts differ from each other and are achieved using different
methods.
2.1 NATURAL LANGUAGE UNDERSTANDING (NLU)
NLU, or Natural Language Understanding, tries to understand the
meaning of the given text. For NLU, the nature and structure of each word
in the text must be understood. To understand the structure, NLU tries to
resolve the following kinds of ambiguity present in natural language:
Lexical Ambiguity – A word has multiple meanings.
Syntactic Ambiguity – A sentence has multiple parse trees.
Semantic Ambiguity – A sentence has multiple meanings.
Anaphoric Ambiguity – A word or phrase (such as a pronoun) refers back
to something mentioned earlier, but it is unclear which earlier mention it
refers to.
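Lexical ambiguity is often illustrated with a word such as “bank”. One classic way to resolve it is a simplified form of the Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the surrounding context. The sketch below assumes a hand-written, hypothetical two-sense inventory and stopword list; NLTK offers a WordNet-based version as `nltk.wsd.lesk`.

```python
import re

# Hypothetical sense inventory for the ambiguous word "bank", with short glosses.
SENSES = {
    "bank/finance": "an institution that accepts deposits of money and lends money",
    "bank/river": "the sloping land beside a river or lake where water flows",
}
# Assumed minimal stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "on", "in", "at",
             "that", "where", "i", "he", "she", "my"}


def content_words(text: str) -> set[str]:
    """Lower-cased words of the text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context: str, senses: dict[str, str]) -> str:
    """Pick the sense whose gloss shares the most content words with the context."""
    context_words = content_words(context)
    return max(senses, key=lambda sense: len(context_words & content_words(senses[sense])))


print(simplified_lesk("He sat on the bank of the river and watched the water", SENSES))  # bank/river
print(simplified_lesk("I want to deposit money in the bank", SENSES))  # bank/finance
```

The first context shares “river” and “water” with the river gloss; the second shares “money” with the finance gloss.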
Next, the meaning of each word is determined using lexicons
(vocabulary) and a set of grammatical rules. This is complicated by the
fact that different words can have similar meanings (synonyms) and a
single word can have more than one meaning (polysemy).
NLU uses computer software to understand input (I/P) language in the
form of text or speech.
It is often provided as a collection of APIs that offer text analysis
through NLP and help to understand the input language. In short, NLU
tries to understand the input text.
Fig 2.1 : NLU
2.2 NATURAL LANGUAGE GENERATION (NLG)
NLG is the process of automatically producing text from structured data
in a readable format, with meaningful phrases and sentences. It is a
subset of NLP, and natural language generation is a hard problem to deal
with. It is divided into three proposed stages:
1. Text Planning – The basic content in the structured data is ordered.
2. Sentence Planning – Sentences are formed from the structured data
to represent the flow of information.
3. Realization – Finally, grammatically correct sentences are produced
to represent the text.
NLG translates the computer’s artificial language into text or audible
speech as output (O/P).
It uses mathematical models in the background to determine which
text or audio is to be generated.
For speech output, it uses a speech database to put recorded phonemes
together into a coherent string. In short, NLG tries to turn the computer’s
artificial-language representation into natural-language output.
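The three stages above can be sketched for a toy case. The weather record and all function names are hypothetical, and each stage is reduced to a few fixed rules; real NLG systems are far richer, but the split into text planning, sentence planning, and realization is the same.

```python
# Hypothetical structured input: one day's weather record for a city.
RECORD = {"city": "Bengaluru", "temp_c": 24, "condition": "cloudy", "rain_mm": 0}


def text_planning(data):
    """Stage 1 (text planning): choose which facts to mention and in what order."""
    plan = [("condition", data["condition"]), ("temperature", data["temp_c"])]
    if data["rain_mm"] > 0:  # rain is only worth mentioning when there is some
        plan.append(("rain", data["rain_mm"]))
    return plan


def sentence_planning(city, plan):
    """Stage 2 (sentence planning): group the planned facts into sentence specs."""
    specs = [{"subject": city, "facts": dict(plan[:2])}]
    for name, value in plan[2:]:
        specs.append({"subject": None, "facts": {name: value}})
    return specs


def realize(spec):
    """Stage 3 (realization): turn one sentence spec into a grammatical sentence."""
    facts = spec["facts"]
    if "rain" in facts:
        return f"Rainfall of {facts['rain']} mm is expected."
    return (f"In {spec['subject']} it is {facts['condition']}, "
            f"with a temperature of {facts['temperature']} degrees Celsius.")


def generate(data):
    specs = sentence_planning(data["city"], text_planning(data))
    return " ".join(realize(spec) for spec in specs)


print(generate(RECORD))
# In Bengaluru it is cloudy, with a temperature of 24 degrees Celsius.
```

Setting `rain_mm` to 5 makes text planning add a third fact, which sentence planning places in its own sentence ("Rainfall of 5 mm is expected.").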
3. UNDERSTANDING NLTK
• A software package (the Natural Language Toolkit) for manipulating
linguistic data and performing NLP tasks
• Advanced tasks are possible from an early stage
• Permits projects at various levels
• Consistent interfaces
• Facilitates reusability of modules
• Implemented in Python
3.1 NLTK Structure:
NLTK is organized as a flat hierarchy of packages and modules.
• Each module provides the tools necessary to address a specific task
• Modules contain two types of classes:
– Data-oriented classes are used to represent information relevant to
natural language processing.
– Task-oriented classes encapsulate the resources and methods needed
to perform a specific task.
• Task modules
– Tokenizing
– Parsing
– Other NLP tasks
3.2 NLTK: Example Modules
• nltk.token: processing individual elements of text, such as words or
sentences.
• nltk.probability: modeling frequency distributions and probabilistic
systems.
• nltk.tagger: tagging tokens with supplemental information, such as
parts of speech or WordNet sense tags.
• nltk.parser: high-level interface for parsing texts.
• nltk.chartparser: a chart-based implementation of the parser interface.
• nltk.chunkparser: a regular-expression based surface parser.
These module names come from early NLTK releases; in current versions
of NLTK the corresponding functionality lives in packages such as
nltk.tokenize, nltk.tag, nltk.parse and nltk.chunk, while nltk.probability
is still available.
Now let us understand the token module.
• It is often useful to think of a text in terms of smaller elements, such as
words or sentences.
• The nltk.token module defines classes for representing and processing
these smaller elements.
• What might be other useful smaller elements?
• The term word can be used in two different ways:
1. To refer to an individual occurrence of a word
2. To refer to an abstract vocabulary item
• For example, the sentence “my dog likes his dog” contains five
occurrences of words, but four vocabulary items.
• To avoid confusion, use more precise terminology:
1. Word token: an occurrence of a word
2. Word type: a vocabulary item
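The token/type distinction can be checked directly with the Python standard library. Splitting on whitespace is an assumption that works for this simple sentence; NLTK's `FreqDist` class provides similar counting over tokens.

```python
from collections import Counter

sentence = "my dog likes his dog"
tokens = sentence.split()   # word tokens: every occurrence of a word
types = set(tokens)         # word types: the distinct vocabulary items
counts = Counter(tokens)    # how many tokens there are of each type

print(len(tokens), len(types))  # 5 4
print(counts["dog"])            # 2
```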
3.3 Tokenization
• The simplest way to represent a text is with a single string.
• However, it is difficult to process text in this format.
• Often, it is more convenient to work with a list of tokens.
The task of converting a text from a single string to a list of tokens is
known as tokenization.
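A minimal regular-expression tokenizer illustrates the conversion from a single string to a list of tokens. The pattern is an assumption chosen for this example and will not always match the output of NLTK's own tokenizers (such as `nltk.word_tokenize`, which needs its tokenizer models downloaded first).

```python
import re

# Assumed token pattern: words (with an optional internal apostrophe),
# numbers (with an optional decimal part), or any single punctuation character.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+(?:\.\d+)?|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Convert a text from a single string into a list of tokens."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("NLTK isn't hard: it costs $0 and took 2.5 hours."))
```

This prints `['NLTK', "isn't", 'hard', ':', 'it', 'costs', '$', '0', 'and', 'took', '2.5', 'hours', '.']`: the contraction and the decimal number stay whole, while punctuation becomes separate tokens.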
3.4 NLP FOR LOG ANALYSIS AND LOG MINING
What is a Log?
A log is a time-ordered collection of messages from different network
devices and hardware. Logs may be written to files on hard disks or sent
over the network as a stream of messages to a log collector. Logs make it
possible to maintain and track hardware performance, tune parameters,
handle emergencies and recover systems, and optimize applications and
infrastructure.
What is Log Analysis?
Log analysis is the process of extracting information from logs, taking
into account the different syntax and semantics of the messages in the log
files and interpreting them in the context of the application, so that log
files coming from different sources can be analyzed comparatively for
anomaly detection and for finding correlations.
What is Log Mining?
Log mining, or log knowledge discovery, is the process of extracting
patterns and correlations from logs in order to reveal knowledge and to
detect or predict anomalies, if any, in the log messages.
3.5 TECHNIQUES USED FOR LOG ANALYSIS
AND LOG MINING
Different techniques used for performing log analysis are described
below:
Pattern recognition – Log messages are compared with the messages
stored in a pattern book in order to filter them.
Normalization – Log messages are normalized to convert different
messages into the same format. This is needed when log messages
that use different terminology but have the same interpretation
come from different sources, such as applications or operating
systems.
Classification & Tagging – Log messages are ordered (classified)
and tagged with different keywords for later analysis.
Artificial Ignorance – A technique that uses machine learning
algorithms to discard uninteresting log messages. It is also used to
detect anomalies in the normal working of systems.
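The first three techniques, plus a simplified stand-in for artificial ignorance, can be sketched on a few log lines. Everything here is hypothetical: the sample messages, the synonym map used for normalization, and the pattern book. Instead of a machine learning model, the sketch uses a fixed set of tags to ignore, which is the simplest form of the idea; messages that match no pattern are kept as `unknown` because they are candidates for anomalies.

```python
import re

# Hypothetical raw messages from an application and an operating system.
RAW_LOGS = [
    "2019-02-14 10:00:01 app1 ERROR: Connection refused to db01",
    "Feb 14 10:00:05 os-kernel [err] connection denied db01",
    "2019-02-14 10:00:09 app1 INFO: Heartbeat ok",
]

# Normalization: map source-specific wording onto one shared vocabulary (assumed map).
SYNONYMS = {"err": "error", "denied": "refused"}

# Pattern book: known phrases and the tag each one receives (hypothetical entries).
PATTERN_BOOK = [("connection refused", "connectivity"), ("heartbeat ok", "routine")]

# Stand-in for artificial ignorance: a fixed set of tags known to be uninteresting.
IGNORED_TAGS = {"routine"}


def normalize(line):
    """Lower-case, strip punctuation, and replace source-specific synonyms."""
    words = re.findall(r"[a-z0-9]+", line.lower())
    return " ".join(SYNONYMS.get(w, w) for w in words)


def tag(message):
    """Classification & tagging via the pattern book; unmatched messages are 'unknown'."""
    for phrase, label in PATTERN_BOOK:
        if phrase in message:
            return label
    return "unknown"


def analyse(lines):
    """Normalize, tag, and drop ignorable messages; keep the rest for later analysis."""
    kept = []
    for line in lines:
        message = normalize(line)
        label = tag(message)
        if label not in IGNORED_TAGS:
            kept.append((label, message))
    return kept


for label, message in analyse(RAW_LOGS):
    print(label, "|", message)
```

After normalization, "[err] connection denied" and "ERROR: Connection refused" both contain "error ... connection refused", so both lines receive the same tag, and the routine heartbeat is discarded.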
3.6 ROLE OF NLP IN LOG ANALYSIS & LOG
MINING
Natural language processing techniques are widely used in log analysis
and log mining. Techniques such as tokenization, stemming,
lemmatization, and parsing are used to convert log messages into a
structured form. Once the logs are available in this well-structured form,
log analysis and log mining are performed to extract useful information,
and knowledge is discovered from that information. An example is the
error log produced when a server fails.
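As a sketch of turning raw log lines into structured form, the code below parses each line into fields, tokenizes the free-text message, and then counts server errors per host. The line layout, host names, and messages are assumptions made for this example; real log formats vary between systems, and stemming or lemmatization could be added to the token step.

```python
import re
from collections import Counter
from typing import Optional

# Assumed layout of each line: "<date> <time> <host> <LEVEL>: <message>".
LINE_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<level>[A-Z]+): (?P<message>.*)"
)


def to_record(line: str) -> Optional[dict]:
    """Parse one raw log line into a structured record, or None if it does not fit."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Tokenize the free-text part so it can be searched and mined later.
    record["tokens"] = re.findall(r"[a-z0-9]+", record["message"].lower())
    return record


# Hypothetical log lines, including errors caused by a server failure.
LINES = [
    "2019-02-14 10:15:00 web01 ERROR: Server failed to respond",
    "2019-02-14 10:15:30 web01 ERROR: Server failed to respond",
    "2019-02-14 10:16:00 web02 INFO: Request served",
    "corrupted line without the expected layout",
]

records = [r for r in map(to_record, LINES) if r is not None]
errors_by_host = Counter(r["host"] for r in records if r["level"] == "ERROR")
print(errors_by_host)  # Counter({'web01': 2})
```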
3.7 BIG DATA
According to Dr. Kirk Borne, Principal Data Scientist, big data is
“everything, quantified, and tracked”.
3.8 NLP for Big Data is the Next Big Thing
Today, around 80% of all data is available in raw form. Big data comes
from information stored in large organizations and enterprises. Examples
include employee information, company purchase and sales records, business
transactions, the past records of organizations, social media, etc. Much of this
is human language, which is ambiguous and unstructured and therefore hard for
machines to interpret; NLP is what makes such data usable.
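4. DEEP LEARNING IN NLP
4.1 WHY IS DEEP LEARNING NEEDED IN NLP?
Traditional NLP uses a rule-based approach that represents words as
‘one-hot’ encoded vectors. This traditional method focuses on syntactic
representation instead of semantic representation: every pair of distinct
one-hot vectors is equally dissimilar, so words with related meanings are
not recognized as related. Likewise, the bag-of-words classification model
ignores word order and is therefore unable to distinguish certain contexts.
Deep learning addresses these limitations by learning dense vector
representations (embeddings) that capture the meaning and context of words.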
Fig 4.1 : Flow graph of DL in NLP
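The two limitations of the traditional representation can be demonstrated with a toy five-word vocabulary (an assumption for this example): the one-hot vectors for ‘good’ and ‘great’ have a cosine similarity of 0, and the bag-of-words vectors for ‘dog bites man’ and ‘man bites dog’ are identical.

```python
import math

VOCAB = ["dog", "bites", "man", "good", "great"]  # toy vocabulary (assumed)


def one_hot(word):
    """One-hot vector: 1 in the word's own position, 0 everywhere else."""
    return [1 if w == word else 0 for w in VOCAB]


def bag_of_words(sentence):
    """Bag-of-words vector: how often each vocabulary word occurs, ignoring order."""
    words = sentence.split()
    return [words.count(w) for w in VOCAB]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


print(cosine(one_hot("good"), one_hot("great")))  # 0.0: no notion of similar meaning
print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))  # True: order is lost
```

Learned embeddings, by contrast, can place ‘good’ and ‘great’ close together, and sequence models can keep the word order that bag-of-words throws away.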
THREE CAPABILITIES OF DEEP LEARNING
Expressibility – This quality describes how well a machine can
approximate universal functions.
Trainability – How well and quickly a DL system can learn its problem.
Generalizability – How well the machine can perform predictions on
data that it has not been trained on.
Other capabilities also need to be considered in deep learning, such as
interpretability, modularity, transferability, latency, adversarial stability,
and security, but these three are the main ones.
Typical NLP tasks to which deep learning is applied include:
Sentence segmentation – It identifies sentence boundaries in the
given text, i.e. where one sentence ends and another begins.
Sentences often end with a punctuation mark such as ‘.’
Tokenization – It identifies the different words, numbers, and
punctuation symbols.
Stemming – It strips the endings of words; for example, ‘eating’ is
reduced to ‘eat’.
Part of speech (POS) tagging – It assigns each word in a sentence
its part-of-speech tag, for example designating a word as a noun
or an adverb.
Parsing – It analyzes the grammatical structure of the given text by
dividing it into its constituent parts, in order to answer questions
such as which part of a sentence modifies another part.
Fig 4.2 : Tasks of DL in NLP
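Two of these tasks, sentence segmentation and stemming, can be sketched with simple rules. Both functions are deliberately naive assumptions for illustration: the splitter breaks after every ‘.’, ‘!’ or ‘?’ followed by whitespace (so abbreviations such as ‘Mr.’ would be split wrongly), and the stemmer strips one suffix from a tiny hand-picked list. NLTK ships proper implementations, for example `nltk.stem.PorterStemmer`.

```python
import re


def split_sentences(text):
    """Naive sentence segmentation: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


# Assumed, deliberately tiny suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, as long as at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


print(split_sentences("NLP is fun. It is also hard! Is it?"))
# ['NLP is fun.', 'It is also hard!', 'Is it?']
print([naive_stem(w) for w in ["eating", "walked", "dogs", "is"]])
# ['eat', 'walk', 'dog', 'is']
```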
5. APPLICATIONS OF NLP
Apart from its applications in big data, log mining, and log analysis, NLP
has other major application areas. Although the term ‘NLP’ is not as
popular as ‘big data’ or ‘machine learning’, we use NLP every day.
Automatic summarizer – Given the input text, the task is to write a
summary of the text, discarding irrelevant points.
Sentiment analysis – It is performed on the given text to predict its
subjective content, e.g. whether the text conveys a judgment, an opinion,
or a review.
Text classification – It is performed to categorize different journals and
news stories according to their domain. Multi-document classification is
also possible. A popular example of text classification is spam detection
in emails.
Based on the style of writing in a journal, its stylistic attributes can also
be used to identify its author.
Information Extraction – Information extraction allows, for example, an
email program to propose events to add automatically to the calendar.
Fig 5.1 : Deeper applications of NLP
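A frequency-based extractive summarizer is one simple way to approach the automatic summarizer task: sentences whose non-stopword words occur often in the whole text are kept, and the rest are discarded. This is a minimal sketch; the stopword list and the scoring rule (average word frequency per sentence) are assumptions, and modern summarizers are usually far more sophisticated.

```python
import re
from collections import Counter

# Assumed minimal stopword list; real systems use much longer ones.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "of", "to", "and", "in",
             "it", "for", "on", "that", "this"}


def content_words(text):
    """Lower-cased words of the text, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, n=1):
    """Keep the n sentences with the highest average word frequency, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / (len(words) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n]))


TEXT = ("NLP helps computers understand language. "
        "NLP powers spam filters and translation. "
        "The weather was nice.")
print(summarize(TEXT, n=2))
```

Here the off-topic weather sentence has the lowest score, so the two-sentence summary keeps only the two sentences about NLP.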
6. REFERENCES
M. Bates (1995). "Models of natural language understanding". Proceedings of the National Academy of
Sciences of the United States of America. 92 (22): 9977–9982. doi:10.1073/pnas.92.22.9977. PMC 40721. PMID 7479812.
Steven Bird, Ewan Klein, and Edward Loper (2009). Natural Language Processing with Python. O'Reilly
Media. ISBN 978-0-596-51649-9.
Daniel Jurafsky and James H. Martin (2008). Speech and Language Processing, 2nd edition. Pearson
Prentice Hall. ISBN 978-0-13-187321-6.
Mohamed Zakaria Kurdi (2016). Natural Language Processing and Computational Linguistics: speech,
morphology, and syntax, Volume 1. ISTE-Wiley. ISBN 978-1848218482.
Mohamed Zakaria Kurdi (2017). Natural Language Processing and Computational Linguistics: semantics,
discourse, and applications, Volume 2. ISTE-Wiley. ISBN 978-1848219212.
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze (2008). Introduction to Information
Retrieval. Cambridge University Press. ISBN 978-0-521-86571-5.
Christopher D. Manning and Hinrich Schütze (1999). Foundations of Statistical Natural Language
Processing. The MIT Press. ISBN 978-0-262-13360-9.
David M. W. Powers and Christopher C. R. Turk (1989). Machine Learning of Natural Language. Springer-
Verlag. ISBN 978-0-387-19557-5.