SlideShare una empresa de Scribd logo
1 de 17
Text Summarization Using NLP
Presented by,
Akash N. Karwande
(2016MNS011)
Guided by
Prof. R.K. Chavan
Introduction
The goal of summarization is to produce a shorter version of a source text by
preserving the meaning and the key contents of the original document.
A well written summary can significantly reduce the amount of work
needed to digest large amounts of text.
Types of Text summarization
There are two types summaries
1. Extractive summaries
2. Abstractive summaries
Extractive summaries
 Extractive summaries are created by reusing portions (words,
sentences, etc.) of the input text document
 The system extracts text from the entire collection, without modifying
the text document.
 Most of the summarization research today is on extractive
summarization.
Abstractive summaries
 Requires deep understanding and reasoning over the text
 It Provides own summary over input text without using same word or
sentence in the input text
 Determines the actual and short meaning of each element, such as
words,sentences and paragraphs
Natural Language Toolkit
 leading platform for building Python programs to work with human
language data
 NLP is a field of computer science, artificial intelligence (also called
machine learning), and linguistics processing
 Interactions between computers and human (natural) languages
 It provides suite of text processing libraries for classification, tokenization,
stemming, tagging, parsing, and semantic reasoning

Continued…
Following are the NLTK Libraries used in text summarization
 Word tokenizer
 Sentence tokenizer
 stopwords
 BeautifulSoup
 numpy library
 Tagging
 Parsing
Linguistic Preprocessing for Automatic Summarization
Fig.Pipeline architecture of an information extraction process
Sentence Segmentation
 Converts raw text into sentences
 List of strings
 Sentence tokenizer
Input Text: John owns a car. It is a Toyota.
Output: Segm1: John owns a car.
Segm2: It is a Toyota.
Tokenization
 Identifies the word tokens from given sentence
 Provides a list of tokens as output
 Word tokenizer
Input: John owns a car.
Output: [[John], [owns], [a], [car], [.]]
Part of speech tagging (POS Tagging)
 Assigns appropriate part of speech tag to each word
 POS is useful in extraction of nouns, adverbs, adjective, which provide
some meaningful information about text
 Generates a list of tuples with POS annotation

Input: [[John], [owns], [a], [car], [.]]
Output: (NP (NNP John)) (VP (VBZ owns) (NP (DT a) (NN car))) (. .)
Entity detection
 Identification of predefined categories such as person, location,
quantities, organizations etc
 NER provides the entity detection for linguistic processing
 NER system uses linguistic grammar-based techniques and also
statistical model to identify the entity
Input: (NP (NNP John)) (VP (VBZ owns) (NP (DT a) (NN car))) (. .)
Output: John->Person
Relation detection
 Identifies the possible relation between two or more chunked sentences
 Co-reference chain provides a relation between two or more sentences
 Provides the link between pronouns and its corresponding nouns
 Replacement of the pronouns with proper nouns
Input Text: John owns a car. It is a Toyota. (In form of parse tree)
Output: "a car" -> "a Toyota"; "It" -> "a Toyota"
Conclusion
 Automatic Text Summarization has been shown to be useful for Natural
Language Processing tasks such as Question Answering or Text
Classification and other related fields of computer science such as
Information Retrieval. And the access time for information searching will
be improved.
Future work
 From our summarization result we have found that by reducing all
sentences that do not contain any geographic information may lead to a
loss of information, since there may exist links between that reduced
sentences. Therefore, we will analyse this issue in detail, by studying
graph based algorithms that capture the relationship between
sentences.
References
 https://www.researchgate.net/publication/315667326 Extractive Based Automatic
Text Summarization
 https://github.com/shreyans29/ The semicolon Data Analytics youtube tutorials on The
Semicolon
 https://gist.github.com/shlomibabluki/5473521 summary_tool.py
 https://thetokenizer.com/2013/04/28/build-your-own-summary-tool/
 https://glowingpython.blogspot.in/2014/09/text-summarization-with-nltk.html
 http://www.nltk.org/ NLTK 3.2.5 documentation
Thank You !

Más contenido relacionado

La actualidad más candente

Text clustering
Text clusteringText clustering
Text clustering
KU Leuven
 
Information retrieval-systems notes
Information retrieval-systems notesInformation retrieval-systems notes
Information retrieval-systems notes
BAIRAVI T
 

La actualidad más candente (20)

Image captioning
Image captioningImage captioning
Image captioning
 
Text clustering
Text clusteringText clustering
Text clustering
 
Natural Language Processing in AI
Natural Language Processing in AINatural Language Processing in AI
Natural Language Processing in AI
 
Natural Language Processing: L02 words
Natural Language Processing: L02 wordsNatural Language Processing: L02 words
Natural Language Processing: L02 words
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense Disambiguation
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
NLP
NLPNLP
NLP
 
text summarization using amr
text summarization using amrtext summarization using amr
text summarization using amr
 
Presentation on Sentiment Analysis
Presentation on Sentiment AnalysisPresentation on Sentiment Analysis
Presentation on Sentiment Analysis
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
Information retrieval-systems notes
Information retrieval-systems notesInformation retrieval-systems notes
Information retrieval-systems notes
 
Text MIning
Text MIningText MIning
Text MIning
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Spam email detection using machine learning PPT.pptx
Spam email detection using machine learning PPT.pptxSpam email detection using machine learning PPT.pptx
Spam email detection using machine learning PPT.pptx
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 

Similar a Text summarization

Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...
Edmond Lepedus
 
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHSPATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
kevig
 
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHSPATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
ijnlc
 

Similar a Text summarization (20)

DOC-20230121-WA0030. (1).pptx
DOC-20230121-WA0030. (1).pptxDOC-20230121-WA0030. (1).pptx
DOC-20230121-WA0030. (1).pptx
 
Conceptual framework for abstractive text summarization
Conceptual framework for abstractive text summarizationConceptual framework for abstractive text summarization
Conceptual framework for abstractive text summarization
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationss
 
DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION
DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION
DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION
 
Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...
 
ResearchPaper
ResearchPaperResearchPaper
ResearchPaper
 
ALGORITHM FOR TEXT TO GRAPH CONVERSION
ALGORITHM FOR TEXT TO GRAPH CONVERSION ALGORITHM FOR TEXT TO GRAPH CONVERSION
ALGORITHM FOR TEXT TO GRAPH CONVERSION
 
ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...
ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...
ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...
 
Improvement of Text Summarization using Fuzzy Logic Based Method
Improvement of Text Summarization using Fuzzy Logic Based  MethodImprovement of Text Summarization using Fuzzy Logic Based  Method
Improvement of Text Summarization using Fuzzy Logic Based Method
 
Article Summarizer
Article SummarizerArticle Summarizer
Article Summarizer
 
IMPROVE THE QUALITY OF IMPORTANT SENTENCES FOR AUTOMATIC TEXT SUMMARIZATION
IMPROVE THE QUALITY OF IMPORTANT SENTENCES FOR AUTOMATIC TEXT SUMMARIZATIONIMPROVE THE QUALITY OF IMPORTANT SENTENCES FOR AUTOMATIC TEXT SUMMARIZATION
IMPROVE THE QUALITY OF IMPORTANT SENTENCES FOR AUTOMATIC TEXT SUMMARIZATION
 
Challenge of Image Retrieval, Brighton, 2000 1 ANVIL: a System for the Retrie...
Challenge of Image Retrieval, Brighton, 2000 1 ANVIL: a System for the Retrie...Challenge of Image Retrieval, Brighton, 2000 1 ANVIL: a System for the Retrie...
Challenge of Image Retrieval, Brighton, 2000 1 ANVIL: a System for the Retrie...
 
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
 
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHSPATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
 
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHSPATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
 
NLP
NLPNLP
NLP
 
Keyword Extraction Based Summarization of Categorized Kannada Text Documents
Keyword Extraction Based Summarization of Categorized Kannada Text Documents Keyword Extraction Based Summarization of Categorized Kannada Text Documents
Keyword Extraction Based Summarization of Categorized Kannada Text Documents
 
An Approach To Automatic Text Summarization Using Simplified Lesk Algorithm A...
An Approach To Automatic Text Summarization Using Simplified Lesk Algorithm A...An Approach To Automatic Text Summarization Using Simplified Lesk Algorithm A...
An Approach To Automatic Text Summarization Using Simplified Lesk Algorithm A...
 
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
 
Automatic keyword extraction.pptx
Automatic keyword extraction.pptxAutomatic keyword extraction.pptx
Automatic keyword extraction.pptx
 

Último

Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
SayantanBiswas37
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 

Último (20)

Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 

Text summarization

  • 1. Text Summarization Using NLP Presented by, Akash N. Karwande (2016MNS011) Guided by Prof. R.K. Chavan
  • 2. Introduction The goal of summarization is to produce a shorter version of a source text by preserving the meaning and the key contents of the original document. A well written summary can significantly reduce the amount of work needed to digest large amounts of text.
  • 3. Types of Text summarization There are two types summaries 1. Extractive summaries 2. Abstractive summaries
  • 4. Extractive summaries  Extractive summaries are created by reusing portions (words, sentences, etc.) of the input text document  The system extracts text from the entire collection, without modifying the text document.  Most of the summarization research today is on extractive summarization.
  • 5. Abstractive summaries  Requires deep understanding and reasoning over the text  It Provides own summary over input text without using same word or sentence in the input text  Determines the actual and short meaning of each element, such as words,sentences and paragraphs
  • 6. Natural Language Toolkit  leading platform for building Python programs to work with human language data  NLP is a field of computer science, artificial intelligence (also called machine learning), and linguistics processing  Interactions between computers and human (natural) languages  It provides suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning 
  • 7. Continued… Following are the NLTK Libraries used in text summarization  Word tokenizer  Sentence tokenizer  stopwords  BeautifulSoup  numpy library  Tagging  Parsing
  • 8. Linguistic Preprocessing for Automatic Summarization Fig.Pipeline architecture of an information extraction process
  • 9. Sentence Segmentation  Converts raw text into sentences  List of strings  Sentence tokenizer Input Text: John owns a car. It is a Toyota. Output: Segm1: John owns a car. Segm2: It is a Toyota.
  • 10. Tokenization  Identifies the word tokens from given sentence  Provides a list of tokens as output  Word tokenizer Input: John owns a car. Output: [[John], [owns], [a], [car], [.]]
  • 11. Part of speech tagging (POS Tagging)  Assigns appropriate part of speech tag to each word  POS is useful in extraction of nouns, adverbs, adjective, which provide some meaningful information about text  Generates a list of tuples with POS annotation  Input: [[John], [owns], [a], [car], [.]] Output: (NP (NNP John)) (VP (VBZ owns) (NP (DT a) (NN car))) (. .)
  • 12. Entity detection  Identification of predefined categories such as person, location, quantities, organizations etc  NER provides the entity detection for linguistic processing  NER system uses linguistic grammar-based techniques and also statistical model to identify the entity Input: (NP (NNP John)) (VP (VBZ owns) (NP (DT a) (NN car))) (. .) Output: John->Person
  • 13. Relation detection  Identifies the possible relation between two or more chunked sentences  Co-reference chain provides a relation between two or more sentences  Provides the link between pronouns and its corresponding nouns  Replacement of the pronouns with proper nouns Input Text: John owns a car. It is a Toyota. (In form of parse tree) Output: "a car" -> "a Toyota"; "It" -> "a Toyota"
  • 14. Conclusion  Automatic Text Summarization has been shown to be useful for Natural Language Processing tasks such as Question Answering or Text Classification and other related fields of computer science such as Information Retrieval. And the access time for information searching will be improved.
  • 15. Future work  From our summarization result we have found that by reducing all sentences that do not contain any geographic information may lead to a loss of information, since there may exist links between that reduced sentences. Therefore, we will analyse this issue in detail, by studying graph based algorithms that capture the relationship between sentences.
  • 16. References  https://www.researchgate.net/publication/315667326 Extractive Based Automatic Text Summarization  https://github.com/shreyans29/ The semicolon Data Analytics youtube tutorials on The Semicolon  https://gist.github.com/shlomibabluki/5473521 summary_tool.py  https://thetokenizer.com/2013/04/28/build-your-own-summary-tool/  https://glowingpython.blogspot.in/2014/09/text-summarization-with-nltk.html  http://www.nltk.org/ NLTK 3.2.5 documentation