Session 9: Summarization
     12 December 2011
 Summarization
 Given a document (corpus), condense it into
  words that represent its content
 Extractive summarization
   Keywords
 Generative summarization
   Summary sentences




                        Information Retrieval – ISD312   Summarization   2
 Simple statistics
 Most frequent words

  from __future__ import division  # must be the first statement; no effect on Python 3
  import nltk
  from nltk.book import *




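from __future__ import division  # must be the first statement; no effect on Python 3
import nltk
from nltk.book import *

def kataKunci(df, ambang):
    """Print the keywords: words whose frequency relative to the top word exceeds ambang (the threshold)."""
    if not df:
        return
    # df maps each word to its count; find the highest count
    tertinggi = max(df.values())
    for vocab in df:
        if df[vocab] / tertinggi > ambang:
            print(vocab, end=' ')
    print()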
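
As a usage sketch of the thresholding idea above, the following self-contained variant builds the frequency table with `collections.Counter` rather than NLTK, an assumption made only so that it runs without any corpus download. The names `keywords`, `freq`, and `threshold`, and the toy sentence, are illustrative and do not come from the slides.

```python
from collections import Counter


def keywords(freq, threshold):
    """Return words whose count divided by the highest count exceeds threshold."""
    if not freq:
        return []
    top = max(freq.values())
    return sorted(word for word, count in freq.items() if count / top > threshold)


# Hypothetical toy document; a real run would use a tokenized corpus.
tokens = "the cat sat on the mat and the cat slept".split()
freq = Counter(tokens)          # word -> number of occurrences
selected = keywords(freq, 0.5)
print(selected)
```

Function words such as "the" tend to pass the threshold, so in practice a stop-word filter would probably be applied before counting.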

 Phrases, groups of words
 Collocations
 Word networks within a document
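
One simple way to approximate collocations is to count adjacent word pairs and keep those that recur. The sketch below does this with plain Python; it is a frequency-only stand-in, not the scoring that NLTK's `Text.collocations()` applies. The function name `top_bigrams`, its parameters, and the sample text are assumptions for illustration.

```python
from collections import Counter


def top_bigrams(tokens, n=3, min_count=2):
    """Return up to n adjacent word pairs that occur at least min_count times."""
    pairs = Counter(zip(tokens, tokens[1:]))
    frequent = [(pair, count) for pair, count in pairs.most_common()
                if count >= min_count]
    return frequent[:n]


# Hypothetical sample text, already lower-cased and whitespace-tokenized.
tokens = ("information retrieval is fun and information retrieval "
          "uses lexical chains and lexical chains help").split()
result = top_bigrams(tokens)
print(result)
```

Raw counts favor pairs of common words, which is why collocation finders usually add an association measure on top of the counts.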




 Generating sentences
 Simple statistics
   Table of word occurrence statistics
   Bayesian statistics
   Probability of a word at the start of a sentence
   Probability of a word following another word
 Other methods
   N-gram
   POS-tag
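
The two probabilities listed above (a word starting a sentence, and a word following another word) are enough to drive a very small sentence generator. The sketch below estimates both from counts and samples from them. It is a minimal bigram model under stated assumptions, not the method used in the course: the helper names `train`, `pick`, and `generate` and the three-sentence corpus are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train(sentences):
    """Count sentence-initial words and word -> next-word transitions."""
    starts = Counter()
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue
        starts[words[0]] += 1
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return starts, follows


def pick(counter, rng):
    """Draw one key with probability proportional to its count."""
    words = list(counter)
    weights = [counter[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


def generate(starts, follows, rng, max_words=10):
    """Sample a start word, then repeatedly sample a follower until none exists."""
    word = pick(starts, rng)
    output = [word]
    while len(output) < max_words and follows.get(word):
        word = pick(follows[word], rng)
        output.append(word)
    return " ".join(output)


corpus = ["the cat sat", "the dog sat", "a cat ran"]  # hypothetical toy corpus
starts, follows = train(corpus)
# P(first word = "the") and P(next = "cat" | current = "the"), as relative counts
p_start_the = starts["the"] / sum(starts.values())
p_cat_after_the = follows["the"]["cat"] / sum(follows["the"].values())
sentence = generate(starts, follows, random.Random(0))
print(p_start_the, p_cat_after_the, sentence)
```

Extending the context from one previous word to several gives the N-gram approach named under the other methods.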



The rapid growth of the Internet has resulted in enormous
  amounts of information that has become more difficult to access
  efficiently. Internet users require tools to help manage this vast
  quantity of information. The primary goal of this research is to
  create an efficient and effective tool that is able to summarize
  large documents quickly. This research presents a linear time
  algorithm for calculating lexical chains which is a method of
  capturing the “aboutness” of a document. This method is
  compared to previous, less efficient methods of lexical chain
  extraction. We also provide alternative methods for extracting
  and scoring lexical chains. We show that our method provides
  similar results to previous research, but is substantially more
  efficient. This efficiency is necessary in Internet search
  applications where many large documents may need to be
  summarized at once, and where the response time to the end
  user is extremely important.

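import importlib
import os
os.chdir('pathtotugas')  # placeholder for the folder that contains tugas.py
import tugas
importlib.reload(tugas)  # pick up edits to tugas.py without restarting the interpreter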




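import nltk
# Sample Indonesian sentence: "An example sentence to be analyzed using NLTK"
data = 'Sebuah contoh kalimat yang ingin dianalisis menggunakan NLTK'
tokens = nltk.word_tokenize(data)  # requires NLTK's Punkt tokenizer data to be installed
text = nltk.Text(tokens)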




 http://www.nltk.org/book
 http://tjerdastangkas.blogspot.com/search/label/isd312




Monday, 12 December 2011

  • 9. import nltk data = 'Sebuah contoh kalimat yang ingin dianalisis menggunakan NLTK' tokens = nltk.word_tokenize(data) text = nltk.Text(tokens) Information Retrieval – ISD312 Summarization 9