SlideShare una empresa de Scribd logo
1 de 45
Descargar para leer sin conexión
Advantages of Query
Biased Summaries in
Information Retrieval
Anastasios Tombros
Computing Science Department University of Glasgow, Scotland
Mark Sanderson
CIIR, Computing Science Department University of Massachusetts,
U.S.A.
Presentation by Onur Yılmaz
Outline
 Introduction
 Summarization System
 Experimental Design
 Experimental Results
 Further Work and Conclusions
Introduction
Interaction with an IR system
 User enters a specific information need,
 Query
 Response of a typical system
 Retrieved document list
Introduction
Interaction with an IR system

Introduction
Interaction with an IR system

Title
First few
sentences
URL
Relevance
Quantification
Introduction
Interaction with an IR system
 Users have to decide which of the retrieved documents
are most likely to convey their information need
 Ideally, it should be possible to make this decision
without having to refer to the full document text
Introduction
Interaction with an IR system

Title
First few
sentences
Not clear which the document relates to a
user’s query
Introduction
What is presented?
 Aim
 to minimize users’ need to refer to the full
document text
 to provide enough information to support their
retrieval decisions
Introduction
What is presented?
 Proposed method
 An automatically generated summary of each document in
a retrieved document list, biased to a user’s query
Introduction
Document Summary
 Condensation of a full text document
 indicative of the source’s content
 informative
Introduction
Sentence Extraction
 Selection of sentences from the original document
 Scores are assigned to sentences according to a set of
extraction criteria
 Best-scoring sentences are presented in the summary
Summarisation System
 Aim of system
 Providing information on the relevance of documents
retrieved in response to their query
 Summaries were to be user-directed: biased towards
the user’s query
Summarisation System
Articles
 Wall Street Journal (WSJ) taken from the TREC
collection (Text REtrieval Conferences)
Summarisation System
Methodology
 Examining 50 randomly selected articles from the
collection
 Extract conclusions about the distribution of important
information within them
 Title
 Headings
 Leading paragraph
 Overall structural
Summarisation System
Sentence Extraction - Title Method
 Headlines of news articles
 tend to reveal the major subject of the article
 Terms in the title section of the documents
 assigned a positive weight (title score)
 Leading paragraph of each article
 Provides much information on the article’s content
 In order to quantify this contribution,
 An ordinal weight was assigned to the first two
sentences of each article.
Summarisation System
Sentence Extraction - Title Method
Summarisation System
Sentence Extraction - Title Method
 Section Headings
 «heading score» was assigned to each one of the
sentences comprising a heading
Summarisation System
Sentence Extraction - Term Occurrence (TO)
 Number of term occurrences (TO) within each document
to further assign weights to sentences
 Reasonable TO
 7  medium-sized documents
 (between 25 - 40 sentences)
 TO value is augmented by 10% of the increase in document
size
Summarisation System
Sentence Extraction - Term Occurrence (TO)
 Significantly Related Terms
 Two terms are are significant
 No more than 4 non-significant words between them
Recent studies that show that in the English
language 98% of the lexical relations occur
between words within a span of 5 words in a
Sentence.
Summarisation System
Biasing summaries towards queries
 Query Score
 calculated for each of the sentences of a document
 based on the distribution of query terms in each
sentence
Summarisation System
Biasing summaries towards queries
 For each sentence
+Query score
Score obtained by
the sentence
extraction methods
Final Score=
Summarisation System
Biasing summaries towards queries
 Summary for each document
 Output the top-scoring sentence
 Until desired summary length reached
15% of the
document's
length
up to a
maximum of
five sentences
Experimental Design
 Aim of experiment
 Use of query biased summaries in a retrieved
document list
 would have a positive effect on
 the process of relevance judgement by users
Experimental Design
Experimental Conditions
 Two levels of an independent variable:
 Use of query-biased summaries in a ranked list of
retrieved documents
 Use of static pre-defined summaries
Experimental Design
Groups of Subjects
 Two groups consisting of 10 people
 Postgraduate students
 Population is not representative of that which we wish to
generalize the conclusions
Experimental Design
Retrieval Task
 Identify, in a limited amount of time, as many
relevant documents as possible
 Subjects presented with a retrieved document list
and query
Experimental Design
Queries
 Randomly selected from the queries of the TREC test
collection
 Standard against which the subjects’ relevance
judgements compared
Experimental Design
IR System
 Classic document ranking system
 tf•idf term weighting scheme
 with stop word removal
 word stemming using the Porter stemmer
Experimental Design
Operationalising the Experiment
Select
typical or
proposed
method
Assign 5
random
queries to
subject
Present to
subject
retrieved
document
list (50
highest
ranked
documents)
Subject
identifies
the relevant
documents
in 5 minutes
Experimental Design
Experiment – Typical IR System Output

Experimental Design
Experiment – Proposed IR System Output

Experimental Results
Dependent Variable
 Dependent variable:
 Performance of the users in the
 process of relevance judgements on documents retrieved
by specific queries
Experimental Results
Criteria
 Recall and precision of the relevance judgements
 The speed of judgements
 Need of the assistance from the full text of the
retrieved documents.
 The subjective opinion about the assistance provided
Experimental Results
Recall
 Recall: Number of relevant documents correctly identified by
a subject for a query divided by the total number of relevant
documents, within the examined ones, for that query
«summary group»
managed to
successfully identify
a larger number of
relevant documents
Experimental Results
Precision
 Precision: Relevant documents correctly identified divided by
the total number of indicated relevant documents for a query
Experimental Results
Recall and Precision
 Query biased summaries allow users to identify
more relevant documents, and identify them more
accurately.
Experimental Results
Speed
 Avg. Number of Documents: Actual number of documents that
each subject managed to examine within the specified 5
minute period
13% increase
Tendency for quicker
judgement
Experimental Results
Reference to the full text of the documents
 Static summaries based on the first few lines of a
document is inadequate.
Further Work and Conclusions
Summary
 Query based summaries assist users in performing
relevance judgements more accurately and more
quickly.
 Users can identify more relevant documents for each
query, while at the same time make fewer mistakes.
Further Work and Conclusions
Summary
 Query based summaries almost alleviate the users’
need to refer to the full text
 Users rely almost solely on the information conveyed in
the query-biased summaries in order to perform their
relevance judgements.
Further Work and Conclusions
Future Work
 Possible extensions of the work
 Repeating the experiments by simulating conditions
encountered when searching for information on the
web.
 Intend to examine different summarisation techniques
and to apply our system to alternative test collections.
References
 Abracos, J., and Lopes, G.P. 1997. Statistical methods for retrieving most significant
paragraphs in newspaper articles. In Proceedings of the ACL'97/EACL'97 Workshop on
Intelligent Scalable Text Summarisation (ISTS '97), 51-57. Madrid, Spain, July 11 1997.
 Brandow, R., Mitze, K., and Rau, L.F. 1995. Automatic condensation of electronic
publications by sentence selection. Information Processing & Management 31(5): 675-685,
September 1995.
 Callan, J.P. 1994. Passage-level evidence in document retrieval. In Croft, W.B., and van
Rijbergen, C.J. eds. Proceedings of the Seventeenth Annual ACM SIGIR Conference on
Research and Development in Information Retrieval, 302-310. ACM Press, July 1994.
 Edmundson, H.P. 1964. Problems in automatic abstracting. Communications of the ACM
7(4):259-263, April 1964.
 Edmundson, H.P. 1969. New methods in automatic abstracting. Journal of the ACM
16(2):264-285, April 1969.
 Hand, T.F. 1997. A proposal for a task-based evaluation of text summarisation systems. In
Proceedings of the ACL’97/EACL’97 Workshop on Intelligent Scalable Text Summarisation
(ISTS ’97), 31-38. Madrid, Spain, July 11 1997.
 Harman, D. 1996. Overview of the Fifth Text REtrieval Conference (TREC-5). In Proceedings
of the Text Retrieval Conference (TREC-5), National Institute of Standards and Technology,
Gaithersburg, MD 20899, USA.
References
 Jacobs, P.S., and Rau, L.F. 1990. Scisor: Extracting information from on-line news.
Communications of the ACM 33(11):88-97, November 1990.
 Keppel, G. 1973. Design and analysis: A researcher’s handbook. New Jersey: Prentice Hall.
 Knaus, D., Mittendorf, E., Schauble, P., and Sheridan, P. 1995. Highlighting relevant passages
for users of the interactive SPIDER retrieval system. In Proceedings of the Text Retrieval
Conference (TREC-4), National Institute of Standards and Technology, Gaithersburg, MD
20899, USA.
 Kupiec, J., Pedersen, J., and Chen, F. 1995. A trainable document summariser. In Fox, E.A.,
Ingwersen, P., and Fidel, R. eds. Proceedings of the Eighteenth Annual ACM SIGIR Conference
on Research and Development in Information Retrieval, 68-73. ACM Press, July 1995.
 Luhn, H.P. 1958. The automatic creation of literature abstracts. IBM Journal of Research and
Development 2(2): 159-165, April 1958.
 Maizell, R.E., Smith, J.F., and Singer, T.E.R. 1971. Abstracting scientific and technical
literature: An introductory guide and text for scientists, abstractors and management. New
York: Willey-Interscience, John Willey & Sons Inc.
 Mani, I., and Bloedorn, E. 1997. Multi-document summarisation by graph search and
matching. In Proceedings of AAAI-97, Providence Rhode Island.
References
 McKeown, K., and Radev, D.R. 1995. Generating summaries from multiple news articles.
In Fox, E.A., Ingwersen, P., and Fidel, R. eds. Proceedings of the Eighteenth Annual ACM
SIGIR Conference on Research and Development in Information Retrieval, 74-82. ACM
Press, July 1995.
 Miike, S., Itoh, E., Ono, K., and Sumita, K. 1994. A full-text retrieval system with a
dynamic abstract generation function. In Croft, W.B., and van Rijbergen, C.J. eds.
Proceedings of the Seventeenth Annual ACM SIGIR Conference on Research and
Development in Information Retrieval, 152-161. ACM Press, July 1994.
 Miller, S. 1984. Experimental design and statistics (2nd edition). New York: Routledge.
 Paice, C.D. 1990. Constructing literature abstracts by computer: Techniques and
prospects. Information Processing & Management 26(1):171-186.
 Porter, M.F. 1980. An algorithm for suffix stripping. Program - automated library and
information systems 14(3):130-137.
 Rush, J.E., Salvador, R., and Zamora, A. 1971. Automatic abstracting and indexing. II.
Production of indicative abstracts by application of contextual inference and syntactic
coherence criteria. Journal of the American Society for Information Science 22(4):260-
274.
 Salton, G., Singhal, A., Mitra, M., and Buckley, C. 1997. Automatic text structuring and
summarisation. Information Processing & Management 33(2):193-20.
Advantages of Query
Biased Summaries in
Information Retrieval
Anastasios Tombros
Computing Science Department University of Glasgow, Scotland
Mark Sanderson
CIIR, Computing Science Department University of Massachusetts,
U.S.A.
Presentation by Onur Yılmaz

Más contenido relacionado

La actualidad más candente

Probabilistic Information Retrieval
Probabilistic Information RetrievalProbabilistic Information Retrieval
Probabilistic Information RetrievalHarsh Thakkar
 
Review of Various Text Categorization Methods
Review of Various Text Categorization MethodsReview of Various Text Categorization Methods
Review of Various Text Categorization Methodsiosrjce
 
Relevance feature discovery for text mining
Relevance feature discovery for text miningRelevance feature discovery for text mining
Relevance feature discovery for text miningredpel dot com
 
Document Retrieval System, a Case Study
Document Retrieval System, a Case StudyDocument Retrieval System, a Case Study
Document Retrieval System, a Case StudyIJERA Editor
 
76 s201906
76 s20190676 s201906
76 s201906IJRAT
 
Research Paper Selection Based On an Ontology and Text Mining Technique Using...
Research Paper Selection Based On an Ontology and Text Mining Technique Using...Research Paper Selection Based On an Ontology and Text Mining Technique Using...
Research Paper Selection Based On an Ontology and Text Mining Technique Using...IOSR Journals
 
Text Mining at Feature Level: A Review
Text Mining at Feature Level: A ReviewText Mining at Feature Level: A Review
Text Mining at Feature Level: A ReviewINFOGAIN PUBLICATION
 
Text Classification using Support Vector Machine
Text Classification using Support Vector MachineText Classification using Support Vector Machine
Text Classification using Support Vector Machineinventionjournals
 
Semantics-based clustering approach for similar research area detection
Semantics-based clustering approach for similar research area detectionSemantics-based clustering approach for similar research area detection
Semantics-based clustering approach for similar research area detectionTELKOMNIKA JOURNAL
 
Text Document categorization using support vector machine
Text Document categorization using support vector machineText Document categorization using support vector machine
Text Document categorization using support vector machineIRJET Journal
 
A rough set based hybrid method to text categorization
A rough set based hybrid method to text categorizationA rough set based hybrid method to text categorization
A rough set based hybrid method to text categorizationNinad Samel
 
An efficient-classification-model-for-unstructured-text-document
An efficient-classification-model-for-unstructured-text-documentAn efficient-classification-model-for-unstructured-text-document
An efficient-classification-model-for-unstructured-text-documentSaleihGero
 
RES860 P6 IndividualProject - version101
RES860 P6 IndividualProject - version101RES860 P6 IndividualProject - version101
RES860 P6 IndividualProject - version101ThienSi Le
 
Correlation Coefficient Based Average Textual Similarity Model for Informatio...
Correlation Coefficient Based Average Textual Similarity Model for Informatio...Correlation Coefficient Based Average Textual Similarity Model for Informatio...
Correlation Coefficient Based Average Textual Similarity Model for Informatio...IOSR Journals
 
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...cscpconf
 

La actualidad más candente (17)

N15-1013
N15-1013N15-1013
N15-1013
 
Probabilistic Information Retrieval
Probabilistic Information RetrievalProbabilistic Information Retrieval
Probabilistic Information Retrieval
 
Review of Various Text Categorization Methods
Review of Various Text Categorization MethodsReview of Various Text Categorization Methods
Review of Various Text Categorization Methods
 
Relevance feature discovery for text mining
Relevance feature discovery for text miningRelevance feature discovery for text mining
Relevance feature discovery for text mining
 
Document Retrieval System, a Case Study
Document Retrieval System, a Case StudyDocument Retrieval System, a Case Study
Document Retrieval System, a Case Study
 
76 s201906
76 s20190676 s201906
76 s201906
 
Research Paper Selection Based On an Ontology and Text Mining Technique Using...
Research Paper Selection Based On an Ontology and Text Mining Technique Using...Research Paper Selection Based On an Ontology and Text Mining Technique Using...
Research Paper Selection Based On an Ontology and Text Mining Technique Using...
 
Text Mining at Feature Level: A Review
Text Mining at Feature Level: A ReviewText Mining at Feature Level: A Review
Text Mining at Feature Level: A Review
 
Text Classification using Support Vector Machine
Text Classification using Support Vector MachineText Classification using Support Vector Machine
Text Classification using Support Vector Machine
 
N045038690
N045038690N045038690
N045038690
 
Semantics-based clustering approach for similar research area detection
Semantics-based clustering approach for similar research area detectionSemantics-based clustering approach for similar research area detection
Semantics-based clustering approach for similar research area detection
 
Text Document categorization using support vector machine
Text Document categorization using support vector machineText Document categorization using support vector machine
Text Document categorization using support vector machine
 
A rough set based hybrid method to text categorization
A rough set based hybrid method to text categorizationA rough set based hybrid method to text categorization
A rough set based hybrid method to text categorization
 
An efficient-classification-model-for-unstructured-text-document
An efficient-classification-model-for-unstructured-text-documentAn efficient-classification-model-for-unstructured-text-document
An efficient-classification-model-for-unstructured-text-document
 
RES860 P6 IndividualProject - version101
RES860 P6 IndividualProject - version101RES860 P6 IndividualProject - version101
RES860 P6 IndividualProject - version101
 
Correlation Coefficient Based Average Textual Similarity Model for Informatio...
Correlation Coefficient Based Average Textual Similarity Model for Informatio...Correlation Coefficient Based Average Textual Similarity Model for Informatio...
Correlation Coefficient Based Average Textual Similarity Model for Informatio...
 
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
 

Similar a Query Biased Summaries Improve IR Relevance Judgements

Query based summarization
Query based summarizationQuery based summarization
Query based summarizationdamom77
 
Answer extraction and passage retrieval for
Answer extraction and passage retrieval forAnswer extraction and passage retrieval for
Answer extraction and passage retrieval forWaheeb Ahmed
 
IRJET- Automatic Recapitulation of Text Document
IRJET- Automatic Recapitulation of Text DocumentIRJET- Automatic Recapitulation of Text Document
IRJET- Automatic Recapitulation of Text DocumentIRJET Journal
 
Text databases and information retrieval
Text databases and information retrievalText databases and information retrieval
Text databases and information retrievalunyil96
 
Architecture of an ontology based domain-specific natural language question a...
Architecture of an ontology based domain-specific natural language question a...Architecture of an ontology based domain-specific natural language question a...
Architecture of an ontology based domain-specific natural language question a...IJwest
 
Search term recommendation and non-textual ranking evaluated
 Search term recommendation and non-textual ranking evaluated Search term recommendation and non-textual ranking evaluated
Search term recommendation and non-textual ranking evaluatedGESIS
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibEl Habib NFAOUI
 
An Effective Academic Research Papers Recommendation For Non-Profiled Users
An Effective Academic Research Papers Recommendation For Non-Profiled UsersAn Effective Academic Research Papers Recommendation For Non-Profiled Users
An Effective Academic Research Papers Recommendation For Non-Profiled UsersNat Rice
 
Performance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information RetrievalPerformance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information Retrievalidescitation
 
Arabic text categorization algorithm using vector evaluation method
Arabic text categorization algorithm using vector evaluation methodArabic text categorization algorithm using vector evaluation method
Arabic text categorization algorithm using vector evaluation methodijcsit
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptbutest
 
A template based algorithm for automatic summarization and dialogue managemen...
A template based algorithm for automatic summarization and dialogue managemen...A template based algorithm for automatic summarization and dialogue managemen...
A template based algorithm for automatic summarization and dialogue managemen...eSAT Journals
 
Enhancing the labelling technique of
Enhancing the labelling technique ofEnhancing the labelling technique of
Enhancing the labelling technique ofIJDKP
 
Query Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationQuery Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationIJMER
 
Tovek Presentation by Livio Costantini
Tovek Presentation by Livio CostantiniTovek Presentation by Livio Costantini
Tovek Presentation by Livio Costantinimaxfalc
 
A domain specific automatic text summarization using fuzzy logic
A domain specific automatic text summarization using fuzzy logicA domain specific automatic text summarization using fuzzy logic
A domain specific automatic text summarization using fuzzy logicIAEME Publication
 
8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural networkINFOGAIN PUBLICATION
 
Multi-Topic Multi-Document Summarizer
Multi-Topic Multi-Document SummarizerMulti-Topic Multi-Document Summarizer
Multi-Topic Multi-Document Summarizerijcsit
 

Similar a Query Biased Summaries Improve IR Relevance Judgements (20)

Query Based Summarization
Query Based SummarizationQuery Based Summarization
Query Based Summarization
 
Query based summarization
Query based summarizationQuery based summarization
Query based summarization
 
Answer extraction and passage retrieval for
Answer extraction and passage retrieval forAnswer extraction and passage retrieval for
Answer extraction and passage retrieval for
 
IRJET- Automatic Recapitulation of Text Document
IRJET- Automatic Recapitulation of Text DocumentIRJET- Automatic Recapitulation of Text Document
IRJET- Automatic Recapitulation of Text Document
 
Text databases and information retrieval
Text databases and information retrievalText databases and information retrieval
Text databases and information retrieval
 
Architecture of an ontology based domain-specific natural language question a...
Architecture of an ontology based domain-specific natural language question a...Architecture of an ontology based domain-specific natural language question a...
Architecture of an ontology based domain-specific natural language question a...
 
Search term recommendation and non-textual ranking evaluated
 Search term recommendation and non-textual ranking evaluated Search term recommendation and non-textual ranking evaluated
Search term recommendation and non-textual ranking evaluated
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_Habib
 
An Effective Academic Research Papers Recommendation For Non-Profiled Users
An Effective Academic Research Papers Recommendation For Non-Profiled UsersAn Effective Academic Research Papers Recommendation For Non-Profiled Users
An Effective Academic Research Papers Recommendation For Non-Profiled Users
 
Performance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information RetrievalPerformance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information Retrieval
 
Arabic text categorization algorithm using vector evaluation method
Arabic text categorization algorithm using vector evaluation methodArabic text categorization algorithm using vector evaluation method
Arabic text categorization algorithm using vector evaluation method
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
 
A template based algorithm for automatic summarization and dialogue managemen...
A template based algorithm for automatic summarization and dialogue managemen...A template based algorithm for automatic summarization and dialogue managemen...
A template based algorithm for automatic summarization and dialogue managemen...
 
Enhancing the labelling technique of
Enhancing the labelling technique ofEnhancing the labelling technique of
Enhancing the labelling technique of
 
Query Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationQuery Answering Approach Based on Document Summarization
Query Answering Approach Based on Document Summarization
 
K0936266
K0936266K0936266
K0936266
 
Tovek Presentation by Livio Costantini
Tovek Presentation by Livio CostantiniTovek Presentation by Livio Costantini
Tovek Presentation by Livio Costantini
 
A domain specific automatic text summarization using fuzzy logic
A domain specific automatic text summarization using fuzzy logicA domain specific automatic text summarization using fuzzy logic
A domain specific automatic text summarization using fuzzy logic
 
8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network
 
Multi-Topic Multi-Document Summarizer
Multi-Topic Multi-Document SummarizerMulti-Topic Multi-Document Summarizer
Multi-Topic Multi-Document Summarizer
 

Último

Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 

Último (20)

Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

Query Biased Summaries Improve IR Relevance Judgements

  • 1. Advantages of Query Biased Summaries in Information Retrieval Anastasios Tombros Computing Science Department University of Glasgow, Scotland Mark Sanderson CIIR, Computing Science Department University of Massachusetts, U.S.A. Presentation by Onur Yılmaz
  • 2. Outline  Introduction  Summarization System  Experimental Design  Experimental Results  Further Work and Conclusions
  • 3. Introduction Interaction with an IR system  User enters a specific information need,  Query  Response of a typical system  Retrieved document list
  • 5. Introduction Interaction with an IR system  Title First few sentences URL Relevance Quantification
  • 6. Introduction Interaction with an IR system  Users have to decide which of the retrieved documents are most likely to convey their information need  Ideally, it should be possible to make this decision without having to refer to the full document text
  • 7. Introduction Interaction with an IR system  Title First few sentences Not clear which the document relates to a user’s query
  • 8. Introduction What is presented?  Aim  to minimize users’ need to refer to the full document text  to provide enough information to support their retrieval decisions
  • 9. Introduction What is presented?  Proposed method  An automatically generated summary of each document in a retrieved document list, biased to a user’s query
  • 10. Introduction Document Summary  Condensation of a full text document  indicative of the source’s content  informative
  • 11. Introduction Sentence Extraction  Selection of sentences from the original document  Scores are assigned to sentences according to a set of extraction criteria  Best-scoring sentences are presented in the summary
  • 12. Summarisation System  Aim of system  Providing information on the relevance of documents retrieved in response to their query  Summaries were to be user-directed: biased towards the user’s query
  • 13. Summarisation System Articles  Wall Street Journal (WSJ) taken from the TREC collection (Text REtrieval Conferences)
  • 14. Summarisation System Methodology  Examining 50 randomly selected articles from the collection  Extract conclusions about the distribution of important information within them  Title  Headings  Leading paragraph  Overall structural
  • 15. Summarisation System Sentence Extraction - Title Method  Headlines of news articles  tend to reveal the major subject of the article  Terms in the title section of the documents  assigned a positive weight (title score)
  • 16.  Leading paragraph of each article  Provides much information on the article’s content  In order to quantify this contribution,  An ordinal weight was assigned to the first two sentences of each article. Summarisation System Sentence Extraction - Title Method
  • 17. Summarisation System Sentence Extraction - Title Method  Section Headings  «heading score» was assigned to each one of the sentences comprising a heading
  • 18. Summarisation System Sentence Extraction - Term Occurrence (TO)  Number of term occurrences (TO) within each document to further assign weights to sentences  Reasonable TO  7  medium-sized documents  (between 25 - 40 sentences)  TO value is augmented by 10% of the increase in document size
  • 19. Summarisation System Sentence Extraction - Term Occurrence (TO)  Significantly Related Terms  Two terms are are significant  No more than 4 non-significant words between them Recent studies that show that in the English language 98% of the lexical relations occur between words within a span of 5 words in a Sentence.
  • 20. Summarisation System Biasing summaries towards queries  Query Score  calculated for each of the sentences of a document  based on the distribution of query terms in each sentence
  • 21. Summarisation System Biasing summaries towards queries  For each sentence +Query score Score obtained by the sentence extraction methods Final Score=
  • 22. Summarisation System Biasing summaries towards queries  Summary for each document  Output the top-scoring sentence  Until desired summary length reached 15% of the document's length up to a maximum of five sentences
  • 23. Experimental Design  Aim of experiment  Use of query biased summaries in a retrieved document list  would have a positive effect on  the process of relevance judgement by users
  • 24. Experimental Design Experimental Conditions  Two levels of an independent variable:  Use of query-biased summaries in a ranked list of retrieved documents  Use of static pre-defined summaries
  • 25. Experimental Design Groups of Subjects  Two groups consisting of 10 people  Postgraduate students  Population is not representative of that which we wish to generalize the conclusions
  • 26. Experimental Design Retrieval Task  Identify, in a limited amount of time, as many relevant documents as possible  Subjects presented with a retrieved document list and query
  • 27. Experimental Design Queries  Randomly selected from the queries of the TREC test collection  Standard against which the subjects’ relevance judgements compared
  • 28. Experimental Design IR System  Classic document ranking system  tf•idf term weighting scheme  with stop word removal  word stemming using the Porter stemmer
  • 29. Experimental Design Operationalising the Experiment Select typical or proposed method Assign 5 random queries to subject Present to subject retrieved document list (50 highest ranked documents) Subject identifies the relevant documents in 5 minutes
  • 30. Experimental Design Experiment – Typical IR System Output 
  • 31. Experimental Design Experiment – Proposed IR System Output 
  • 32. Experimental Results Dependent Variable  Dependent variable:  Performance of the users in the  process of relevance judgements on documents retrieved by specific queries
  • 33. Experimental Results Criteria  Recall and precision of the relevance judgements  The speed of judgements  Need of the assistance from the full text of the retrieved documents.  The subjective opinion about the assistance provided
  • 34. Experimental Results Recall  Recall: Number of relevant documents correctly identified by a subject for a query divided by the total number of relevant documents, within the examined ones, for that query «summary group» managed to successfully identify a larger number of relevant documents
  • 35. Experimental Results Precision  Precision: Relevant documents correctly identified divided by the total number of indicated relevant documents for a query
  • 36. Experimental Results Recall and Precision  Query biased summaries allow users to identify more relevant documents, and identify them more accurately.
  • 37. Experimental Results Speed  Avg. Number of Documents: Actual number of documents that each subject managed to examine within the specified 5 minute period 13% increase Tendency for quicker judgement
  • 38. Experimental Results Reference to the full text of the documents  Static summaries based on the first few lines of a document is inadequate.
  • 39. Further Work and Conclusions Summary  Query based summaries assist users in performing relevance judgements more accurately and more quickly.  Users can identify more relevant documents for each query, while at the same time make fewer mistakes.
  • 40. Further Work and Conclusions Summary  Query based summaries almost alleviate the users’ need to refer to the full text  Users rely almost solely on the information conveyed in the query-biased summaries in order to perform their relevance judgements.
  • 41. Further Work and Conclusions Future Work  Possible extensions of the work  Repeating the experiments by simulating conditions encountered when searching for information on the web.  Intend to examine different summarisation techniques and to apply our system to alternative test collections.
  • 42. References  Abracos, J., and Lopes, G.P. 1997. Statistical methods for retrieving most significant paragraphs in newspaper articles. In Proceedings of the ACL'97/EACL'97 Workshop on Intelligent Scalable Text Summarisation (ISTS '97), 51-57. Madrid, Spain, July 11 1997.  Brandow, R., Mitze, K., and Rau, L.F. 1995. Automatic condensation of electronic publications by sentence selection. Information Processing & Management 31(5): 675-685, September 1995.  Callan, J.P. 1994. Passage-level evidence in document retrieval. In Croft, W.B., and van Rijbergen, C.J. eds. Proceedings of the Seventeenth Annual ACM SIGIR Conference on Research and Development in Information Retrieval, 302-310. ACM Press, July 1994.  Edmundson, H.P. 1964. Problems in automatic abstracting. Communications of the ACM 7(4):259-263, April 1964.  Edmundson, H.P. 1969. New methods in automatic abstracting. Journal of the ACM 16(2):264-285, April 1969.  Hand, T.F. 1997. A proposal for a task-based evaluation of text summarisation systems. In Proceedings of the ACL’97/EACL’97 Workshop on Intelligent Scalable Text Summarisation (ISTS ’97), 31-38. Madrid, Spain, July 11 1997.  Harman, D. 1996. Overview of the Fifth Text REtrieval Conference (TREC-5). In Proceedings of the Text Retrieval Conference (TREC-5), National Institute of Standards and Technology, Gaithersburg, MD 20899, USA.
  • 43. References  Jacobs, P.S., and Rau, L.F. 1990. Scisor: Extracting information from on-line news. Communications of the ACM 33(11):88-97, November 1990.  Keppel, G. 1973. Design and analysis: A researcher’s handbook. New Jersey: Prentice Hall.  Knaus, D., Mittendorf, E., Schauble, P., and Sheridan, P. 1995. Highlighting relevant passages for users of the interactive SPIDER retrieval system. In Proceedings of the Text Retrieval Conference (TREC-4), National Institute of Standards and Technology, Gaithersburg, MD 20899, USA.  Kupiec, J., Pedersen, J., and Chen, F. 1995. A trainable document summariser. In Fox, E.A., Ingwersen, P., and Fidel, R. eds. Proceedings of the Eighteenth Annual ACM SIGIR Conference on Research and Development in Information Retrieval, 68-73. ACM Press, July 1995.  Luhn, H.P. 1958. The automatic creation of literature abstracts. IBM Journal of Research and Development 2(2): 159-165, April 1958.  Maizell, R.E., Smith, J.F., and Singer, T.E.R. 1971. Abstracting scientific and technical literature: An introductory guide and text for scientists, abstractors and management. New York: Willey-Interscience, John Willey & Sons Inc.  Mani, I., and Bloedorn, E. 1997. Multi-document summarisation by graph search and matching. In Proceedings of AAAI-97, Providence Rhode Island.
  • 44. References  McKeown, K., and Radev, D.R. 1995. Generating summaries from multiple news articles. In Fox, E.A., Ingwersen, P., and Fidel, R. eds. Proceedings of the Eighteenth Annual ACM SIGIR Conference on Research and Development in Information Retrieval, 74-82. ACM Press, July 1995.  Miike, S., Itoh, E., Ono, K., and Sumita, K. 1994. A full-text retrieval system with a dynamic abstract generation function. In Croft, W.B., and van Rijbergen, C.J. eds. Proceedings of the Seventeenth Annual ACM SIGIR Conference on Research and Development in Information Retrieval, 152-161. ACM Press, July 1994.  Miller, S. 1984. Experimental design and statistics (2nd edition). New York: Routledge.  Paice, C.D. 1990. Constructing literature abstracts by computer: Techniques and prospects. Information Processing & Management 26(1):171-186.  Porter, M.F. 1980. An algorithm for suffix stripping. Program - automated library and information systems 14(3):130-137.  Rush, J.E., Salvador, R., and Zamora, A. 1971. Automatic abstracting and indexing. II. Production of indicative abstracts by application of contextual inference and syntactic coherence criteria. Journal of the American Society for Information Science 22(4):260- 274.  Salton, G., Singhal, A., Mitra, M., and Buckley, C. 1997. Automatic text structuring and summarisation. Information Processing & Management 33(2):193-20.
  • 45. Advantages of Query Biased Summaries in Information Retrieval Anastasios Tombros Computing Science Department University of Glasgow, Scotland Mark Sanderson CIIR, Computing Science Department University of Massachusetts, U.S.A. Presentation by Onur Yılmaz