Effects of Position and Number of Relevant Documents on Users’ Evaluations of System Performance. A presentation by Meg Eastwood on the 2010 paper by D. Kelly, X. Fu, and C. Shah. INF 384H, September 26th, 2011
Diane Kelly: Associate Professor, School of Library and Information Science, UNC Chapel Hill
Ph.D., Rutgers University (Information Science)
MLS, Rutgers University (Information Retrieval)
BA, University of Alabama (Psychology and English)
Graduate Certificate in Cognitive Science, Rutgers Center for Cognitive Science
Primary Aim of Research: “to investigate the relationship between actual system performance and users’ evaluations of system performance” (pg 9:2)
Secondary Aim of Research: “to develop an experimental method that can be used to isolate and study specific aspects of the search process” (pg 9:2)
Previous Experimental Protocols. Traditional lab-based: TREC Interactive Track; study entire search episodes. Naturalistic: Thomas and Hawking (2006); trade control for “ecological validity”. Both designs include so many variables that it can be “difficult to establish causal relationships” (pg 9:2)
Literature Review. Main criticisms of previous studies: evaluation measures were calculated from TREC assessors’ relevance judgments, not user judgments; users were not provided with explicit instructions; users may have been fatigued; sample sizes were low.
Methods
Studies 1 and 2: effect of the position of relevant documents on users’ evaluations of system performance. Study 3: effect of the number of relevant documents.
Participants were asked to help researchers evaluate four search engines. For each search engine, they read a topic and posed one query.
After issuing the query, all participants were redirected to the same results page with 10 standardized results.
Participants were asked to evaluate the full text of each search result in the order presented and judge its relevance.
After evaluating all the documents on the results page, participants were asked to evaluate the search engine.
Study 1: operationalized average precision at n. Subjects were required to evaluate all 10 documents.
Study 2: also operationalized average precision at n. Subjects were instructed to find five relevant documents.
Study 3: operationalized precision at n.
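The two measures named in Studies 1 to 3 can be sketched in a few lines of Python. This is a minimal illustration using common textbook definitions, not necessarily the exact operationalization in Kelly et al. (2010). In particular, normalizing average precision by the number of relevant documents found in the top n is an assumption, and the function names and the two example result pages are hypothetical.

```python
from typing import Sequence


def precision_at_n(relevance: Sequence[int], n: int) -> float:
    """Fraction of the top-n results that are relevant (1 = relevant, 0 = not)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(relevance[:n]) / n


def average_precision_at_n(relevance: Sequence[int], n: int) -> float:
    """Mean of precision@k over the ranks k <= n that hold a relevant document.

    Assumption: normalized by the number of relevant documents in the top n.
    Returns 0.0 when the top n contains no relevant document.
    """
    hits = 0
    total = 0.0
    for k, rel in enumerate(relevance[:n], start=1):
        if rel:
            hits += 1
            total += hits / k  # precision at this relevant rank
    return total / hits if hits else 0.0


# Two hypothetical result pages, each with 5 relevant documents out of 10.
relevant_first = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
relevant_last = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

for name, page in [("relevant first", relevant_first), ("relevant last", relevant_last)]:
    print(name, precision_at_n(page, 10), round(average_precision_at_n(page, 10), 3))
```

Both pages have the same precision at 10, but their average precision differs because the relevant documents sit at different ranks. That is the distinction between manipulating the number of relevant documents (Study 3) and their position (Studies 1 and 2).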
Topics and Documents: selected topics associated with newspaper articles about current events; selected documents with a “high probability of being judged relevant or not relevant” (pg 9:12)
Study Participants: “Convenient sample” (pg 9:27) of undergraduates from UNC; 27 participants for each study (1-3). Demographic information collected: sex, age, major, search experience, search frequency.
Results: Relevance Assessments
Did users’ relevance judgments agree with baseline assessments?
Did the topic affect differences in relevance assessments?
How much did relevance assessments vary between documents?
Results: Evaluations of System Performance
Did participants modify evaluation ratings?
Participant ratings compared between performance levels and studies:
Study 1 showed no significant differences in ratings according to performance level.
Studies 2 and 3 did show significant differences in ratings according to performance level.
What are the differences between Study 1 and Study 2? Intended difference: completion time?
Unintended differences: instructions for Study 2 provided a clearer performance objective; subjects felt more successful in Study 2?
User-Experienced Precision: “experimental manipulations [of precision] were only 90% effective” (pg 9:24)
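One way to picture user-experienced precision is to recompute precision from a participant's own relevance judgments instead of the baseline labels. The sketch below is an assumption about how such a value could be derived, not the paper's procedure. The function name and all data are hypothetical.

```python
from typing import Sequence


def precision(judgments: Sequence[bool]) -> float:
    """Share of judged documents marked relevant; 0.0 if nothing was judged."""
    return sum(judgments) / len(judgments) if judgments else 0.0


# Hypothetical data: baseline labels for one results page and one
# participant's judgments of the same ten documents, in rank order.
baseline = [True, True, True, False, False, True, False, True, False, False]
participant = [True, True, False, False, False, False, False, True, False, False]

intended = precision(baseline)        # precision the researchers designed
experienced = precision(participant)  # precision this participant perceived

# Simple percent agreement between the participant and the baseline labels.
agreement = sum(b == p for b, p in zip(baseline, participant)) / len(baseline)

print(intended, experienced, agreement)
```

When a participant disagrees with the baseline on even a few documents, the precision they experience can differ from the level the experiment intended to present.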
Are user-experienced precision values correlated with user ratings of system performance?
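A correlation of this kind can be sketched as follows. The slides do not say which coefficient the authors used, so Pearson's r is an assumption here, and a rank-based coefficient may be more appropriate for ordinal ratings. The function name, the precision values, and the ratings are all hypothetical and are not results from the paper.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: user-experienced precision per participant and the
# rating each gave the search engine (the scale is illustrative only).
precisions = [0.2, 0.4, 0.5, 0.6, 0.8]
ratings = [2, 3, 3, 4, 5]

r = pearson_r(precisions, ratings)
print(round(r, 3))
```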

Eastwood presentation on_kellyetal2010

  • 1. Effects of Position and Number of Relevant Documents on Users’ Evaluations of System Performance. A presentation by Meg Eastwood on the 2010 paper by D. Kelly, X. Fu, and C. Shah. INF 384H, September 26th, 2011
  • 2. Diane Kelly, Associate Professor, School of Library and Information Science, UNC Chapel Hill
  • 3. Ph.D., Rutgers University (Information Science)
  • 4. MLS, Rutgers University (Information Retrieval)
  • 5. BA, University of Alabama (Psychology and English)
  • 6. Graduate Certificate in Cognitive Science, Rutgers Center for Cognitive Science
  • 7. Primary Aim of Research: “to investigate the relationship between actual system performance and users’ evaluations of system performance” (pg 9:2)
  • 8. Secondary Aim of Research: “to develop an experimental method that can be used to isolate and study specific aspects of the search process” (pg 9:2)
  • 9. Previous Experimental Protocols. Traditional lab-based: TREC Interactive Track; study entire search episodes. Naturalistic: Thomas and Hawking (2006); trade control for “ecological validity”. Both designs include so many variables that it can be “difficult to establish causal relationships” (pg 9:2)
  • 10. Literature Review. Main criticisms of previous studies: evaluation measures were calculated from TREC assessors’ relevance judgments, not user judgments; users were not provided with explicit instructions; users may have been fatigued; sample sizes were low
  • 12. Studies 1 and 2: effect of position of relevant documents on users’ evaluations of system performance. Study 3: effect of number of relevant documents
  • 13. Participants were asked to help researchers evaluate four search engines. For each search engine, they read a topic and posed one query
  • 14. After issuing the query, all participants were redirected to the same results page with 10 standardized results
  • 15. Participants were asked to evaluate the full text of each search result in the order presented and judge its relevance
  • 16. After evaluating all the documents on the results page, participants were asked to evaluate the search engine
  • 17. Study 1: operationalized average precision at n; subjects were required to evaluate all 10 documents
  • 18. Study 2: also operationalized average precision at n; subjects were instructed to find five relevant documents
  • 19. Study 3: operationalized precision at n
  • 20. Topics and Documents: selected topics associated with newspaper articles about current events; selected documents with “high probability of being judged relevant or not relevant” (pg 9:12)
  • 21. Study Participants. “Convenient sample” (pg 9:27) of undergraduates from UNC; 27 participants for each study (1-3). Demographic information collected: sex, age, major, search experience, search frequency
  • 23. Did users’ relevance judgments agree with baseline assessments?
  • 24. Did users’ relevance judgments agree with baseline assessments?
  • 25. Did the topic affect differences in relevance assessments?
  • 26. How much did relevance assessments vary between documents?
  • 27. Results: Evaluations of System Performance
  • 28. Did participants modify evaluation ratings?
  • 29. Participant ratings compared between performance levels and studies
  • 30. Participant ratings compared between performance levels and studies: Study 1 showed no significant differences in ratings according to performance level
  • 31. Participant ratings compared between performance levels and studies: Studies 2 and 3 did show significant differences in ratings according to performance level
  • 32. What are the differences between study 1 and study 2? Intended difference: completion time?
  • 33. What are the differences between study 1 and study 2? Unintended differences: instructions for study 2 provided a clearer performance objective; subjects felt more successful in study 2?
  • 34. User-Experienced Precision: “experimental manipulations [of precision] were only 90% effective” (pg 9:24)
  • 35. Are user-experienced precision values correlated with user ratings of system performance?
  • 36. Are user-experienced precision values correlated with user ratings of system performance?
  • 37. Regression analysis: can you use experienced precision to predict user evaluation?
  • 38. Authors’ Discussion and Conclusions: “…variations in precision at 10 scores have the greatest impact on subjects’ evaluation ratings.” (pg 9:26) Thoughtful analysis of experimental caveats and generalizability of results: convenient sample of students; only one genre of documents represented; are these results specific to informational/exploratory tasks?
  • 39. Suggested Class Discussion Topics. Areas where the experiment may have been too tightly controlled/artificial: controlling the order in which users could rate documents? Areas where the experiment may not have been as controlled as the authors intended: allowing subjects to formulate their own queries; Study 2 allowed participants to feel “successful”? Ten-point evaluation scale versus five-point evaluation scale?
  • 40. References: Kelly, D., Fu, X., and Shah, C. 2010. Effects of position and number of relevant documents retrieved on users’ evaluations of system performance. ACM Trans. Inf. Syst. 28, 2, Article 9 (May 2010), 29 pages. DOI 10.1145/1740592.1740597. http://doi.acm.org/10.1145/1740592.1740597
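
Items 17 to 19 name two measures, precision at n and average precision at n, without defining them. The sketch below uses common textbook definitions to show why the two differ: precision at n depends only on how many relevant results appear in the top n, while average precision at n also depends on where they appear. The function names and the two example rankings are hypothetical and are not taken from the paper, and the normalization used for average precision (dividing by the number of relevant results found in the top n) is an assumption; the paper may define it differently.

```python
from typing import Sequence


def precision_at_n(relevance: Sequence[bool], n: int) -> float:
    """Fraction of the top-n results that are relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(relevance[:n]) / n


def average_precision_at_n(relevance: Sequence[bool], n: int) -> float:
    """Mean of precision@k over the ranks k <= n that hold a relevant result.

    Assumption: normalized by the number of relevant results found in the
    top n, not by the number of relevant documents in a whole collection.
    """
    hits = 0
    total = 0.0
    for k, is_relevant in enumerate(relevance[:n], start=1):
        if is_relevant:
            hits += 1
            total += hits / k  # precision at rank k
    return total / hits if hits else 0.0


# Hypothetical result pages: five relevant documents out of ten in both,
# placed at the top in one ranking and at the bottom in the other.
top_heavy = [True] * 5 + [False] * 5
bottom_heavy = [False] * 5 + [True] * 5

for name, ranking in [("top_heavy", top_heavy), ("bottom_heavy", bottom_heavy)]:
    print(name, precision_at_n(ranking, 10), average_precision_at_n(ranking, 10))
```

Both rankings have the same precision at 10, but the top-heavy ranking has a higher average precision at 10. This is the kind of contrast between position (Studies 1 and 2) and number (Study 3) that the slides describe.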

Editor's notes

  1. “My research is focused on information search behavior and the design and evaluation of systems that support interactive information retrieval.” UNC Chapel Hill: according to US News and World Report, it has the #2 library science graduate school in the nation, a very strong program. Xin Fu and Chirag Shah were Ph.D. students in the program at the time this article was written.