Eastwood presentation on Kelly et al. (2010)
1. Effects of Position and Number of Relevant Documents on Users’ Evaluations of System Performance. A presentation by Meg Eastwood on the 2010 paper by D. Kelly, X. Fu, and C. Shah. INF 384H, September 26th, 2011
7. Primary Aim of Research: “to investigate the relationship between actual system performance and users’ evaluations of system performance” (pg 9:2)
8. Secondary Aim of Research: “to develop an experimental method that can be used to isolate and study specific aspects of the search process” (pg 9:2)
9. Previous Experimental Protocols. Traditional lab-based: TREC Interactive Track; study entire search episodes. Naturalistic: Thomas and Hawking (2006); trade control for “ecological validity”. Both designs include so many variables that it can be “difficult to establish causal relationships” (pg 9:2)
10. Literature Review. Main criticisms of previous studies: evaluation measures were calculated from TREC assessors’ relevance judgments, not user judgments; users were not provided with explicit instructions; users may have been fatigued; low sample sizes
12. Studies 1 and 2: effect of the position of relevant documents on users’ evaluations of system performance. Study 3: effect of the number of relevant documents
13. Participants were asked to help researchers evaluate four search engines. For each search engine, participants read a topic and posed one query
14. After issuing a query, all participants were redirected to the same results page with 10 standardized results
15. Participants were asked to evaluate the full text of each search result in the order presented and judge its relevance
16. After evaluating all the documents on the results page, participants were asked to evaluate the search engine
17. Study 1: operationalized average precision at n; subjects were required to evaluate all 10 documents
18. Study 2: also operationalized average precision at n; subjects were instructed to find five relevant documents
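For readers unfamiliar with the metric both studies manipulate, here is a minimal sketch of precision at n and average precision at n over a ranked list of binary relevance judgments. The function and variable names are my own illustration, not taken from the paper:

```python
def precision_at_n(relevance, n):
    """Fraction of the top-n ranked documents that are relevant.
    `relevance` is a list of 0/1 judgments in rank order."""
    return sum(relevance[:n]) / n

def average_precision_at_n(relevance, n):
    """Mean of precision@k over every rank k <= n that holds a relevant document."""
    hits = 0
    precisions = []
    for k, rel in enumerate(relevance[:n], start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Hypothetical results page: relevant documents at ranks 1, 2, and 5 of 10
ranked = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(precision_at_n(ranked, 10))          # 0.3
print(average_precision_at_n(ranked, 10))  # (1/1 + 2/2 + 3/5) / 3 ≈ 0.867
```

Because average precision weights relevant documents by how early they appear, the same three relevant documents placed at ranks 8, 9, and 10 would yield a much lower score, which is exactly the position effect the studies probe.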
20. Topics and Documents. Selected topics associated with newspaper articles about current events; selected documents with a “high probability of being judged relevant or not relevant” (pg 9:12)
21. Study Participants. A “convenient sample” (pg 9:27) of undergraduates from UNC; 27 participants for each study (1-3). Demographic information collected: sex, age, major, search experience, search frequency
30. Participant ratings compared between performance levels and studies: Study 1 showed no significant differences in ratings according to performance level
31. Participant ratings compared between performance levels and studies: Studies 2 and 3 did show significant differences in ratings according to performance level
32. What are the differences between study 1 and study 2? Intended difference: completion time?
33. What are the differences between study 1 and study 2? Unintended differences: instructions for study 2 provided a clearer performance objective; subjects felt more successful in study 2?
34. User Experienced Precision: “experimental manipulations [of precision] were only 90% effective” (pg 9:24)
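One plausible reading of the “90% effective” figure is that users’ own relevance judgments agreed with the intended relevance labels for about 90% of documents, so the precision a user actually experienced could differ from the precision the experimenters planned. A sketch under that assumption (the data and function names are hypothetical, not from the paper):

```python
def experienced_precision(user_judgments, n=10):
    """Precision@n computed from the user's own 0/1 relevance judgments."""
    return sum(user_judgments[:n]) / n

def manipulation_effectiveness(intended, user_judgments):
    """Fraction of documents where the user's judgment matched the intended
    relevance label (an assumed reading of 'manipulations were 90% effective')."""
    agree = sum(1 for i, u in zip(intended, user_judgments) if i == u)
    return agree / len(intended)

intended = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]  # hypothetical planned labels
judged   = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]  # hypothetical user judgments
print(experienced_precision(judged))                  # 0.4, vs intended 0.3
print(manipulation_effectiveness(intended, judged))   # 0.9
```

In this toy case a single disagreement at rank 4 shifts the user-experienced precision@10 above the intended level, illustrating why the authors report experienced precision alongside the manipulated values.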
38. Authors’ Discussion and Conclusions: “…variations in precision at 10 scores have the greatest impact on subjects’ evaluation ratings.” (pg 9:26) Thoughtful analysis of experimental caveats and the generalizability of results: a convenient sample of students; only one genre of documents represented; are these results specific to informational/exploratory tasks?
39. Suggested Class Discussion Topics. Areas where the experiment may have been too tightly controlled/artificial: controlling the order in which users could rate documents? Areas where the experiment may not have been as controlled as the authors intended: allowing subjects to formulate their own queries; study 2 allowed participants to feel “successful”? Ten-point evaluation scale versus five-point evaluation scale?
40. References. Kelly, D., Fu, X., and Shah, C. 2010. Effects of position and number of relevant documents retrieved on users’ evaluations of system performance. ACM Trans. Inf. Syst. 28, 2, Article 9 (May 2010), 29 pages. DOI 10.1145/1740592.1740597. http://doi.acm.org/10.1145/1740592.1740597
Editor’s notes
“My research is focused on information search behavior and the design and evaluation of systems that support interactive information retrieval.” UNC Chapel Hill: according to US News and World Report, it has the #2 library science graduate school in the nation, a very strong program. Xun Fu and Chirag Shah were Ph.D. students in the program at the time this article was written.