The document discusses various survey reduction techniques to reduce administration time and increase engagement while maintaining data quality. It recommends making surveys seem short by streamlining the entire experience. Specific techniques include reducing instructions, using archival demographics, strategic placement of eligibility questions, and skip/branch logic. Scale reduction can be done through item analysis, factor analysis, and relating items to external criteria. Unobtrusive observation and one-item measures are also discussed as alternatives to surveys. Trade-offs of reduced surveys include potential lower construct coverage and less information obtained per respondent.
2. Primary Goal: Reduce Administration Time. Secondary goals: Reduce perceived administration time. Increase the engagement of the respondent with the experience of completing the instrument: lock in interest and excitement from the start. Reduce the extent of missing and erroneous data due to carelessness, rushing, test forms that are hard to use, etc. Increase the respondents’ ease of experience (maybe even enjoyment!) so that they will persist to the end AND respond again next year (or whenever the next survey comes out). Conclusions? Make the survey SEEM as short and compact as possible. Streamline the WHOLE EXPERIENCE from the first call for participation all the way to the end of the final page of the instrument. Focus test-reduction efforts on the easy stuff before diving into the nitty-gritty statistical stuff.
3. Please choose the option that most closely fits how you describe yourself. Please select only one of the two options: Female [] Male []
4. Instruction Reduction. Fewer than 4% of respondents make use of printed instructions: Novick and Ward (2006; ACM SIGDOC). Comprehension of instructions only influences novice performance on surveys: Catrambone (1990; HCI). Instructions on average are written five grade levels above the average grade level of respondents; 23% of respondents failed to understand at least one element of the instructions: Spandorfer et al. (1995; Annals of EM). Unless you are working with a special/unusual population, you can assume that respondents know how to complete Likert scales and other common response formats without instructions. Most people don’t read instructions anyway. When they do, the instructions often don’t help them respond any better! If your response format is so novel that people require instructions, then you have a substantial burden to pilot test, in order to ensure that people comprehend the instructions and respond appropriately. Otherwise, do not take the risk!
8. Respondents should feel like demographics are not serving to identify them in their survey responses.
9. You could offer respondents two choices: (a) match (or automatically fill in) some/all demographic data using the code number provided in your invitation email (or on a paper letter); or (b) fill in the demographic data themselves (on web-based surveys, a reveal can branch respondents to the demographics page).
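The matching option above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the field name `invite_code`, the archive layout, and the function name `attach_demographics` are all hypothetical. Respondents whose code has no archival record are set aside so they can be branched to the demographics page.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def attach_demographics(
    responses: List[Record],
    archive: Dict[str, Record],
    code_field: str = "invite_code",  # hypothetical field name
) -> Tuple[List[Record], List[Record]]:
    """Split responses into (matched, needs_demographics_page)."""
    matched: List[Record] = []
    unmatched: List[Record] = []
    for response in responses:
        archived = archive.get(response.get(code_field, ""))
        if archived is None:
            # No archival record: this respondent fills in demographics.
            unmatched.append(dict(response))
        else:
            # Archival fields fill gaps; anything the respondent typed wins.
            matched.append({**archived, **response})
    return matched, unmatched


# Hypothetical archive keyed by the code number from the invitation.
ARCHIVE = {"A17": {"department": "Finance", "tenure_years": "6"}}
```

Keeping the archive keyed only by the invitation code, rather than by name or email, is one way to limit how identifying the matched demographics feel to respondents.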
10. Eligibility, Skip Logic, and Branching. Eligibility: If a survey has eligibility requirements, the screening questions should be placed at the earliest possible point in the survey. (Eligibility requirements can appear in instructions, but this should not be the sole method of screening out ineligible respondents.) Skip Logic: Skip logic actually shortens the survey by setting aside questions for which the respondent is ineligible. Branching: Branching may not shorten the survey, but it can improve the user experience by offering questions specifically focused on the respondent’s demographic or reported experience. (Illustration credit: Vovici.com)
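The three mechanisms on this slide can be sketched as a small survey runner. The sketch below is an assumption-laden illustration: the `Question` class, the `run_survey` function, and the sample questions are hypothetical and do not correspond to any particular survey platform. Screeners run first, and each later question carries an optional rule that decides whether it is shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Answers = Dict[str, str]


@dataclass
class Question:
    qid: str
    text: str
    # Skip/branch rule: the question is shown only if this returns True.
    # None means the question is shown to everyone.
    show_if: Optional[Callable[[Answers], bool]] = None


def run_survey(
    screeners: List[Question],
    body: List[Question],
    is_eligible: Callable[[Answers], bool],
    respond: Callable[[Question], str],
) -> Answers:
    answers: Answers = {}
    # Eligibility: ask screening questions at the earliest possible point.
    for q in screeners:
        answers[q.qid] = respond(q)
    if not is_eligible(answers):
        return answers  # screened out before any substantive question
    # Skip logic and branching: ask only the questions whose rule passes.
    for q in body:
        if q.show_if is None or q.show_if(answers):
            answers[q.qid] = respond(q)
    return answers


SCREENERS = [Question("employed", "Are you currently employed? (yes/no)")]
BODY = [
    Question("manages", "Do you supervise other employees? (yes/no)"),
    Question("team_size", "How many people report to you?",
             lambda a: a["manages"] == "yes"),
    Question("manager_support", "How supported do you feel by your manager? (1-5)",
             lambda a: a["manages"] == "no"),
    Question("satisfaction", "Overall, how satisfied are you with your job? (1-5)"),
]
```

In this example the two branches after `manages` each contain exactly one question, so neither answer leads to visibly more work than the other.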
11. Implications: Eligibility, Skip Logic, and Branching. Ever answer a survey where you knew that your answer would predict how many questions you would have to answer after that? E.g., “How many hotel chains have you been to in the last year?” If users can predict that their eligibility, the survey skip logic, or survey branching will lead to longer responses, more complex responses, or more difficult or tedious responses, they may: abandon the survey; or back up and change their answer to the conditional with less work (if the interface permits it). Branch design should try not to imply what the user would have experienced in another branch. Paths through the survey should avoid causing considerably more work for some respondents than for others.
12. Panel Designs and/or Multiple Administration. Panel designs measure the same respondents on multiple occasions. Typically, either predictors are gathered at an early point in time and outcomes are gathered at a later point in time, or both predictors and outcomes are measured at every time point. (There are variations on these two themes.) Panel designs are based on maturation and/or intervention processes that require the passage of time. Examples: career aspirations over time, person-organization fit over time, training before/after. Minimally, panel designs can help mitigate (though not solve) the problem of common method bias; e.g., when responding to a criterion at time 2, respondents tend to forget how they responded at time 1.
13. Panel Designs and/or Multiple Administration. Survey designers can apply the logic of panel designs to their own surveys: Sometimes, you have to collect a large number of variables (no measure shortening), and it is impractical to do so in a single administration. Generally speaking: Better to have many short, pleasant survey administrations with a cumulative “work time lost” of an hour than one long and grinding hour-long survey. The former can get you happier and less fatigued respondents and, hopefully, better data. In the limit, consider the implications of a “Today’s Poll” approach to measuring climate, stress, satisfaction, or other attitudinal variables: One question per day, every day….
14. Unobtrusive Behavioral Observation. Surveys appear convenient and relatively inexpensive in and of themselves…however, the cumulative work time lost across all respondents may be quite large. Methods that assess social variables through observations of overt behavior rather than self-report can provide indications of stress, satisfaction, organizational citizenship, intent to quit, and other psychologically and organizationally relevant variables. Examples: Cigarette breaks over time (frequency, # of incumbents per day); Garbage (weight of trash before/after a recycling program); Social media usage (tweets, blog posts, Facebook); Wear of floor tiles; Absenteeism or tardiness records; Incumbent, team, and department production quality and quantity measures.
15. Unobtrusive Behavioral Observation. Most unobtrusive observations must be conducted over time: Establish a baseline for the behavior. Examine subsequent time periods for changes/trends over time. Data collection is generally much more labor-intensive than for surveys. Results should be cross-validated with other types of evidence.
16. Scale Reduction and One-item Measures. Standard scale construction calls for “sampling the construct domain” with items that tap into different aspects of the construct and refer to various content areas. Scales with more items can include a larger sample of the behaviors or topics relevant to the construct. [Figure: overlap of Construct Domain and Item Content. RELEVANT: measuring what you want to measure. CONTAMINATED: measuring what you don’t want to measure. DEFICIENT: not measuring what you want to measure.]
17. Scale Reduction and One-item Measures. When fewer items are used, by necessity they must be either more general in wording, to obtain full coverage (hopefully), or narrower, to focus on a subset of behaviors/topics. Internal consistency reliability reinforces this trade-off: As the number of items gets smaller, inter-item correlation must rise to maintain a given level of internal consistency. However, scales with fewer than 3-5 items rarely achieve acceptable internal consistency without simply becoming alternative wordings of the same questions. Discussion: How many of you have taken a measure where you were being asked the same question again and again? Your reactions? Why was this done? The one-item solution: A one-item measure usually “covers” a construct only if it is highly non-specific. A one-item measure has a measurable reliability (see Wanous & Hudy; ORM, 2001), but the concept of internal consistency is meaningless. Discuss: A one-item knowledge measure vs. a one-item job satisfaction measure.
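The trade-off between number of items and inter-item correlation can be made concrete with a short calculation. The sketch below assumes the standardized form of coefficient alpha (equivalent to the Spearman-Brown formula), which the slide does not name; the function names are illustrative.

```python
def standardized_alpha(k: int, mean_r: float) -> float:
    """Standardized alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1.0 + (k - 1) * mean_r)


def required_mean_r(k: int, target_alpha: float) -> float:
    """Mean inter-item correlation needed to reach target_alpha with k items.

    Obtained by solving the standardized alpha formula for mean_r.
    """
    return target_alpha / (k - target_alpha * (k - 1))


if __name__ == "__main__":
    for k in (10, 5, 3, 2):
        print(k, round(required_mean_r(k, 0.80), 2))
```

Under this assumption, holding alpha at .80 requires a mean inter-item correlation of about .29 with 10 items, .44 with 5, .57 with 3, and .67 with 2, which is why very short scales drift toward near-duplicate wordings.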
18. One-item Measure Literature. Research using single-item measures of each of the five JDI job satisfaction facets found correlations between .60 and .72 with the full-length versions of the JDI scales: Nagy (2002). Review of single-item graphical representation scales, so-called “faces” scales: Patrician (2004). Single-item graphic scale for organizational identification: Shamir & Kark (2004). Research finding that single-item job satisfaction scales systematically overestimate workers’ job satisfaction: Oshagbemi (1999). Single-item measures work best on “homogeneous” constructs: Loo (2002).
19. Scale Reduction: Technical Considerations. Items can be struck from a scale based on three different sets of qualities: 1. Internal item qualities refer to properties of items that can be assessed in reference to other items on the scale or the scale's summated scores. 2. External item qualities refer to connections between the scale (or its individual items) and other constructs or indicators. 3. Judgmental item qualities refer to those issues that require subjective judgment and/or are difficult to assess in isolation from the context in which the scale is administered. Literature review suggests that the most widely used method for item selection in scale reduction is some form of internal consistency maximization. Corrected item-total correlations provide diagnostic information about internal consistency. In scale reduction efforts, item-total correlations have been employed as a basis for retaining items for a shortened scale version. Factor analysis is another technique that, when used for scale reduction, can lead to increased internal consistency, assuming one chooses items that load strongly on a dominant factor.
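As a rough illustration of the internal item qualities described above, the following sketch computes corrected item-total correlations and coefficient alpha using only the Python standard library. The data layout (one list of item scores per respondent) and the function names are assumptions made for the example.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes both variables have nonzero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(rows: List[List[float]]) -> List[float]:
    """Correlate each item with the sum of the OTHER items on the scale."""
    k = len(rows[0])
    totals = [sum(r) for r in rows]
    result = []
    for j in range(k):
        item = [r[j] for r in rows]
        rest = [t - r[j] for t, r in zip(totals, rows)]  # total minus the item
        result.append(pearson(item, rest))
    return result


def cronbach_alpha(rows: List[List[float]]) -> float:
    """Coefficient alpha from item variances and total-score variance."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

An item with a low corrected item-total correlation is a candidate for removal under an internal consistency criterion, and recomputing `cronbach_alpha` without it shows the effect on the shortened scale.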
20. Scale Reduction II. Despite their prevalence, there are important limitations of scale reduction techniques that maximize internal consistency. Choosing items to maximize internal consistency leads to item sets that are highly redundant in appearance, narrow in content, and potentially low in validity. High internal consistency often signifies a failure to adequately sample content from all parts of the construct domain. To obtain high values of coefficient alpha, a scale developer need only write a set of items that paraphrase each other or are antonyms of one another. One can expect an equivalent result (i.e., high redundancy) from using the analogous approach in scale reduction, that is, excluding all items but those highly similar in content.
21. Scale Reduction III. IRT provides an alternative strategy for scale reduction that does not focus on maximizing internal consistency. One should retain items that are highly discriminating (i.e., moderate to large values of a), and one should attempt to include items with a range of item thresholds (i.e., b) that adequately cover the expected range of the trait in measured individuals. IRT analysis for scale reduction can be complex and does not provide a definitive answer to the question of which items to retain; rather, it provides evidence for which items might work well together to cover the trait range. Relating items to external criteria provides a viable alternative to internal consistency and other internal qualities. Because correlations vary across different samples, instruments, and administration contexts, an item that predicts an external criterion best in one sample may not do so in another. Choosing items to maximize a relation with an external criterion runs the risk of a decrease in discriminant validity between the measures of the two constructs.
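To illustrate why both discrimination (a) and the spread of thresholds (b) matter, the sketch below computes item and scale information. It assumes a two-parameter logistic (2PL) model for dichotomous items without a scaling constant; the slide does not specify a model, and the function names and parameter values are hypothetical.

```python
from math import exp
from typing import List, Tuple

Item = Tuple[float, float]  # (a, b): discrimination and threshold


def p_endorse(theta: float, a: float, b: float) -> float:
    """2PL probability of endorsing the item at trait level theta."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """2PL item information: a^2 * P * (1 - P), peaking at theta == b."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def scale_information(theta: float, items: List[Item]) -> float:
    """Information of a set of items is the sum of the item informations."""
    return sum(item_information(theta, a, b) for a, b in items)


# Two hypothetical three-item short forms with equal discrimination.
SPREAD = [(1.5, -2.0), (1.5, 0.0), (1.5, 2.0)]     # thresholds cover the range
CLUSTERED = [(1.5, 0.0), (1.5, 0.0), (1.5, 0.0)]   # thresholds all at the mean

if __name__ == "__main__":
    for theta in (-2.0, 0.0, 2.0):
        print(theta,
              round(scale_information(theta, SPREAD), 3),
              round(scale_information(theta, CLUSTERED), 3))
```

Under these assumed parameters, the clustered form is more precise near the middle of the trait range, while the spread form retains more information at the extremes. This is the kind of evidence, rather than a definitive answer, that an IRT analysis provides.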
22. Scale Reduction IV. The overarching goal of any scale reduction project should be to closely replicate the pattern of relations established within the construct's nomological network. In evaluating any given item's relations with external criteria, one should seek moderate correlations with a variety of related scales (i.e., convergent validity) and low correlations with a variety of unrelated measures. Researchers may also need to examine other criteria beyond statistical relations to determine which items should remain in an abbreviated scale. These include an item's clarity of expression, its relevance to a particular respondent population, the semantic redundancy of its content with other items, its perceived invasiveness, and its "face" validity. Items lacking apparent relevance, or that are highly redundant with other items on the scale, may be viewed negatively by respondents. To the extent that judgmental qualities can be used to select items with face validity, both the reactions of constituencies and the motivation of respondents may be enhanced. Simple strategy for retention that does not require IRT analysis: stepwise regression. Items are rank-ordered for inclusion in an "optimal" reduced-length scale that accounts for a nearly maximal proportion of variance in its own full-length summated scale score. Order of entry into the stepwise regression is a rank-order proxy indicating item goodness. Empirical results show that this method performs as well as a brute-force combinatorial scan of item combinations; the method can also be combined with human judgment to pick items from among the top-ranked items (but not in strict ranking order).
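The rank-ordering idea can be sketched with a greedy forward selection. Note the swap: instead of a true stepwise regression with estimated weights, this simplified stand-in adds, at each step, the item whose inclusion makes the unit-weighted partial sum correlate most strongly with the full-length total. The function names and data layout are assumptions for illustration.

```python
from math import sqrt
from statistics import mean
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes both variables have nonzero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_items(rows: List[List[float]]) -> List[Tuple[int, float]]:
    """Return (item index, r with full-scale total) in order of entry.

    Simplification: unit-weighted partial sums replace regression weights.
    """
    k = len(rows[0])
    totals = [sum(r) for r in rows]        # full-length summated score
    partial = [0.0] * len(rows)            # score on the items chosen so far
    remaining = list(range(k))
    ranking: List[Tuple[int, float]] = []
    while remaining:
        best_j, best_r = remaining[0], -2.0
        for j in remaining:
            candidate = [p + r[j] for p, r in zip(partial, rows)]
            r_val = pearson(candidate, totals)
            if r_val > best_r:
                best_j, best_r = j, r_val
        partial = [p + r[best_j] for p, r in zip(partial, rows)]
        remaining.remove(best_j)
        ranking.append((best_j, best_r))
    return ranking
```

Reading down the returned list shows how quickly the short form's correlation with the full scale approaches 1, which gives a starting point for combining the ranking with judgment about item content.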
25. the higher chance that one or more constructs will perform poorly if the measures are not well established/developed
26. less information might be obtained about each respondent and their score on a given construct
27. have to sell its meaningfulness to decision makers who will act on the results
28. Bibliography
Binning, J. F., & Barrett, G. V. (1989). Validity of personnel decisions: A conceptual analysis of the inferential and evidential bases. Journal of Applied Psychology, 74, 478-494.
Catrambone, R. (1990). Specific versus general procedures in instructions. Human-Computer Interaction, 5, 49-93.
Dillman, D. A., Smyth, J. D., & Christian, L. M. (2008). Internet, mail, and mixed-mode surveys: The tailored design method. Hoboken, NJ: Wiley.
Donnellan, M. B., Oswald, F. L., Baird, B. M., & Lucas, R. E. (2006). The Mini-IPIP scales: Tiny-yet-effective measures of the Big Five factors of personality. Psychological Assessment, 18, 192-203.
Emons, W. H. M., Sijtsma, K., & Meijer, R. R. (2007). On the consistency of classification using short scales. Psychological Methods, 12, 105-12.
Girard, T. A., & Christiansen, B. K. (2008). Clarifying problems and offering solutions for correlated error when assessing the validity of selected-subtest short forms. Psychological Assessment, 20, 76-8.
Hinkin, T. R. (1995). A review of scale development practices in the study of organizations. Journal of Management, 21, 967-988.
Levy, P. (1968). Short-form tests: A methodological review. Psychological Bulletin, 6, 410-416.
Loo, R. (2002). A caveat on using single-item versus multiple-item scales. Journal of Managerial Psychology, 17, 68-75.
Lord, F. M. (1965). A strong true-score theory, with applications. Psychometrika, 3, 239-27.
Nagy, M. S. (2002). Using a single item approach to measure facet job satisfaction. Journal of Occupational and Organizational Psychology, 75, 77-86.
Novick, D. G., & Ward, K. (2006). Why don't people read the manual? Paper presented at the SIGDOC '06 Proceedings of the 24th Annual ACM International Conference on Design of Communication.
Oshagbemi, T. (1999). Overall job satisfaction: How good are single versus multiple-item measures? Journal of Managerial Psychology, 14, 388-403.
Patrician, P. A. (2004). Single-item graphic representational scales. Nursing Research, 53, 347-352.
Shamir, B., & Kark, R. (2004). A single item graphic scale for the measurement of organizational identification. Journal of Occupational and Organizational Psychology, 77, 115-123.
29. Bibliography (Continued)
Smith, G. T., McCarthy, D. M., & Anderson, K. G. (2000). On the sins of short form development. Psychological Assessment, 12, 102-111.
Spandorfer, J. M., Karras, D. J., Hughes, L. A., & Caputo, C. (1995). Comprehension of discharge instructions by patients in an urban emergency department. Annals of Emergency Medicine, 25, 71-74.
Stanton, J. M., Sinar, E., Balzer, W. K., & Smith, P. C. (2002). Issues and strategies for reducing the length of self-report scale. Personnel Psychology, 55, 167-194.
Wanous, J. P., & Hudy, M. J. (2001). Single-item reliability: A replication and extension. Organizational Research Methods, 4, 361-375.
Widaman, K. F., Little, T. D., Preacher, K. J., & Sawalani, G. M. (2011). On creating and using short forms of scales in secondary research. In K. H. Trzesniewski, M. B. Donnellan, & R. E. Lucas (Eds.), Secondary data analysis: An introduction for psychologists (pp. 39-61). Washington, DC: American Psychological Association.