Bijker, M. (2010) Making Measures And Inferences Reserve

Presentation by Monique Bijker (OU CELSTEC Learning & Cognition)

  2. 2. OVERVIEW <ul><li>Practical and theoretical rationales </li></ul><ul><li>Literature review </li></ul><ul><li>Fundamental measurement to improve theory </li></ul><ul><li>The development of items and scales </li></ul><ul><li>Participants </li></ul><ul><li>Results </li></ul><ul><li>Differences between educational science and psychology students </li></ul>
  3. 3. PRACTICAL BACKGROUND <ul><li>Instruments for self-reported generic competences, to predict learning performance measures and labor market success measures in causal models (SEM) </li></ul>
  4. 4. TOWER OF BABEL <ul><li>Self-regulating learning capabilities? </li></ul><ul><li>An intra-individual system of motivations, expectancies, and learning strategies? </li></ul><ul><li>For our purposes, can we use separate variables from the system? </li></ul><ul><li>Self-directing learning capabilities? </li></ul><ul><li>Self-directing career capabilities? </li></ul>
  5. 5. VAGUENESS <ul><li>The concepts have never been studied, operationalized, or validated simultaneously. </li></ul><ul><li>It is unknown whether they are similar or different. </li></ul>
  6. 6. FINDINGS BASED ON LITERATURE <ul><li>Self-efficacy (SE) and self-regulating learning capabilities (SRLC): bottom-up concepts, emerging from social-cognitive experimental research. Predictors of the (more frequent) use of cognitive strategies and predictors of academic achievement (Pintrich et al., 1991, 1993) </li></ul><ul><li>SE: “People’s judgments of their capabilities to organize and execute courses of action required to attain designated types of performances” (Bandura, 1986, p. 391) </li></ul><ul><li>SRLC: planning, monitoring, evaluation. Effort, perseverance, and persistence (Pintrich et al., 1991, 1993). </li></ul>
  7. 7. FINDINGS BASED ON LITERATURE <ul><li>Self-directed learning and career capabilities (SDLC and SDCC): top-down concepts, emerging from descriptive adult learning theory, multidisciplinary career theory, and informal learning environments. </li></ul><ul><li>Influenced by social, economic, and political perspectives. </li></ul><ul><li>Predictors of employability. </li></ul><ul><li>SDLC: “A characteristic adaptation to influence work-related learning processes in order to cope for oneself on the labour market” (Raemdonck, 2006, p. 13). </li></ul><ul><li>SDCC: “A characteristic adaptation to influence career processes in order to cope for oneself on the labour market” (Raemdonck, 2006, p. 13). </li></ul>
  8. 8. UNADDRESSED QUESTIONS <ul><li>Can operationalizations of self-regulating (SRLC; TSE) and self-directing capabilities (SDLC-SDCC) be combined in one construct? </li></ul><ul><li>Do the concepts predict different outcomes? </li></ul><ul><li>Are there any differences in these concepts between different groups of adult learners in formal education programs? </li></ul>
  9. 9. APPROACH <ul><li>Use of 36 existing items and development of 48 new, theory-based items. </li></ul><ul><li>Collection of real data. </li></ul><ul><li>Use of a measurement theory that defines the measures, and that constructs person capability measures independent of the items and item measures independent of the persons: the Rasch model. </li></ul><ul><li>Selection of items that fit the model and verification of the construct validity and dimensionality. </li></ul><ul><li>Creation of measures in the first sample and anchoring of the measures in the second sample on the first one, to correct for possibly different response patterns on items. </li></ul>
  10. 10. WHY THE RASCH MODEL? <ul><li>Rasch person and item measures are invariant across samples and tests (generalization). </li></ul><ul><li>Rasch transforms qualitatively ordered (Likert-type) raw scores into mathematically ordered person and item interval measures. Each unit of measurement is the same as the next one. </li></ul><ul><li>Rasch recognizes that items contribute differently to the underlying variable (in difficulty, or endorsability). </li></ul><ul><li>Rasch recognizes that scale distances (1-2; 2-3; 3-4; 4-5) in Likert-type items are unequal. Scales of items should fit the Rasch model in order to measure person capabilities invariantly. Hence, Likert raw scores are unsuitable for summing and will bias statistical analyses. </li></ul><ul><li>Generalizability theory and CFA cannot adjust for targeting and the lack of interval properties of scales. </li></ul>
  13. 13. FORMULA <ul><li>The polytomous "Rating Scale" model: </li></ul><ul><li>$\log\left(P_{nij} / P_{ni(j-1)}\right) = B_n - D_i - F_j$ </li></ul><ul><li>where </li></ul><ul><li>$P_{nij}$ is the probability that person $n$ encountering item $i$ is observed in category $j$, </li></ul><ul><li>$B_n$ is the "ability" measure of person $n$, </li></ul><ul><li>$D_i$ is the "difficulty" measure of item $i$, the point where the highest and lowest categories of the item are equally probable. </li></ul><ul><li>$F_j$ is the "calibration" measure of category $j$ relative to category $j-1$, the point where categories $j-1$ and $j$ are equally probable relative to the measure of the item. </li></ul>
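The rating scale formula can be turned into category probabilities by accumulating the adjacent-category log-odds and normalizing. The following is a minimal sketch, not the software used for the presentation: the function name `rating_scale_probabilities` is hypothetical, and the example inputs are taken from the TSE results slide (mean person measure 1.62, item 77 measure 1.86, and the four step calibration measures) purely for illustration.

```python
import math


def rating_scale_probabilities(b_n, d_i, thresholds):
    """Return the probability of each response category under the
    rating scale model log(P_nij / P_ni(j-1)) = B_n - D_i - F_j.

    b_n        -- person measure B_n (logits)
    d_i        -- item measure D_i (logits)
    thresholds -- step calibrations F_1..F_m (logits), one per step
    The result has m + 1 entries, lowest category first.
    """
    # Cumulative log-odds of each category relative to the lowest one.
    cumulative = [0.0]
    for f_j in thresholds:
        cumulative.append(cumulative[-1] + (b_n - d_i - f_j))

    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(value - peak) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]


# Illustrative inputs (assumption: values read off the TSE slide).
steps = [-3.79, -1.95, 1.19, 4.55]
probabilities = rating_scale_probabilities(1.62, 1.86, steps)
for category, p in enumerate(probabilities, start=1):
    print(f"category {category}: {p:.3f}")
```

By construction the probabilities sum to 1, and the log-ratio of any two adjacent categories equals $B_n - D_i - F_j$, which is the defining property of the model.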
  14. 14. DATA COLLECTION <ul><li>Online questionnaires composed of the 84 items (and additional open questions about curricula). </li></ul><ul><li>Participants: 232 adult students of the school of Educational Sciences and 139 students of the school of Psychology of the Open University of the Netherlands in their premaster (BSc) or master trajectory. </li></ul><ul><li>35% male, 65% female. Average age: 42, SD = 10. </li></ul>
  15. 15. RESULTS <ul><li>Four distinct scales with Cronbach's alphas of .90 (SDCC; 20 items), .84 (SDLC; 23 items), .72 (SRLC; 6 items), and .79 (TSE; 9 items). (RQ1) </li></ul><ul><li>26 of the 84 items did not fit the model. In SDLC and SDCC, it was predominantly the new items that fit the Rasch model. </li></ul><ul><li>Items in SDLC in particular are prone to misfit and disordered thresholds. SDLC has very small categories. </li></ul><ul><li>TSE is characterized by contextualized items. Which items are generalizable to other contexts (suitable for anchoring)? </li></ul><ul><li>SRLC is too easy to endorse. </li></ul><ul><li>SDCC is the most stable and best targeted construct. </li></ul><ul><li>Modeling of the constructs in SEM. (RQ2) </li></ul><ul><li>Three significant differences between ES and Psy. (RQ3) </li></ul>
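For readers unfamiliar with the reliability coefficient reported per scale, Cronbach's alpha can be computed from a persons-by-items score matrix as sketched below. This is a generic textbook computation on raw scores, not the procedure used in the study; the function name `cronbach_alpha` and the small response matrix are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-person item-score lists.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])  # number of items
    item_variances = [
        variance([person[i] for person in responses]) for i in range(k)
    ]
    total_variance = variance([sum(person) for person in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: five persons answering three 5-point Likert items.
example = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
print(round(cronbach_alpha(example), 2))
```

Note that alpha is computed on summed raw scores, so it does not address the interval-scale concerns raised for Likert data earlier in the deck.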
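  16. 16. SCALES SUCH AS TSE
<table>
<tr><th>Item</th><th>Infit</th><th>Outfit</th><th>Measure</th><th>Error</th><th>PTMEA</th></tr>
<tr><td>77</td><td>.82</td><td>.84</td><td>1.86</td><td>.13</td><td>.57</td></tr>
<tr><td>80</td><td>1.29</td><td>1.27</td><td>1.03</td><td>.14</td><td>.42</td></tr>
<tr><td>72</td><td>.74</td><td>.74</td><td>.90</td><td>.14</td><td>.70</td></tr>
<tr><td>84</td><td>.98</td><td>.91</td><td>.51A</td><td>.14</td><td>.60</td></tr>
<tr><td>70</td><td>.64</td><td>.64</td><td>.22A</td><td>.15</td><td>.69</td></tr>
<tr><td>71</td><td>.77</td><td>.75</td><td>.09A</td><td>.15</td><td>.72</td></tr>
<tr><td>83</td><td>.99</td><td>.94</td><td>-.41A</td><td>.15</td><td>.70</td></tr>
<tr><td>73</td><td>1.13</td><td>1.07</td><td>-1.10A</td><td>.15</td><td>.54</td></tr>
<tr><td>78</td><td>.88</td><td>.82</td><td>-1.16</td><td>.15</td><td>.74</td></tr>
<tr><td>All items, Mean</td><td>.91</td><td>.89</td><td>.21</td><td>.15</td><td></td></tr>
<tr><td>All items, SD</td><td>.19</td><td>.18</td><td>.94</td><td>.01</td><td></td></tr>
<tr><td>All persons, Mean</td><td>.89</td><td>.89</td><td>1.62</td><td>.62</td><td></td></tr>
<tr><td>All persons, SD</td><td>.64</td><td>.64</td><td>1.35</td><td>.11</td><td></td></tr>
</table>
<ul><li>Miscellaneous: 83 and 84 are similar in ES and Psy. 80 is different in ES and Psy. </li></ul><ul><li>Person Reliability .79; Person Separation 1.91 </li></ul><ul><li>Item Reliability .97; Item Separation 6.24 </li></ul><ul><li>Cronbach alpha .82 </li></ul><ul><li>Average measures for categories 1, 2, 3, 4, 5: -1.96, -.61, .58, 2.37, 4.10 </li></ul><ul><li>Step calibration measures: -3.79, -1.95, 1.19, 4.55 </li></ul>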
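The reliability and separation figures on this slide are two views of the same quantity: in Rasch measurement, separation $G$ and reliability $R$ are conventionally related by $G = \sqrt{R / (1 - R)}$. The sketch below shows that conversion in both directions; the function names are hypothetical, and because the slide reports rounded values, recomputed numbers may differ slightly from the reported ones.

```python
import math


def separation_from_reliability(reliability):
    """Separation index G = sqrt(R / (1 - R)) for 0 <= R < 1."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))


def reliability_from_separation(separation):
    """Reliability R = G^2 / (1 + G^2) for G >= 0."""
    if separation < 0.0:
        raise ValueError("separation must be non-negative")
    return separation ** 2 / (1.0 + separation ** 2)


# Values as reported on the slide (rounded to two decimals there).
print(separation_from_reliability(0.79))  # person reliability -> separation
print(reliability_from_separation(6.24))  # item separation -> reliability
```

A separation near 2 means the scale distinguishes roughly two statistically distinct levels of persons, while the much larger item separation indicates a well-spread item hierarchy.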
  17. 21. Implications for practice <ul><li>For ES: In the premaster stage, focus on tasks that support SRLC and academic achievement (planning, monitoring, and evaluation, but also effort, persistence, and perseverance). </li></ul><ul><li>For ES: Support TSE through mastery experiences, modeling, and persuasion. </li></ul><ul><li>For PSY: Support SDLC by integrating more authentic professional tasks (or practice experiences), not only in research practicals, but also in diagnostic or intervention practice. </li></ul>
  18. 22. Implications for future research <ul><li>How generalizable is self-efficacy as a construct (and consequently, how can you compare groups on this phenomenon)? </li></ul><ul><li>What is the quality of the negatively formulated items? </li></ul><ul><li>Is it justified to assume that student samples, in comparable stages of their learning trajectory, are of an equal endorsability level in self-reporting generic competences in different domains? </li></ul><ul><li>Is it justified to assume that responses on items can be attributed to persons, if context affects response patterns (e.g. SRLC “When I participate in an education program I make sure that I complete that program”)? (This also has consequences for making measures.) </li></ul>
  19. 23. Rude questions… <ul><li>What is the quality of the instruments we use to measure learning and development (how and when are they validated, and with which methods)? </li></ul><ul><li>How reliable, valid, and comparable are our performance measures if we do not use Rasch-validated items or tests? </li></ul><ul><li>How frequently do we calibrate our measures? </li></ul>
  20. 24. THANK YOU FOR YOUR ATTENTION. Any questions? [email_address]