1. Machines with meaning: The potential of
machine learning in educational research
Discussant
Dr Christian Bokhove
Professor in Mathematics Education
• This symposium “explores the multi-layered value of machine
learning in educational research”.
• Structure of my contribution
• Short summary of what I understand to be the original contribution of the
paper.
• Some comments on the paper.
• After doing this for the three papers, I try to synthesise a few discussion
themes.
3. Paper 1 - Babette Bühler - Video-based mind-wandering
detection employing gaze features in temporal models during reading
• The paper “uses deep-learning based facial expression features in
temporal models to predict mind wandering and test generalizability
in cross-task/-setting prediction”.
• OpenFace: Toolbox for automatic facial behavior analysis.
• A temporal classification model is preferred over a non-temporal one.
4. Commentary
• Ground truth: self-caught mind-wandering reports via key press. Validated,
but… a “covert, internal cognitive state”.
• Explainable AI – “explainable AI methods to gain deeper insights”.
• Great to see privacy considerations.
• Explain ‘well’ in “Model generalizes well to other tasks and settings”:
it predicts mind wandering 10% above chance level, close to within-data
prediction (11% above chance level) – whether we think this is enough
depends on ambition level and stakes.
• OpenFace – whether its accuracy is sufficient depends on the stakes, e.g. forensics:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6201796/ -
but… technology constantly improves.
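A quick back-of-the-envelope helps pin down what “10% above chance” can mean. The sketch below uses made-up numbers (not the paper's data) and takes chance level to be the majority-class baseline, which depends on how often mind wandering is reported:

```python
# Hypothetical illustration (invented numbers, not the paper's data): what
# "10% above chance" means depends on the chance level, here taken as the
# majority-class baseline of the mind-wandering labels.

def majority_baseline(labels):
    """Accuracy of always predicting the most frequent class."""
    pos = sum(labels)
    return max(pos, len(labels) - pos) / len(labels)

# Suppose 30% of probes are self-caught mind-wandering reports.
labels = [1] * 30 + [0] * 70
chance = majority_baseline(labels)
print(f"{chance:.2f} {chance + 0.10:.2f}")  # 0.70 0.80
```

So with a 30% base rate, “10% above chance” corresponds to roughly 80% accuracy; with a balanced task it would be 60%. Which of these is “enough” is exactly the ambition-level question.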
5. Paper 2 - Lonneke Boels - Gaze-Based Machine Learning
Analysis of Students’ Learning During Solving Graph Items
• Interpretation of histograms after dotplot items.
• Students’ gaze data, answers, and stimulated recall interview data
were collected.
• What are the main differences in students’ gaze patterns on histogram items
before and after solving dotplot items?
• What are differences in students’ answers on histogram items before and
after solving dotplot items?
• What are the main changes in students' approaches to histograms after
solving dotplot items?
• Machine learning: random forests.
• Data repository: good!
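Since random forests carry the analysis, a toy sketch may help readers unfamiliar with the idea: many simple trees, each fitted on a bootstrap resample, vote on the prediction. This is a minimal stand-in (one-feature decision stumps), not the paper's pipeline, and the gaze-feature numbers are invented:

```python
import random
from collections import Counter

# Toy random forest: each "tree" is a one-feature decision stump fitted on a
# bootstrap sample; the forest predicts by majority vote. Purely illustrative,
# not the authors' pipeline; the feature values below are invented.

def fit_stump(X, y):
    """Pick the (feature, threshold) rule with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            # raw rule: predict 1 if the feature value exceeds the threshold
            raw_errors = sum((row[f] > t) != label for row, label in zip(X, y))
            errors = min(raw_errors, len(y) - raw_errors)  # allow the inverted rule
            if best is None or errors < best[0]:
                sign = 1 if raw_errors <= len(y) / 2 else -1
                best = (errors, f, t, sign)
    _, f, t, sign = best
    return lambda row: int(row[f] > t) if sign == 1 else int(row[f] <= t)

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap resample
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    def predict(row):
        votes = Counter(s(row) for s in stumps)
        return votes.most_common(1)[0][0]
    return predict

# Invented gaze features: [mean fixation duration, saccade amplitude]
X = [[0.2, 1.0], [0.3, 1.2], [0.8, 3.0], [0.9, 2.8]]
y = [0, 0, 1, 1]
model = fit_forest(X, y)
print(model([0.2, 1.0]), model([0.9, 2.8]))  # the clearly separated extremes: 0 1
```

The ensemble-of-resampled-trees construction is also why the precision question matters: each individual vote is interpretable, but the aggregate mechanism is not, which feeds directly into the explainability-and-theory point below.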
6. Commentary
• The cognitive link – scanpath patterns on the graph area are relevant for
students’ thinking processes.
• Measurement: multiple-choice versus estimation. I prefer more open
formats in maths!
• In the paper, it was very good to report an unsuccessful MLA.
• Precision is good, but what does it actually mean? What is the mechanism
behind all this? Explainability and theory.
• NHST – one-sided vs two-sided – could effects go in both directions?
• Ceiling effect? - “For twenty-six students, there was (almost) no room for
improvement because they already gave answers within or close to the
answer range during the before sequence of single-histogram items.”
7. Paper 3 - Nora McIntyre - Interpretable machine learning
insights into inequalities in access to online learning
• Mathematics online environment.
• Three countries: high-income (HIC) and low- and middle-income (LMIC).
• Social justice.
• Machine learning.
• Features & SHAP values.
• 5–13 years?
8. Commentary
• Outcome variable: play_count (lessons completed) – what about the quality? And
if the focus is accessibility, wouldn’t the raw numbers alone say enough?
• Technological environment matters – quality of items.
• I like the theory-led and data-led distinction. But in a sense, phase 1 was still
data-led: after all, the variables had to be available. Theoretically, the
literature suggests many more that might not be available (prior knowledge,
school quality, family SES). The three countries will have different curricula.
Such context matters.
• Related variables, e.g. a higher totalquestions implies more lessons completed.
MathAbility is age in quarters minus item difficulty. Time taken is probably higher if
more questions are done – and lessons completed then higher as well? Risk of confounders.
• SHAP values are very small – when do they have practical significance? 0.07 for
interactions. What is our ambition level?
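To make the practical-significance question concrete, exact Shapley values can be computed by hand for a tiny hypothetical model (none of this is the paper's model; all numbers are invented). A SHAP value is a feature's weighted average marginal contribution across all coalitions of the other features, expressed in outcome units, which is what lets us ask whether 0.07 is large or small:

```python
from itertools import chain, combinations
from math import factorial

# Exact Shapley values for a tiny invented model over three features.
# Illustrative only -- not the paper's model or data.

def powerset(items):
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def shapley(value, features):
    """Exact Shapley value per feature; value(S) is the model's expected
    output when only the features in S are 'known'."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for S in powerset(others):
            w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += w * (value(set(S) | {i}) - value(set(S)))
        phi[i] = total
    return phi

# Invented value function: baseline outcome 1.0 (e.g. lessons completed);
# feature 'a' adds 0.2, 'b' adds 0.1, and an a-b interaction adds 0.04.
def value(S):
    out = 1.0
    if 'a' in S: out += 0.2
    if 'b' in S: out += 0.1
    if 'a' in S and 'b' in S: out += 0.04
    return out

phi = shapley(value, ['a', 'b', 'c'])
print({k: round(v, 3) for k, v in phi.items()})  # {'a': 0.22, 'b': 0.12, 'c': 0.0}
```

The values sum to f(all) − f(none) = 0.34 (the efficiency property), so a SHAP contribution of ~0.07 is only meaningful relative to the scale and spread of the outcome – which is the ambition-level question again.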
9. Discussion
• Interpretation can be a challenge – “In short, interpretable models
usually are not the best performers and the best performing
classifiers are usually not interpretable” (Orrù et al., 2020).
Explainability.
• Analytical variability (Bokhove, 2022). Boels: “Answer correctness is
therefore quite sensitive to researchers’ choices.” Also see the recent
Arizmendi et al. (2022).
• The results are not always ones that couldn’t be found with simpler
analysis methods. Technical skills matter.
• Measurement – test validity – reliability. Stakes. Effect sizes vs
accuracy.
10. Discussion
• Open science – reproducibility of algorithms – seeds. Secondary
data, data available (Boels), proprietary systems…
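The seeds point deserves to be concrete: every stochastic step in these pipelines (train/test splits, bootstraps, forest construction) only reproduces exactly when the random seed is fixed and reported. A minimal sketch, with an invented splitting helper:

```python
import random

# Minimal sketch of the reproducibility point: a stochastic pipeline step
# reproduces exactly only when its random seed is fixed and reported.

def split_indices(n, test_fraction=0.2, seed=None):
    """Shuffle indices 0..n-1 and split them into train/test lists."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = n - int(n * test_fraction)
    return idx[:cut], idx[cut:]

run1 = split_indices(100, seed=42)
run2 = split_indices(100, seed=42)
print(run1 == run2)  # True: fixed seed, identical split, reproducible downstream results

# Without a seed, each call draws a fresh split, so downstream accuracies
# will generally differ between runs -- one source of analytical variability.
```

This is also where proprietary systems bite: if the seed, the splitting code, or the data cannot be shared, exact reproduction of the reported model is impossible even in principle.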
• McIntyre – “there is a fundamental shift in the way modelling is
viewed and used in machine learning as compared with traditional,
inferential statistics.” – Agree, so huge potential, but we have to
remain critical. It should not be: apply a bunch of algorithms and then
choose the highest-scoring one (model hacking). Crud (Orben &
Lakens, 2021). Measurement Schmeasurement (Flake & Fried,
2020).
• “computer programs that constantly absorb new data and adapt
their decisions in response—don’t always make ethical or accurate
choices.” (Babic & Cohen, 2021)
• Machine learning for data analysis or machine learning to improve
platforms?
• Three risks of machine learning (Babic & Cohen, 2021):
• Agency risk: risks stemming from things that aren’t under the control of a specific business or user.
• Concept drift: the relationship between the inputs the system uses and its outputs isn’t stable over time or may be misspecified.
• Covariate shift: data fed into an algorithm during its use differs from the data that trained it.