Christopher Brooks SOED 2016

brooksch@umich.edu @cab938
Shape of Educational Data:
Predictive Modeling as an Enabler of
Personalized Learning
Christopher Brooks
Research Assistant Professor, School of Information
Director of Learning Analytics and Research
Digital Education and Innovation
University of Michigan

Psychohistory
“…[it] combined history, psychology and
mathematical statistics to create a (nearly)
exact science of the behavior of very large
populations of people…Asimov used the analogy
of a gas: in a gas, the motion of a single
molecule is very difficult to predict, but the mass
action of the gas can be predicted to a high level
of accuracy. Asimov applied this concept to the
population of the fictional Galactic Empire, which
numbered in the quadrillions.”
http://asimov.wikia.com/wiki/Psychohistory

• "Averaginarianism"
• Regression towards a mean that
doesn't actually naturally exist
• There is a gulf between the
predictive modeling perspectives,
and the explanatory modeling
ones

Research Perspective
• Learners are individuals
• There is nuance in data that is
important and being missed by
studying populations vs.
individuals
• Computational modelling (esp.
predictive modelling) has
opportunity to help

Traditional Higher
Education
Low Stakes
Lifelong Learning

Lecture Capture
• How do students integrate educational technologies into their study habits?
– (and do those technologies have any effect?)
• A need for insight
– Studies largely show only student satisfaction benefits from lecture capture
– Several studies show no effect to the use of lecture capture on performance
• Data mining for usage patterns
– Apply unsupervised machine learning methods (k-means clustering) to viewership data by
week
– Then built general model from prototypes and apply to new datasets and determine fit
(replication)

Results (Chemistry 2xx 2010)
• 5 groups found, each pedagogically labelled (by investigators!)
• Error and size of groups ranges considerably
• The final exam period is not indicative of activity throughout semester

The general model
• How well does the model generalize?

(Grades Pairwise Tukey HSD, * p<0.1 ** p<0.05)
(Grades)

Results
• Not a predictive model,
but a more discriminate
descriptive model
– Showed an effect not for general use of lecture
capture, but for specific ways of using lecture capture
• Replication suggests there is merit to the model, but
that it is highly contextualized (theme of course)
• Data from more sources could add further detail to the
model as to causal effects
Brooks, C. A., Erickson, G., Greer, J. E.,
Gutwin, C. (2014) Modelling and Quantifying
the Behaviours of Students in Lecture Capture
Environments. In Computers & Education. Vol
75 June. pages 282-292.

Bonus Calculus Slide

Massive Open Online Courses
• As of the end of 2014, MOOCs at Michigan have attracted
1.9 million enrollees and nearly 1 million participants
• Of these participants, ~ 300K attempt some assessment
task, ~80K end up passing the course (certificate)
• Can we do better in understanding student success in this
environment?
• Could we predict who is at-risk for students who want to
obtain a certificate?

• MOOCs lack the diversity of data we have about residential students
– Previous achievement (SAT/ACT, last years course)
– Socioeconomic status (distance from university, first in family,
wealth)
– Gender
– Ethnicity
– Motivation
• Building predictive models of student achievement in learning
analytics is largely done on these entry-level features
• Both frustrating and refreshing
– Want accurate models, but want actionable data

• Built a novel feature selection algorithm inspired by
work in the text-mining community
• It looks at the pattern of engagement that a student
has with course resources
• Build of historical data (last years course) to create
day-by-day multilevel models (C4.5)
• Initial work is based on student certificate
achievement (pass/fail)
–(not the only valuable outcome variable to try and
predict!)

Resour
ce
Day of Course
1 2 3 4 5 6 7 8 9
Video
Daily Accesses
Day 1: Yes
Day 2: No
Day 3: Yes
Day 4: No
Day 5: No
Day 6: No
Day 7: No
Day 8: Yes
Day 9: No
3-Day counts
Day 1-3: Yes
Day 4-6: No
Day 7-9: Yes
Weekly counts
Week 1: Yes
Week 2: Yes
Monthly counts
Month 1: Yes
For a 104 day long course,
with three resources
(videos, forums, quizzes)
this gives us 408 features
for the modelling activity.

Text Mining Inspiration
• Text mining often uses n-grams as features in a document
– A bigram (cat, good) is the number of pairs of these two words in a document, a
trigam (cat, was not good), etc.
– We build engagement n-grams up to 5 gram
Daily Accesses
Day 1: Yes
Day 2: No
Day 3: Yes
Day 4: No
Day 5: No
Day 6: No
Day 7: No
Day 8: Yes
Day 9: No
Possible bigrams
[yes, yes]: 0
[no, no]: 3
[yes, no]: 3
[no, yes]: 2
Possible trigrams:
[yes, yes, yes]: 0
[yes, yes, no]: 0
[yes, no, yes]: 1
…
For a 104 day long course,
with three resources
(videos, forums, quizzes)
this gives us 717 more
features for the modelling
activity.

In a nutshell
• We do not have diverse set of data, but
we do have a detailed set of data
• And there is a lot of it (200 million+
clickstream events)
• By pulling out patterns of resource
access, we can use supervised machine
learning (C4.5) techniques to build
predictive models
• But what if we did have entry data
from students?
– Gender & Ethnicity, certification
status, country of origin, etc.

0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75
Fliess'κ
Day of Course Offering
Fliess' κ versus Time in Days
Activity Features Only
Demographics Features Only
Activity and Demographics Features

Results
• It is possible to create predictive models on clickstream data for MOOCs
• 3 weeks into the MOOC seems to be an interesting point for some courses
• It is computationally intensive to create these models (daily!)
• MOOC entry/demographics information does not seem to add value
C. Brooks, C. Thompson, S. Teasley. (2015) A Time Series Interaction Analysis Method for Building Predictive
Models of Learners using Log Data. 5th International Conference on Learning Analytics and Knowledge 2015
(LAK'15)
C. Brooks, C. Thompson, S. Teasley. (2015) Who You Are or What You Do: Comparing the Predictive Power of
Demographics vs. Activity Patterns in Massive Open Online Courses (MOOCs). The second annual conference on
Learning At Scale 2015 (L@S2015), Works in Progress track. Vancouver BC, March 14-15, 2015. Vancouver, BC.

No Particular Night or Morning
“I looked at the page with my name under the
title…it was some other man…the story was
familiar – I knew I had written it – but that name
on the paper still was not me. It was a symbol, a
name.”
“I’ve always figured it that you die each day, and
each day is a box…but you never go back and lift
the lids...each is a different you, somebody you
do not know or understand or want to
understand.
”

Questions? Comments?
Christopher Brooks
Research Assistant Professor, School of Information
Director of Learning Analytics and Research
in Digital Education and Innovation
University of Michigan
brooksch@umich.edu
@cab938

Christopher Brooks SOED 2016

Recomendados

Recomendados

Más contenido relacionado

Destacado

Destacado (9)

Similar a Christopher Brooks SOED 2016

Similar a Christopher Brooks SOED 2016 (20)

Más de Colleen Ganley

Más de Colleen Ganley (6)

Último

Último (20)

Christopher Brooks SOED 2016

Notas del editor