Improving retention: predicting at-risk students by analysing clicking behaviour in a virtual learning environment.
Presentation from 'InFocus: Learner analytics and big data', a CDE technology symposium held at Senate House on 10 December 2013. Conducted by Annika Wolff, Knowledge Media Institute, Open University.
Audio of the session and more details can be found at www.cde.london.ac.uk.
1. Improving retention: predicting at-risk students by analysing clicking behaviour in a virtual learning environment
Annika Wolff and Zdenek Zdrahal
10th December 2013
2. Student retention
• Struggling students don’t always ask for help: they drop out of the module, or fail and then don’t progress further.
• When timely help is offered, it can make the difference between success and failure.
• It can be hard to know who is in trouble and where to direct resources.
3. Open University context
Distance learning:
• Content delivered through the VLE
• Contact between students and tutors mediated through the VLE: how to tell if students are struggling?
Solution: develop predictive models from student data
4. Data sources and data sets
• VLE: learning content, forums, quizzes…
• Assessment: ongoing assessments, final exam
• Demographic: age, gender, previous study…
6. VLE activity (prior to TMA1)
• No VLE activity: 317 students
• 1-20 clicks: 609 students
• 21-80 clicks: 943 students
• 81-150 clicks: 621 students
• 151-300 clicks: 803 students
• 301-600 clicks: 516 students
• > 600 clicks: 355 students
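The banding above can be reproduced with a simple binning function. This is a minimal sketch, assuming each student's total click count before TMA1 is available as a non-negative integer. The function name `click_band` and the sample counts are hypothetical; only the band edges come from the slide.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper edges of the click bands listed on the slide.
BAND_EDGES = [0, 20, 80, 150, 300, 600]
BAND_LABELS = [
    "No VLE activity", "1-20 clicks", "21-80 clicks", "81-150 clicks",
    "151-300 clicks", "301-600 clicks", "> 600 clicks",
]

def click_band(clicks: int) -> str:
    """Return the band label for a non-negative click count."""
    if clicks < 0:
        raise ValueError("click count cannot be negative")
    # bisect_left returns the index of the first edge >= clicks,
    # which is exactly the index of the band the count falls into.
    return BAND_LABELS[bisect_left(BAND_EDGES, clicks)]

# Hypothetical click counts for a handful of students (not real OU data).
sample_counts = [0, 5, 20, 21, 95, 300, 601, 1200]
histogram = Counter(click_band(c) for c in sample_counts)
```

Counting the labels with `Counter` gives a histogram of the same shape as the one on the slide.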
7. Problem specification
• Given:
– Demographic data at the start (may include information about the student’s previously studied OU modules and his/her objectives)
– Assessments (TMAs) as they become available during the module
– VLE activities between TMAs
– Conditions the student must satisfy to pass the module
• Goal:
– Identify students at risk of failing the module as early as possible, so that OU intervention is meaningful.
8-12. Comments on problem specification
• OU intervention is meaningful if the cost of the intervention is lower than the expected gain from retaining the student.
• Modelling the problem:
[Timeline diagram: History we know | We are here | Future we can estimate … and we can influence!]
• How can we estimate the future? … Based on the student’s history and on properties of upcoming parts of the module known from previous presentations.
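The cost-versus-gain criterion can be written as a one-line check. This is only a sketch under stated assumptions: the slides do not say how the expected gain is computed, so the decomposition below (failure probability × assumed success rate of the intervention × assumed value of a retained student) and all parameter names are hypothetical.

```python
def intervention_is_meaningful(p_fail: float, uplift: float,
                               value_of_retention: float, cost: float) -> bool:
    """Return True when the intervention costs less than its expected gain.

    p_fail: estimated probability that the student fails without help.
    uplift: ASSUMED fraction of at-risk students an intervention rescues.
    value_of_retention: ASSUMED value of retaining one student.
    cost: cost of one intervention, in the same units as value_of_retention.
    """
    if not (0.0 <= p_fail <= 1.0 and 0.0 <= uplift <= 1.0):
        raise ValueError("p_fail and uplift must be probabilities")
    # Hypothetical model of the expected gain; not taken from the slides.
    expected_gain = p_fail * uplift * value_of_retention
    return cost < expected_gain
```

With these assumed inputs, a high-risk student clears the threshold while a low-risk student does not, which is why early and accurate risk estimates matter for directing resources.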
13. Prediction at TMA1
– Why? TMA1 is a good predictor of success or failure
– There is still enough time to intervene
[Timeline diagram: History we know | We are here | Future we can affect]
14. Building a classifier
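[Diagram: training instances labelled Pass or Fail are used to build the classifier, which then labels new instances as PASS or FAIL]
Decision Tree – first results (no demographics)
[Diagram: decision tree with root node "Assessment 1 score?" and branches ">40%" and "<40%", ending in Pass and Fail leaves]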
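The simplest form of such a tree is a single split on the first assessment score. The following is a minimal sketch of learning that split from labelled training instances, not the authors' actual model: the training pairs are invented for illustration, and the function names are hypothetical.

```python
from typing import List, Tuple

# (Assessment 1 score, final outcome) pairs; invented for illustration only.
TRAINING: List[Tuple[float, str]] = [
    (15, "Fail"), (25, "Pass"), (32, "Fail"), (38, "Fail"), (45, "Pass"),
    (52, "Pass"), (60, "Pass"), (71, "Pass"), (88, "Pass"),
]

def predict(score: float, threshold: float) -> str:
    """One-level decision tree: scores at or above the threshold pass."""
    return "Pass" if score >= threshold else "Fail"

def errors_for_threshold(data, threshold: float) -> int:
    """Count training instances the split at `threshold` misclassifies."""
    return sum(predict(score, threshold) != label for score, label in data)

def fit_stump(data) -> float:
    """Pick the observed score that gives the fewest training errors."""
    candidates = sorted({score for score, _ in data})
    return min(candidates, key=lambda t: errors_for_threshold(data, t))

threshold = fit_stump(TRAINING)
new_instances = [30, 80]
labels = [predict(s, threshold) for s in new_instances]
```

A full decision tree repeats this search recursively on each branch and over several attributes; the single split is enough to show how training instances turn into a rule for new instances.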
17. Naïve Bayes network
[Diagram: nodes Sex, N/C, Education and VLE linked to TMA1]
Goal:
Calculate probability of failing at TMA1
• either by not submitting TMA1,
• or by submitting with score < 40.
• Education:
– No formal qualif.
– Lower than A level
– A level
– HE qualif.
– Postgraduate qualif.
• VLE:
– No engagement
– 1-20 clicks
– 21-100 clicks
– 101-800 clicks
• N/C:
– New student
– Continuing student
• Sex:
– Female
– Male
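A naïve Bayes estimate of the TMA1 failure probability from these four attributes can be sketched in plain Python. This is an illustrative implementation with add-one smoothing, not the authors' code: the training records are invented, and the names `fit` and `posterior` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records: (sex, new/continuing, education, VLE band, TMA1 outcome).
RECORDS = [
    ("F", "New", "A level", "101-800", "pass"),
    ("M", "New", "No formal qualif.", "0", "fail"),
    ("F", "Continuing", "HE qualif.", "21-100", "pass"),
    ("M", "Continuing", "A level", "1-20", "fail"),
    ("F", "New", "Lower than A level", "0", "fail"),
    ("M", "Continuing", "HE qualif.", "101-800", "pass"),
    ("F", "Continuing", "A level", "21-100", "pass"),
    ("M", "New", "A level", "101-800", "pass"),
]

def fit(records):
    """Count class frequencies and per-class value frequencies per attribute."""
    class_counts = Counter(r[-1] for r in records)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for *features, label in records:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen

def posterior(features, model):
    """Return P(class | features), treating attributes as independent given the class."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(features):
            # Add-one (Laplace) smoothing avoids zero probabilities.
            k = len(values_seen[i] | {value})
            score *= (value_counts[(i, label)][value] + 1) / (n + k)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

model = fit(RECORDS)
p_no_clicks = posterior(("M", "New", "A level", "0"), model)
p_many_clicks = posterior(("M", "New", "A level", "101-800"), model)
```

Holding the demographic attributes fixed and varying only the VLE band shows how click activity shifts the estimated failure probability for the same student profile.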
18. Predicting final result from TMA1
[Diagram: TMA1, TMA2, …, TMA7 lead to the final result; TMA1 >= 40 points to Pass/Distinction, TMA1 < 40 points to Fail]
Prior probabilities: P(Success) = 0.807, P(Fail) = 0.193
Posterior probabilities:
P(Success|TMA1) = 0.858, P(Fail|TMA1) = 0.142
P(Success|~TMA1) = 0.093, P(Fail|~TMA1) = 0.907
(Here TMA1 denotes passing TMA1 and ~TMA1 denotes failing it.)
Bayes minimum error classifier:
If a student fails TMA1, he/she is likely to fail the module overall.
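The minimum-error rule and the figures on this slide can be checked with a few lines of arithmetic. This sketch assumes that "TMA1" and "~TMA1" in the conditional probabilities mean passing and failing TMA1 respectively; the share of students passing TMA1 is not stated on the slide and is derived here from the rounded figures, so it is approximate.

```python
# Figures quoted on the slide.
P_SUCCESS = 0.807             # prior P(Success)
P_SUCCESS_GIVEN_PASS = 0.858  # P(Success | TMA1)
P_SUCCESS_GIVEN_FAIL = 0.093  # P(Success | ~TMA1)

def classify(p_success: float) -> str:
    """Bayes minimum-error rule: choose the class with the larger posterior."""
    return "Success" if p_success >= 0.5 else "Fail"

# Law of total probability:
#   P(Success) = P(Success|TMA1) * p + P(Success|~TMA1) * (1 - p)
# solved for p = P(TMA1), the share of students passing TMA1 implied by the slide.
p_tma1 = (P_SUCCESS - P_SUCCESS_GIVEN_FAIL) / (
    P_SUCCESS_GIVEN_PASS - P_SUCCESS_GIVEN_FAIL
)

# Expected error of the rule: in each branch the rule is wrong with the
# smaller of the two posteriors, weighted by how many students are in that branch.
error_rate = (
    p_tma1 * min(P_SUCCESS_GIVEN_PASS, 1 - P_SUCCESS_GIVEN_PASS)
    + (1 - p_tma1) * min(P_SUCCESS_GIVEN_FAIL, 1 - P_SUCCESS_GIVEN_FAIL)
)

# Baseline: always predicting the more likely prior class.
baseline_error = min(P_SUCCESS, 1 - P_SUCCESS)
```

Under these figures roughly 93% of students pass TMA1, and conditioning on the TMA1 result lowers the expected error from about 19% (always predicting success) to about 14%.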
20. Predicting final result from TMA1
[Diagram: the slide 18 network with Sex, N/C, Education and VLE added as inputs to TMA1; prior and posterior probabilities as on slide 18]
21. Demo Case 1
• Demographic data
– Student fits certain demographic profile of gender, educational background etc.
Without VLE (inputs: Sex, N/C, Education):
Probability of failing at TMA1 = 18.5%
With VLE (inputs: Sex, N/C, Education, VLE):
Clicks      Probability   Nr of students
0           64%           4
1-20        44%           3
21-100      26%           5
101-800     6.3%          14
22. Demo Case 2
• Demographic data
– Different demographic profile to previous slide
Without VLE (inputs: Sex, N/C, Education):
Probability of failing at TMA1 = 7.7%
With VLE (inputs: Sex, N/C, Education, VLE):
Clicks      Probability   Nr of students
0           39%           35
1-20        22%           74
21-100      11.2%         178
101-800     2.4%          461
23. TMA1? … it might be too late!
[Timeline diagram: History | We are here | Future we can affect]
Can we predict TMA1 from VLE activities 1 week before the TMA1 deadline?
How about 2, 3, … weeks?
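Predicting earlier means building the VLE feature only from clicks logged before a cutoff that moves back from the deadline. This is a minimal sketch of that windowing step, assuming a per-student log of (date, click count) pairs; the log, the deadline and the function name are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple

def clicks_before_cutoff(events: Iterable[Tuple[date, int]],
                         deadline: date, weeks_before: int) -> int:
    """Sum clicks logged strictly before (deadline - weeks_before weeks)."""
    cutoff = deadline - timedelta(weeks=weeks_before)
    return sum(n for day, n in events if day < cutoff)

# Hypothetical daily click log for one student and a hypothetical TMA1 deadline.
deadline = date(2013, 11, 1)
log = [
    (date(2013, 10, 5), 12),
    (date(2013, 10, 14), 30),
    (date(2013, 10, 22), 7),
    (date(2013, 10, 30), 41),
]

# One feature value per prediction horizon: 1, 2 and 3 weeks before the deadline.
features = {w: clicks_before_cutoff(log, deadline, w) for w in (1, 2, 3)}
```

Each horizon yields a smaller click total, so a model trained for an earlier horizon has less evidence to work with; that trade-off between earliness and accuracy is the question the slide raises.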
24. Dashboard and Chart
[Dashboard screenshot with annotations:]
• has not engaged with VLE
• at least one TMA below 40
• predicted to fail
• has not submitted TMA5
• average score < 40
However: average score = 81.71 !!! (has not engaged with VLE)
26. Conclusions
• In a distance learning context, VLE data are a valuable source of information for prediction
• Prediction improves as a module progresses, but by then it is too late!
• We need to optimise methods for early prediction