In Focus Presentation: Improving retention: predicting at-risk students by analysing clicking behaviour in a virtual learning environment

Improving retention: predicting at-risk students by analysing clicking behaviour in a virtual learning environment
Annika Wolff and Zdenek Zdrahal
10th December 2013

Student retention
• Struggling students don’t always ask for help: they drop out of the module, or fail and then don’t progress further
• When timely help is offered, it can make the difference between success and failure
• It can be hard to know who is in trouble and where to direct resources

Open University context

Distance learning:
• Content through VLE
• Contact mediated through VLE – how to tell if students are struggling?
Solution: develop predictive models from student data

[Figure: students and tutors]

Data sources and data sets
• VLE: learning content, forums, quizzes…
• Assessment: ongoing assessments, final exam
• Demographic: age, gender, previous study…

Typical VLE clicks
[Chart: VLE clicks for students and tutors; vertical axis 0 to 3000, horizontal axis 1 to 47]

VLE activity (prior to TMA1)
• No VLE activity: 317 students
• 1-20 clicks: 609 students
• 21-80 clicks: 943 students
• 81-150 clicks: 621 students
• 151-300 clicks: 803 students
• 301-600 clicks: 516 students
• > 600 clicks: 355 students
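
The banding above can be reproduced with a short script. The following is a minimal sketch, not the authors' code: the band boundaries and student counts are copied from the slide, while the helper names `click_band` and `band_shares` are hypothetical and introduced only for illustration.

```python
from bisect import bisect_left

# Inclusive upper bounds of the click bands listed on the slide.
UPPER_BOUNDS = [0, 20, 80, 150, 300, 600]
LABELS = ["No VLE activity", "1-20 clicks", "21-80 clicks", "81-150 clicks",
          "151-300 clicks", "301-600 clicks", "> 600 clicks"]

# Student counts per band, copied from the slide.
STUDENTS_PER_BAND = dict(zip(LABELS, [317, 609, 943, 621, 803, 516, 355]))


def click_band(clicks: int) -> str:
    """Return the slide's band label for a click count (hypothetical helper)."""
    if clicks < 0:
        raise ValueError("click count cannot be negative")
    # bisect_left finds the first band whose upper bound is >= clicks;
    # counts above 600 fall past the end of the list into the last label.
    return LABELS[bisect_left(UPPER_BOUNDS, clicks)]


def band_shares(counts):
    """Convert per-band student counts into fractions of the cohort."""
    total = sum(counts.values())
    return {band: n / total for band, n in counts.items()}


if __name__ == "__main__":
    for band, share in band_shares(STUDENTS_PER_BAND).items():
        print(f"{band:16s} {share:.1%}")
```

Using inclusive upper bounds keeps the boundaries identical to the slide (20 clicks falls in "1-20 clicks", 21 in "21-80 clicks").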

Problem specification

• Given:
– Demographic data at the start (may include information about the student’s previous modules studied at the OU and his/her objectives)
– Assessments (TMAs) as they become available during the module
– VLE activities between TMAs
– Conditions the student must satisfy to pass the module

• Goal:
– Identify students at risk of failing the module as early as possible so that OU intervention is meaningful.

Comments on problem specification
• OU intervention is meaningful if the cost of the intervention is
lower than the expected gain from retaining the student.
• Modelling the problem:

[Timeline figure, built up in steps: history we know | we are here | future we can estimate … and we can influence!]

How can we estimate the future? Based on the student’s history and on properties of upcoming parts of the module known from previous presentations.
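
The cost criterion in the first comment can be written as a simple decision rule. This is a minimal sketch under stated assumptions: the slides do not say how the expected gain is computed, so the decomposition into a failure probability, an intervention success rate and a value of retention is hypothetical, as are all parameter names and the example numbers.

```python
def intervention_is_meaningful(cost: float,
                               p_fail: float,
                               p_retained_if_helped: float,
                               value_of_retention: float) -> bool:
    """Return True when the intervention costs less than its expected gain.

    Assumed (hypothetical) model of the expected gain:
        p_fail                 probability the student fails without help
        p_retained_if_helped   probability that help turns a failure into a pass
        value_of_retention     value to the institution of retaining the student
    """
    expected_gain = p_fail * p_retained_if_helped * value_of_retention
    return cost < expected_gain


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from the presentation.
    print(intervention_is_meaningful(50, 0.5, 0.2, 1000))
    print(intervention_is_meaningful(200, 0.5, 0.2, 1000))
```

Under this assumed model, a better estimate of the failure probability directly changes which students are worth contacting, which is why early and accurate prediction matters.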

Prediction at TMA1
– Why? TMA1 is a good predictor of success or failure
– There is still enough time to intervene

[Timeline figure: history we know | we are here | future we can affect]

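Building a classifier
[Figure: training instances labelled Pass or Fail are used to build a classifier, which labels new instances PASS or FAIL]

Decision Tree – first results (no demographics)
[Figure: decision tree with root node "Assessment 1 score?", branches >40% and <40%, and Pass and Fail leaves]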

Naïve Bayes network
[Figure: network with nodes Sex, N/C, Education, VLE and TMA1]

Goal:
Calculate probability of failing at TMA1
• either by not submitting TMA1,
• or by submitting with score < 40.

• Education:
– No formal qualif.
– Lower than A level
– A level
– HE qualif.
– Postgraduate qualif.

• VLE:
– No engagement
– 1-20 clicks
– 21-100 clicks
– 101-800 clicks

• N/C:
– New student
– Continuing student

• Sex:
– Female
– Male
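
To show how such a network turns evidence into a failure probability, here is a minimal naïve Bayes sketch in plain Python. It is not the authors' model: only two of the four attributes (VLE and N/C) are included for brevity, and every prior and conditional probability below is a made-up placeholder, since the slides do not give the trained values.

```python
from math import prod

# Hypothetical prior over the TMA1 outcome (not from the slides).
PRIOR = {"fail": 0.2, "pass": 0.8}

# Hypothetical conditional probability tables P(attribute value | outcome).
LIKELIHOODS = {
    "fail": {
        "VLE": {"No engagement": 0.40, "1-20 clicks": 0.30,
                "21-100 clicks": 0.20, "101-800 clicks": 0.10},
        "N/C": {"New student": 0.60, "Continuing student": 0.40},
    },
    "pass": {
        "VLE": {"No engagement": 0.05, "1-20 clicks": 0.15,
                "21-100 clicks": 0.30, "101-800 clicks": 0.50},
        "N/C": {"New student": 0.50, "Continuing student": 0.50},
    },
}


def naive_bayes_posterior(prior, likelihoods, evidence):
    """Return P(outcome | evidence), assuming attributes are independent given the outcome."""
    scores = {
        outcome: prior[outcome]
        * prod(likelihoods[outcome][attr][value] for attr, value in evidence.items())
        for outcome in prior
    }
    total = sum(scores.values())
    return {outcome: score / total for outcome, score in scores.items()}


if __name__ == "__main__":
    evidence = {"VLE": "No engagement", "N/C": "New student"}
    print(naive_bayes_posterior(PRIOR, LIKELIHOODS, evidence))
```

Adding the Sex and Education attributes would only mean adding two more tables per outcome; the posterior computation stays the same.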

Predicting final result from TMA1
[Figure: TMA1, TMA2, TMA7 and Final result; the branch TMA1 >=40 leads to Pass/Distinction and the branch TMA1 <40 leads to Fail]

Prior probabilities: P(Success) = 0.807, P(Fail) = 0.193
Posterior probabilities: P(Success|TMA1) = 0.858, P(Fail|TMA1) = 0.142
P(Success|~TMA1) = 0.093, P(Fail|~TMA1) = 0.907
(Here TMA1 appears to denote a TMA1 score of at least 40, and ~TMA1 a score below 40.)

Bayes minimum error classifier
If a student fails TMA1, he/she is likely to fail the module overall.
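
The probabilities on this slide can be checked and applied with a few lines of Python. This is a sketch under one assumption: that the three quoted figures are mutually consistent, so the law of total probability can be solved for the share of students who pass TMA1, which comes out at about 0.93. The function names are hypothetical.

```python
# Values quoted on the slide.
P_SUCCESS = 0.807
P_SUCCESS_GIVEN_TMA1_PASS = 0.858
P_SUCCESS_GIVEN_TMA1_FAIL = 0.093


def implied_p_tma1_pass(p_success, p_success_pass, p_success_fail):
    """Solve P(S) = P(S|T) * p + P(S|~T) * (1 - p) for p = P(TMA1 passed)."""
    return (p_success - p_success_fail) / (p_success_pass - p_success_fail)


def min_error_prediction(p_success_given_evidence):
    """Bayes minimum error rule: predict the outcome with the larger posterior."""
    return "Success" if p_success_given_evidence >= 0.5 else "Fail"


if __name__ == "__main__":
    p = implied_p_tma1_pass(P_SUCCESS, P_SUCCESS_GIVEN_TMA1_PASS,
                            P_SUCCESS_GIVEN_TMA1_FAIL)
    print(f"Implied P(TMA1 passed) = {p:.3f}")
    print("TMA1 passed:", min_error_prediction(P_SUCCESS_GIVEN_TMA1_PASS))
    print("TMA1 failed:", min_error_prediction(P_SUCCESS_GIVEN_TMA1_FAIL))
```

With these posteriors the minimum error rule predicts success for every student who passes TMA1 and failure for every student who does not, which is the statement made on the slide.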

P(Fail|TMA1-score), P(Pass/Dist|TMA1-score)
[Chart: probability (0 to 1) of Fail and of Pass/Dist for the TMA1 score bands 0-39, 40-59, 60-69, 70-79 and 80-100]

Predicting final result from TMA1
[Figure: the same diagram of TMA1, TMA2, TMA7 and Final result, extended with the nodes Sex, N/C, Education and VLE]


Demo Case 1
• Demographic data
– Student fits certain demographic profile of gender, educational background etc.

Without VLE:
[Figure: network with nodes Sex, N/C, Education and TMA1]
Probability of failing at TMA1 = 18.5%

With VLE:
[Figure: network with nodes Sex, N/C, Education, VLE and TMA1]

Clicks     Probability of failing at TMA1   Nr of students
0          64%                              4
1-20       44%                              3
21-100     26%                              5
101-800    6.3%                             14

Demo Case 2
• Demographic data
– Different demographic profile to previous slide

Without VLE:
[Figure: network with nodes Sex, N/C, Education and TMA1]
Probability of failing at TMA1 = 7.7%

With VLE:
[Figure: network with nodes Sex, N/C, Education, VLE and TMA1]

Clicks     Probability of failing at TMA1   Nr of students
0          39%                              35
1-20       22%                              74
21-100     11.2%                            178
101-800    2.4%                             461

TMA1? … it might be too late!

[Timeline figure: history | we are here | future we can affect]

Can we predict TMA1 from VLE activities 1 week
before the TMA1 deadline?
How about 2, 3, … weeks?
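
One way to approach this question is to rebuild the VLE feature using only the clicks recorded before a cut-off that lies a given number of weeks ahead of the TMA1 deadline. The following is a minimal sketch, not the authors' method: the per-day click dictionary, the day numbering and the function name are all hypothetical.

```python
from typing import Dict


def clicks_before_cutoff(daily_clicks: Dict[int, int],
                         deadline_day: int,
                         weeks_before: int) -> int:
    """Total clicks made strictly before the cut-off day.

    daily_clicks maps a day number within the module to the clicks on that day;
    the cut-off lies `weeks_before` whole weeks ahead of the TMA1 deadline.
    """
    cutoff = deadline_day - 7 * weeks_before
    return sum(n for day, n in daily_clicks.items() if day < cutoff)


if __name__ == "__main__":
    # Made-up activity for one student; TMA1 is assumed due on day 28.
    activity = {1: 5, 10: 3, 20: 2, 27: 4}
    for weeks in (1, 2, 3):
        print(weeks, "week(s) before:", clicks_before_cutoff(activity, 28, weeks))
```

The resulting totals could then be banded and fed to the same kind of classifier, trained once per cut-off, to see how far ahead of the deadline the prediction remains useful.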

Dashboard and Chart
[Screenshot: dashboard and chart, with the following annotations]
• has not engaged with VLE
• at least one TMA below 40
• predicted to fail
• Has not submitted TMA5
• average score < 40
• However: average score = 81.71 !!!
• has not engaged with VLE

Conclusions
• In a distance learning context, VLE data are a valuable source of information for prediction
• Prediction improves as a module progresses, but by then it is too late to intervene!
• We need to optimise methods for early prediction
