1. CurriM: Curriculum Mining
Mykola Pechenizkiy
TU Eindhoven
Learning Analytics Innovation
10 October 2012
SURFfoundation, Utrecht, the Netherlands
2. Initial Motivation for CurriM
• Current practice:
– We think we know what our curriculum is and
how the students study. But is this true?
• CurriM aims at providing tools to analyze
– how the students actually study
• Who would benefit from our tool?
– Directors of education, study advisers, students
• Goal: showcase the potential and feasibility
– Data mining and process mining techniques
– 10 years of TUE administrative data; exam grades
Learning Analytics @Surf CurriM: Curriculum Mining 1
10 October 2012, Utrecht, Mykola Pechenizkiy, Eindhoven University of Technology
3. Questions for CurriM to Answer
• What is the real academic curriculum (study
program)?
• How do students really study?
• Is there a typical (or the best) way to study?
• Do current prerequisites make sense?
• Is the particular curriculum constraint obeyed?
• How likely is it that a student will finish the
studies successfully or will drop out?
• What is my expected time to finish?
• Should I now take courses A & B & C or C & D?
4. Refocused to Target Students as Users
(based on the received feedback)
Awareness tool supporting interactive querying:
• How does a course relate to the program?
– Prerequisites, follow up dependencies
• How am I doing wrt the averages, top 10%?
– Aggregates/OLAP
• What is my expected time to finish?
– Predictive modeling
• Should I now take courses A & B & C or C & D?
– Collaborative filtering style recommendations
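A minimal sketch of what a collaborative-filtering-style course recommendation could look like. It is an illustration only, not the CurriM implementation: the Jaccard similarity, the function names and the course ids are all assumptions.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two sets of passed courses."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(me, others, k=2, top_n=2):
    """Suggest courses that the k most similar students passed and I have not."""
    neighbours = sorted(others, key=lambda s: jaccard(me, s), reverse=True)[:k]
    votes = Counter(course for s in neighbours for course in s - me)
    # Order by number of votes, then by course id to keep the result deterministic.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [course for course, _ in ranked][:top_n]


# Hypothetical course ids and study histories, for illustration only.
history = [
    {"A", "B", "C", "D"},
    {"A", "B", "D"},
    {"E", "F"},
]
suggestions = recommend({"A", "B"}, history)
```

Here a student who passed A and B is matched with the two most similar histories, and the courses those students passed are ranked by how often they occur.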
5. CurriM UI Demo
6. Where is EDM/LA?
(hidden from the users behind GUI)
Curriculum model:
• Codified constraints with Colored Petri net and LTL
– Prerequisites, follow up dependencies, 3 out of 5
selection, number of attempts, mandatory courses etc.
– Input: domain knowledge and output of pattern mining
• Awareness and automated conformance checking
– Is the currently chosen path compliant with the official
guidelines, and does it follow data-driven recommendations?
– Computed aggregates and mined patterns from the data
• Data driven recommendations and predictions
– What is my expected time to finish?
– Should I now take courses A & B & C or C & D?
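The kinds of constraints listed above can be illustrated with plain Python predicates over a student's exam history. This is only a sketch: the slides codify constraints with Colored Petri nets and LTL, not with code like this, and the pass mark, course names and function names below are assumptions.

```python
PASS_GRADE = 6  # assumption: grades on a 1-10 scale, 6 or higher is a pass


def passed_before(trace, prerequisite, course):
    """True if `prerequisite` was passed before the first attempt at `course`.

    Vacuously true when `course` was never attempted.
    """
    for name, grade in trace:
        if name == course:
            return False
        if name == prerequisite and grade >= PASS_GRADE:
            return True
    return True


def max_attempts_ok(trace, limit):
    """True if no course was attempted more than `limit` times."""
    counts = {}
    for name, _ in trace:
        counts[name] = counts.get(name, 0) + 1
    return all(n <= limit for n in counts.values())


def mandatory_passed(trace, mandatory):
    """True if every mandatory course has been passed."""
    passed = {name for name, grade in trace if grade >= PASS_GRADE}
    return mandatory <= passed


# Hypothetical time-ordered exam history: (course, grade).
trace = [("Calculus", 5), ("Logic", 7), ("Calculus", 6), ("Algorithms", 8)]
report = {
    "Logic before Algorithms": passed_before(trace, "Logic", "Algorithms"),
    "at most 1 attempt per course": max_attempts_ok(trace, 1),
    "mandatory courses passed": mandatory_passed(trace, {"Logic", "Calculus"}),
}
```

Each predicate answers one conformance question for one student, so a violated constraint can be reported back by name.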
7. Main Results
• Software prototype – CurriM as a ProM plugin
– Focus on GUI + architecture/interfaces
– Demonstrates the concept
• Experiments with TUE dataset
– Prerequisites, bottleneck/predictive courses
– Recommendations
– Data quality is the key
• Clear motivation and need for a continuation
– The concept is found to be promising
– Potential and feasibility are shown
– Roadmap
8. Why Do Students Like the Concept?
CurriM is a tool that
• Provides orientation:
– Curriculum as a guide and motivation
– See the connections and dependencies
• Provides awareness and recommendations
– Global: how good is their personal education
route, where they currently are, where they are
heading, how well they do in comparison with others
– Local: what would it mean to take course X
• Enables better planning and regular monitoring
– Focus on what looks important, not just interesting
9. Main Lessons Learnt
Data quality is the key
• Administrative DBs and existing data collection
organization do not keep EDM/LA in mind
• Lots of preprocessing and reorganization is required
Meta-data is the other key (lacking codifiability)
• Everything that is scattered in study guides and minds of
study advisors should become easy to codify
Curriculum changes more often than we tend to think
• Semesters-trimesters-quartiles, courses & course ids
Being too “flexible” (written vs. unwritten rules)
• Effectively means no formal curriculum
10. Conclusions
• CurriM can become a big success
– The students seem to like the idea
– It is promising and it is feasible; but it is a long way
from the current concept to a fully functional and
usable tool
• Surf funding opportunity in LA was nice
– Triggered us to take concrete practical steps, a tool
rather than techniques development;
– But a more serious commitment is needed to
make a real breakthrough and bring CurriM into
the educational practice
11. Continuation Roadmap
Conditional on funding opportunities
• Working out the full cycle of the information
flows including pattern mining, predictions and
recommendations, and its
integration/parallelization with the administrative
processes
• Working out different views and functionality for
students vs. educators, HCI/usability aspects
• Improve data collection and data quality
• Facilitate knowledge base construction (meta-data,
mappings)
• Facilitate curriculum formalization for faculties
(tooling)
12. Project Team
Project leader:
• dr. Mykola Pechenizkiy – educational data mining expert
Driving force:
• Pedro Toledo – software developer, applied researcher
Technology experts:
• Prof. dr. Paul De Bra – Human-computer interaction and databases
expert
• dr. Toon Calders – pattern mining expert, assistant professor
• dr. Nikola Trcka – collaborator on curriculum mining, postdoc
• dr. Boudewijn van Dongen – process mining expert, assistant
professor
• dr. Eric Verbeek – ProM software expert, scientific programmer
Domain experts
• Several domain experts, i.e. responsible educators, are available for
CurriM on request: dr. Karen Ali (STU), Prof. dr. Mark de Berg (CSE)
13. Additional slides
• Including some from the original proposal
Learning Analytics @Surf CurriM: Curriculum Mining Project Proposal 12
29 February 2012, Utrecht, Mykola Pechenizkiy, Eindhoven University of Technology
14. Execution plan
Task 1. Developing the first software prototype for
academic curriculum modeling. As mini R&D cycles:
• identifying types of curriculum specific patterns we
need to mine from the event logs (in collaboration with
the domain experts) and to include in the curriculum
modeling and developing corresponding pattern
mining and pattern assembling techniques;
• implementing the techniques and integrating them with
ProM, which provides an important process mining
foundation framework and many of the building blocks
for curriculum modeling software;
• testing a particular piece of software.
15. Execution plan
Task 2. Case study: modeling the curriculum of the
Department of Computer Science, TUE; Goals:
• Validating the correctness and usefulness (to the
end users, i.e. teachers, study advisers, students)
of the developed curriculum mining techniques
and their implementations.
• Developing guidelines for managing the
curriculum related data to avoid the problems we
will encounter or envision during the case study.
• Task 1 and Task 2 will run simultaneously
ensuring timely feedback.
16. Execution plan
Task 3. Creating a roadmap for further study and
development of the curriculum modeling toolset
• Develop R&D agenda for the coming years.
• This includes identification of not only research
challenges i.e. answering the question
– “what kind of new data mining and process mining
techniques are needed to address the peculiarities of
the curriculum mining domain?”
• but also the strategy of the smooth technology
transfer to the prospective end users, i.e.
– early adopters (e.g. TUE or 3TU departments) that
would help to validate the usability and usefulness of
the curriculum mining software “in the wild”.
17. Project Team
18. Learning Analytics Seminar, Educational Data Mining & Learning Analytics for All: Potential, Dangers, Challenges 17
August 30-31, Utrecht, NL Mykola Pechenizkiy, Eindhoven University of Technology
19. Educational Process Mining Toolbox
20. Intuition suggests that curriculum is
• Structured and easy to understand as we think
there are not that many options to choose from
– It may look just like this one:
• but the data may suggest that it looks different…
21. … data may suggest that students show somewhat more diverse behaviour:
22. Two Different Tasks
Isolate a set of standard curriculum patterns and based on these patterns
• mine the curriculum as an executable quantified formal model and
analyze it, or
• first (manually) devise a formal model of the assumed curriculum and test
it against the data.
[Figure labels: Event log (MXML format supported by ProM); Typical forms of requirements in the curriculum; Colored Petri net]
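As a rough illustration of the event-log side of both tasks, the sketch below turns flat exam records into one time-ordered trace per student and counts which course directly follows which, a common starting point for process discovery. It does not produce MXML and is not the ProM pipeline; the record layout, course names and function names are assumptions.

```python
from collections import Counter, defaultdict


def build_traces(records):
    """Group (student, date, course) rows into one time-ordered trace per student."""
    by_student = defaultdict(list)
    for student, date, course in records:
        by_student[student].append((date, course))
    # ISO dates sort chronologically as strings.
    return {s: [course for _, course in sorted(events)]
            for s, events in by_student.items()}


def directly_follows(traces):
    """Count how often course b comes right after course a in any trace."""
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts


# Hypothetical administrative records, deliberately out of order.
records = [
    ("s1", "2010-01-20", "Logic"),
    ("s1", "2009-11-05", "Calculus"),
    ("s2", "2009-11-05", "Calculus"),
    ("s2", "2010-01-20", "Logic"),
    ("s2", "2010-04-10", "Algorithms"),
]
traces = build_traces(records)
follows = directly_follows(traces)
```

The resulting counts can be compared with the assumed curriculum: an ordering that the official model forbids but that occurs often in the data is worth a closer look.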
23. Application Scenarios
Scenario 1: Find most common types of behavior (and cluster them)
Scenario 2: Find emerging patterns, i.e. patterns which capture significant
– differences in behavior of students who graduated vs. those students who did not
– changes in behaviour of students from year 2006-07 to 2007-08.
– in both cases we search for patterns whose support increases significantly
from one dataset to another (i.e. in space in the first case and in
time in the second case)
Scenario 3: After finding a bottleneck, find
frequent patterns that describe it, i.e. for which
students it is the bottleneck and why
Example event data:
Student  Timestamp  Events
A        S1         2, 3, 5
A        S2         6, 1
A        S3         1
B        S1         4, 5, 6
B        S3         2
B        S4         7, 8, 1, 2
B        S5         1, 6
C        S1         1, 8, 7
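Scenario 2 can be sketched as a comparison of pattern support between two datasets. The code below is a simplified illustration that enumerates item sets of a fixed size by brute force; the thresholds, data and function names are assumptions, and a real miner would avoid exhaustive enumeration.

```python
from itertools import combinations


def support(pattern, dataset):
    """Fraction of records in `dataset` that contain every item of `pattern`."""
    return sum(pattern <= record for record in dataset) / len(dataset)


def emerging_patterns(base, target, size=2, min_growth=2.0, min_support=0.5):
    """Item sets whose support in `target` is at least `min_growth` times
    their support in `base`, and at least `min_support` in `target`."""
    items = sorted(set().union(*target))
    found = []
    for combo in combinations(items, size):
        pattern = frozenset(combo)
        s_target = support(pattern, target)
        if s_target < min_support:
            continue
        s_base = support(pattern, base)
        growth = float("inf") if s_base == 0 else s_target / s_base
        if growth >= min_growth:
            found.append((combo, s_base, s_target))
    return found


# Hypothetical event sets per student, split by study outcome.
dropped_out = [{1, 2}, {1, 3}, {2, 3}, {1, 2, 3}]
graduated = [{1, 2, 4}, {1, 4}, {2, 4}, {1, 2, 4}]
patterns = emerging_patterns(base=dropped_out, target=graduated)
```

Swapping the two outcome groups for two academic years gives the "in time" variant of the same comparison.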
24. Example 2-out-of-3 Pattern Check
• At least 2 courses from {2Y420, 2F725, 2IH20} must
be taken before graduation:
• A higher-level abstraction can be developed in the
longer run to avoid [...] we aim at developing a [...]
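The formal encoding shown on the slide is not part of this transcript. As a plain illustration of the same 2-out-of-3 rule, a check over the set of courses a student has taken could look as follows; the function name and the extra course id are assumptions.

```python
ELECTIVE_GROUP = {"2Y420", "2F725", "2IH20"}  # course codes from the slide


def k_out_of_n(taken_courses, group, k):
    """True if at least k courses from `group` appear in `taken_courses`."""
    return len(set(taken_courses) & group) >= k


# "2XX00" is a made-up course id standing in for an unrelated course.
ready = k_out_of_n(["2Y420", "2IH20", "2XX00"], ELECTIVE_GROUP, k=2)
not_ready = k_out_of_n(["2F725", "2XX00"], ELECTIVE_GROUP, k=2)
```

The same helper covers other selection rules mentioned earlier, such as 3 out of 5, by changing the group and k.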
25. Process Discovery Example
26. Which Courses Are Difficult/Easy for Which Students?
27. References
• Trčka, N., Pechenizkiy, M. & van der Aalst, W. (2010) Process Mining from
Educational Data (Chapter 9), In Handbook of Educational Data Mining, pp.
123-142. London: CRC Press.
• Pechenizkiy, M., Trčka, N., Vasilyeva, E., van der Aalst, W. & De Bra, P.
(2009) Process Mining Online Assessment Data, In Proceedings of 2nd
International Conference on Educational Data Mining (EDM'09), pp. 279-288.
• Trčka, N. & Pechenizkiy, M. (2009) From Local Patterns to Global Models:
Towards Domain Driven Educational Process Mining, In Proceedings of Ninth
International Conference on Intelligent Systems Design and Applications
(ISDA'09), pp. 1114-1119.
• Bose, R.P.J.C., van der Aalst, W.M.P., Zliobaite, I. & Pechenizkiy, M.
(2011) Handling Concept Drift in Process Mining, In Proceedings of 23rd
International Conference on Advanced Information Systems Engineering
CAiSE'2011, Lecture Notes in Computer Science 6741, Springer, pp. 391-405.
• Dekker, G., Pechenizkiy, M. & Vleeshouwers, J. (2009) Predicting Students
Drop Out: a Case Study, In Proceedings of the 2nd International Conference
on Educational Data Mining (EDM'09), pp. 41-50.
• http://www.processmining.org/
28. Short CV of the Project Leader
Mykola Pechenizkiy
Assistant Professor at Dept. of Computer Science, TU/e
Research interests: data mining and knowledge discovery;
Particularly predictive analytics for information systems
serving industry, commerce, medicine and education.
http://www.win.tue.nl/~mpechen/ - projects, pubs, talks etc.
Major recent EDM-related activities:
29. Confirmed interest in CurriM at TUE
• Dr. Karen S. Ali - Director of Education and
Student Service Center, STU
• Prof. Dr. Mark de Berg - Director of the
graduate program, Dept. of Computer Science
• Dr. Marloes van Lierop - Director of the
bachelor program, Dept. of Computer Science
• Study advisers at different faculties
Editor's notes
the curriculum based on sound formalisms and
Focus on education management people, like directors of education, study advisors and the like
Presenting poster at EDM’12, preparing for LAK’13 and journal submission
Motivation, e.g. to do math because it is needed for many other courses. Just in line with the motivation we have had
Being “flexible” (written vs. unwritten rules) on too many things results in a mess, not a flexible curriculum
Usefulness and potential utility will be evaluated by the educators. The correctness of the tool will be assessed by data/process mining experts.
The ultimate goal of this task is to validate the usefulness of the developed curriculum mining techniques and their implementations in the software. In data mining and process mining it is often not enough to build correct or sound algorithms and to implement them in a software toolkit. We want the resulting models, which are constructed with these techniques, to provide certain utility, i.e. be useful for the end users. Given the timeline of this project it is important that we have a few such cycles during the project execution to receive timely feedback from the analysis of the resulting models. Experimenting with the real historical data will show us which issues have been omitted in the initial R&D sprints. Working with real data also gives an understanding of how good or bad it is wrt organization, noise, redundancy, consistency, completeness etc. Through hands-on experience with the data that has already been collected in the past we can develop guidelines for managing the curriculum related data to avoid the problems we will encounter or envision during the case study.
The color indicates how much time the students on average spend in a certain node. This awareness helps to understand bottlenecks in the curriculum and to facilitate data-driven decision making, both for students (I really need to take path B, i.e. Logic first, or Logic with grade >8, or whatever semantics we put) and for study advisors or directors of education (we need to reconsider a prerequisite). This figure is about online assessment, but it can be used to explain the principle.