NCAIR 2016 Conference Presentation:
As the spotlight of increased transparency and accountability continues to shine on higher education, more granular data on student retention and graduation has become a critical component of decision-making for both faculty and staff. An extensive program-level retention and graduation report is needed to inform faculty and staff about the outcomes of their efforts and how to improve going forward. And while this kind of data is valuable for reflection and summative assessment, there is an increasing need for data to become predictive, so that preventative steps can be taken in a more formative style of assessment. This session will explore program-level retention and graduation reporting and what the future holds for more predictive insights through data mining and machine learning.
4. The Problem
• We are asked for program-level data
• Lack of program-level retention and graduation rates
• Complicated (particularly for undergrads)
• Different programs serve different purposes
• High stakes – program prioritization (AA driven)
• Reports lumped all non-retained together (whether they graduated or stopped out)
7. What We Wanted
• Solid & simple approach (easy to explain and defend)
• Fair
• Useful for all types of programs
• Meaningful for decision-making (high- and low-level)
• Not overly complicated display
• Illuminates
• Overall performance
• Historic trends
• When are students lost
• Something that can be generated yearly w/o too much effort
8. 5 Outcomes
• Five possible outcomes for each student that declares a given major
• Retained in program
• Graduated in program
• Retained in different program
• Graduated in different program
• Not retained (stop-out/drop-out)
• Exclusive and exhaustive
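Because the five flags are exclusive and exhaustive, the assignment can be sketched as a simple branch. A minimal Python sketch follows; the field names (`graduated`, `enrolled`, `major`, `degree_program`) are illustrative assumptions, not the presenters' actual data model:

```python
def outcome_flag(student, program):
    """Assign exactly one of the five outcome flags for a student who
    declared `program`, as of a given semester snapshot.
    `student` is a dict with illustrative keys:
      'graduated' (bool), 'degree_program' (program of degree earned, or None),
      'enrolled' (bool), 'major' (current major, or None)."""
    if student["graduated"]:
        if student["degree_program"] == program:
            return "Graduated in program"
        return "Graduated in different program"
    if student["enrolled"]:
        if student["major"] == program:
            return "Retained in program"
        return "Retained in different program"
    return "Not retained (stop-out/drop-out)"

# Example: still enrolled, but switched to a different major
print(outcome_flag(
    {"graduated": False, "enrolled": True,
     "major": "HIST", "degree_program": None},
    "CHEM"))  # → Retained in different program
```

Since every student falls into exactly one branch, the percentages at each interval sum to 100% of the cohort.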
9. General Approach
• Based on cohorts:
• A student is placed in a program cohort the first time they declare a given program
• Each student in the cohort is flagged with one of the 5 possible outcomes at each half-year interval (each regular semester)
• At each interval we report where the members of the cohort fall
• Each student will only appear in one cohort for a program (usually)
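The interval report then just tallies the flags for the cohort at each semester snapshot. A minimal pure-Python sketch, with invented data for a 4-student cohort (the flag strings match the five outcomes above):

```python
from collections import Counter

# flags[interval] -> one outcome flag per cohort member at that snapshot.
# Illustrative data only.
flags = {
    "@1 year": ["Retained in program", "Retained in program",
                "Retained in different program", "Not retained"],
    "@4 years": ["Graduated in program", "Retained in program",
                 "Graduated in different program", "Not retained"],
}

for interval, outcomes in flags.items():
    counts = Counter(outcomes)
    n = len(outcomes)
    print(interval)
    for flag, count in counts.items():
        print(f"  {flag}: {count}/{n} = {count / n:.0%}")
```

Because each cohort is fixed at declaration, the denominator never changes across intervals, which keeps the year-over-year percentages comparable.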
13. Outcomes @1 year and @4 years

Year      | Class Level    | New Cohort | @1y Program Success | @1y Non-program Success | @1y Not Retained | @4y Program Success | @4y Non-program Success | @4y Not Retained
2006-2007 | Total          | 40 | 33%  | 25% | 43% | 18%  | 44%  | 38%
2006-2007 | Lower Division | 36 | 28%  | 25% | 47% | 19%  | 47%  | 36%
2006-2007 | Upper Division | 4  | 75%  | 25% | 0%  | 0%   | 0%   | 50%
2007-2008 | Total          | 27 | 56%  | 7%  | 37% | 37%  | 33%  | 78%
2007-2008 | Lower Division | 26 | 54%  | 8%  | 38% | 27%  | 31%  | 81%
2007-2008 | Upper Division | 1  | 100% | 0%  | 0%  | 300% | 100% | 0%
2008-2009 | Total          | 42 | 45%  | 24% | 31% | 19%  | 12%  | 33%
2008-2009 | Lower Division | 38 | 42%  | 24% | 34% | 21%  | 13%  | 34%
2008-2009 | Upper Division | 3  | 67%  | 33% | 0%  | 0%   | 0%   | 33%
2009-2010 | Total          | 38 | 50%  | 11% | 39% | 37%  | 29%  | 45%
2009-2010 | Lower Division | 32 | 47%  | 9%  | 44% | 34%  | 34%  | 50%
2009-2010 | Upper Division | 6  | 67%  | 17% | 17% | 33%  | 0%   | 17%
2010-2011 | Total          | 62 | 48%  | 19% | 32% | 13%  | 8%   | 40%
2010-2011 | Lower Division | 55 | 42%  | 22% | 36% | 13%  | 5%   | 40%
2010-2011 | Upper Division | 7  | 100% | 0%  | 0%  | 14%  | 29%  | 43%
2011-2012 | Total          | 51 | 51%  | 24% | 25% |      |      |
2011-2012 | Lower Division | 42 | 50%  | 24% | 26% |      |      |
2011-2012 | Upper Division | 9  | 56%  | 22% | 22% |      |      |
2012-2013 | Total          | 56 | 52%  | 25% | 23% |      |      |
2012-2013 | Lower Division | 44 | 52%  | 27% | 20% |      |      |
2012-2013 | Upper Division | 12 | 50%  | 17% | 33% |      |      |
2013-2014 | Total          | 11 | 64%  | 27% | 9%  |      |      |
2013-2014 | Lower Division | 6  | 50%  | 50% | 0%  |      |      |
2013-2014 | Upper Division | 5  | 80%  | 0%  | 20% |      |      |
Summary:
• All students lumped into 3 groups
• This is the most summarized data we can (read: are willing to) provide
• The bottom line = the bold 3-group number
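Collapsing the five flags into the three bold summary groups can be sketched as a simple lookup. The sketch below is illustrative Python, with flag strings matching the five outcomes defined earlier:

```python
# Map each of the five exclusive flags to one of the three summary groups.
THREE_GROUP = {
    "Retained in program": "Program Success",
    "Graduated in program": "Program Success",
    "Retained in different program": "Non-program Success",
    "Graduated in different program": "Non-program Success",
    "Not retained (stop-out/drop-out)": "Not Retained",
}

def summarize(outcomes):
    """Percent of the cohort in each of the three summary groups."""
    n = len(outcomes)
    groups = {"Program Success": 0, "Non-program Success": 0, "Not Retained": 0}
    for flag in outcomes:
        groups[THREE_GROUP[flag]] += 1
    return {g: round(100 * c / n) for g, c in groups.items()}

print(summarize(["Retained in program", "Graduated in program",
                 "Graduated in different program",
                 "Not retained (stop-out/drop-out)"]))
# → {'Program Success': 50, 'Non-program Success': 25, 'Not Retained': 25}
```

Keeping the five-to-three mapping explicit makes the bottom-line number easy to explain and defend, which was one of the stated goals.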
14. But how does that compare?
This is average program data to use as a comparison
Year      | Class Level    | New Cohort | @1y Program Success | @1y Non-program Success | @1y Not Retained | @4y Program Success | @4y Non-program Success | @4y Not Retained
2006-2007 | Total          | 1  | 64% | 16% | 21% | 44% | 24% | 32%
2006-2007 | Lower Division | 40 | 62% | 19% | 19% | 34% | 30% | 36%
2006-2007 | Upper Division | 37 | 68% | 9%  | 23% | 61% | 12% | 27%
2007-2008 | Total          | 2  | 59% | 16% | 25% | 38% | 27% | 36%
2007-2008 | Lower Division | 40 | 55% | 19% | 26% | 26% | 33% | 41%
2007-2008 | Upper Division | 36 | 67% | 11% | 22% | 57% | 16% | 27%
2008-2009 | Total          | 0  | 56% | 19% | 25% | 30% | 32% | 38%
2008-2009 | Lower Division | 27 | 54% | 22% | 24% | 23% | 37% | 40%
2008-2009 | Upper Division | 26 | 60% | 14% | 26% | 43% | 23% | 35%
2009-2010 | Total          | 1  | 39% | 38% | 23% | 31% | 28% | 41%
2009-2010 | Lower Division | 42 | 33% | 41% | 26% | 23% | 25% | 52%
2009-2010 | Upper Division | 38 | 50% | 33% | 17% | 45% | 33% | 22%
2010-2011 | Total          | 1  | 67% | 13% | 20% | 57% | 12% | 31%
2010-2011 | Lower Division | 38 | 59% | 17% | 24% | 43% | 15% | 42%
2010-2011 | Upper Division | 32 | 75% | 9%  | 16% | 73% | 9%  | 18%
2011-2012 | Total          | 2  | 66% | 8%  | 26% |     |     |
2011-2012 | Lower Division | 64 | 63% | 10% | 27% |     |     |
2011-2012 | Upper Division | 55 | 71% | 5%  | 24% |     |     |
2012-2013 | Total          | 3  | 66% | 9%  | 25% |     |     |
2012-2013 | Lower Division | 51 | 62% | 11% | 28% |     |     |
2012-2013 | Upper Division | 42 | 74% | 6%  | 20% |     |     |
2013-2014 | Total          | 3  | 65% | 14% | 21% |     |     |
2013-2014 | Lower Division | 58 | 60% | 17% | 23% |     |     |
2013-2014 | Upper Division | 44 | 75% | 8%  | 16% |     |     |
15. Horizontal stacked bar
[Chart: horizontal stacked bars, one per cohort year 2006-2007 through 2013-2014, 0–100% axis; series: Program Success, WCU Success, Not Retained]
• Normally we would never do this, but…
• What are people asking?
- How is the program performing?
- How are students performing overall at the institution?
- How many are dropping out?
16. We graph 5 flags too
[Chart: three stacked bar panels @1 year, @4 years, and @6 years, cohort years 2006-2007 through 2013-2014, 0–100% axis]
• Transitions to completers and drop-outs
• Compare performance over time
17. Main question
[Chart: stacked bars by cohort year, 2006-2007 through 2013-2014, 0–100% axis]
• But what about next year?
19. "If we study learning as a data science, we can reverse engineer the human brain and tailor learning techniques to maximize the chances of student success. This is the biggest revolution that could happen in education, turning it into a data-driven science, and not such a medieval set of rumors professors tend to carry on."
- Sebastian Thrun
Typical reporting from the IR office is University-level only.
Only one segment of our population: historically we reported on the freshman cohort.
No info on grad students.
No info on transfers.
No info on those who start part-time.
Want to mention:
How do I know I am going to retain these students, and what can I do about it? What does @2 years look like for 2010-2011?
Sebastian Thrun
Stanford University
Co-founder of Udacity
Data has been used in a single manner: primarily looking backwards and reviewing what has happened.
We want timely decisions, impacting the future based on what we know from the past.
The rising tide of data on campus is an opportunity.
*Supervised machine learning*: the program is "trained" on a pre-defined set of "training examples," which then facilitate its ability to reach an accurate conclusion when given new data.
*Unsupervised machine learning*: the program is given a set of data and must find patterns and relationships therein.
Decision trees split larger groups of objects into multiple smaller subgroups, based on rules that use the independent variables (factors) that best explain the dependent variable.
Decision tree induction is the supervised learning of a decision tree structure that predicts or classifies future observations based on a set of decision rules.
- The nodes are like factor analysis, and you can trim the model
- Completely transparent prediction
- Can handle a mix of variable types (numeric & string)
- Intuitive to understand
- Classification & regression trees
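As a hedged sketch of the core of tree induction, the snippet below finds the single split on one numeric factor that minimizes weighted Gini impurity. In practice one would use a tree library (e.g., scikit-learn's classification and regression trees) rather than hand-rolling this; the GPA values and labels are invented for illustration:

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 = pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels, feature_index):
    """Return (threshold, score): the cut on one numeric feature that
    minimizes the weighted Gini impurity of the two child nodes."""
    n = len(rows)
    best = (None, float("inf"))
    for threshold in sorted({r[feature_index] for r in rows}):
        left = [lab for r, lab in zip(rows, labels) if r[feature_index] <= threshold]
        right = [lab for r, lab in zip(rows, labels) if r[feature_index] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Invented data: (first-semester GPA,) -> retained next fall?
rows = [(1.8,), (2.0,), (2.4,), (3.1,), (3.5,), (3.8,)]
labels = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(rows, labels, 0))  # → (2.4, 0.0): GPA <= 2.4 splits cleanly
```

A full tree simply applies this search recursively within each child node, over all candidate factors, which is what makes the resulting rules completely transparent.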