UNSUPERVISED MACHINE
LEARNING
VITOMIR KOVANOVIĆ
UNIVERSITY OF SOUTH AUSTRALIA
#vkovanovic
vitomir.kovanovic.info
Vitomir.Kovanovic@unisa.edu.au
LEARNING ANALYTICS SUMMER INSTITUTE
TEACHERS COLLEGE, COLUMBIA UNIVERSITY
JUNE 11-13, 2018
SREĆKO JOKSIMOVIĆ
UNIVERSITY OF SOUTH AUSTRALIA
#s_joksimovic
www.sjoksimovic.info
Srecko.Joksimovic@unisa.edu.au
1
About me
• Learning analytics researcher
• Research Fellow, School of Education, UniSA
Data Scientist, Teaching Innovation Unit, UniSA
Member of the Centre for Change and Complexity in Learning (C3L)
• Member of the SoLAR executive board
• Computer science and information systems background
• Used cluster analysis in several research projects
2
About you
• Introduce yourself
• Name, affiliation, position
• Experience with machine learning and clustering
• Experience with Weka or some other ML/DM toolkit
• Ideas for clustering in your own research/work
3
Download data from Dropbox or USB
http://bit.ly/lak18ul
4
Workshop outline
1. Three days, four sessions
2. Equally theoretical and practical
3. Use of Weka Machine Learning toolkit
4. Focus on practical use
5. Examples of clustering use in learning analytics
5
Workshop topics
• Introduction to machine learning & unsupervised methods
• Introduction to cluster analysis
• Overview of cluster analysis use in Learning Analytics
• Introduction to WEKA toolkit
• Overview of the tutorial dataset
• K-means algorithm
• K-means demo
• Hierarchical clustering algorithms
• Hierarchical clustering demo
6
Tutorial topics
• How to choose the number of clusters
• How to interpret clustering results
• Practical challenges
• More advanced cluster analysis approaches
• Statistical methods for comparing clusters
• Clustering real-world data from OU UK
• Discussing different cluster analysis methods
7
Ask questions anytime
8
What is machine learning
9
What is machine learning?
Computing method for making sense of the data
10
Data is everywhere
Each minute:
● 3,600,000 Google searches
● 456,000 Twitter posts
● 46,740 Instagram photos
● 45,787 Uber trips
● 600 new Wikipedia edits
● 13 new Spotify songs
Domo (2017). “Data Never Sleeps 5.0”
https://www.domo.com
11
What products should go on sale?
Grouping of related products
12
What movies to recommend?
Grouping users based on their viewing preferences
13
How to navigate streets?
Processing multiple streams of information in real time
14
Online store product recommendation
15
Fields that influenced machine learning
• Statistics
• Operations research
• Artificial intelligence
• Data visualisation
• Software engineering
• Information systems management
16
How does machine learning work?
WILL IT TAKE MY JOB?
17
Two key ideas in machine learning
1. Features
2. Models
18
What is a feature?
1. A feature is a characteristic of a data point
2. Each data point is represented as a vector of features [f1, f2, f3 ... fm]
3. A whole dataset of N data points is represented as an N x M matrix
Data point Feature 1 Feature 2 .... Feature M
Data point 1
Data point 2
....
Data point N
19
What is a feature?
• Performance of machine learning algorithms in large part depends on the
quality of extracted features (how useful they are for a given ML task)
• Expertise and prior knowledge come into play when deciding which features
to extract
20
What is a model?
• Something that captures important patterns in the data
• A model can be used to
• Draw inferences
• Understand the data
• Learn hidden rules
• Support decision making
21
An example model: BMI calculator
• Goal: Predict a person’s body fat category (overweight, normal, or underweight)
from their height (in m) and weight (in kg).
• Model:
• BMI = weight / height^2
• If BMI > 25: overweight
• If BMI < 18.5: underweight
• Otherwise: normal
• An example: 1.75 m and 70 kg:
BMI = 70 / (1.75 × 1.75) ≈ 22.86 -> Normal category
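The BMI rule above is small enough to write out as code. The following is a minimal sketch in Python; the function name `bmi_category` is illustrative and not part of the original slides, while the thresholds are the ones listed on the slide.

```python
def bmi_category(height_m: float, weight_kg: float) -> str:
    """Apply the slide's hand-written model: BMI thresholds at 18.5 and 25."""
    bmi = weight_kg / height_m ** 2
    if bmi > 25:
        return "overweight"
    if bmi < 18.5:
        return "underweight"
    return "normal"


# The worked example from the slide: 1.75 m and 70 kg.
print(round(70 / 1.75 ** 2, 2), bmi_category(1.75, 70))
```

Here the "features" are height and weight and the "model" is the fixed rule; in machine learning the rule would be learned from data instead of written by hand.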
22
How does machine learning work?
• Model development (slow and hard): N data points -> feature extraction -> N x M feature matrix -> model building -> ML model
• Model use (fast and easy): a new data point -> feature extraction -> feature vector of length M -> ML model -> response (prediction)
23
Two types of errors
• Bias: The error from erroneous
assumptions of the model.
• High bias: miss the relevant
relationships between variables
(underfitting).
• Variance: The error from sensitivity to
small fluctuations in the data.
• High variance: modelling the
random noise in the data, rather
than real relationships (overfitting).
24
Two types of errors
• We always work with samples
• Samples always have noise
• The trick is to develop models that fit not just
the training data, but also new, future data
25
Two types of errors
26
High bias High variance
The trick is to find optimal model complexity
27
Key machine learning approaches
1. Supervised machine learning
1. Predicting categorical value: Classification
2. Predicting continuous value: Regression
2. Unsupervised machine learning
1. Grouping data points (rows): Cluster analysis
2. Grouping features (columns): Principal Component Analysis (PCA), Factor
Analysis (FA), Latent semantic analysis (LSA), Singular Value Decomposition (SVD)
28
Many more approaches
• Models that blur the division between supervised and unsupervised
• Reinforcement learning: learning the class label after making a prediction
• Neural networks (can be supervised and unsupervised)
• Online learning models: learning as data arrives
• Feature processing methods: association rule mining
29
Supervised learning example
NOT A GRAD SCHOOL
30
10 data points
Data point 1
Data point 2
Data point 3
Data point 4
Data point 5
Data point 6
Data point 7
Data point 8
Data point 9
Data point 10
31
32
First step: feature extraction
• From each data point we
extracted four features:
• Number of wheels
• Colour
• Top speed (in km/h)
• Weight (in kg)
• Our feature matrix is 10 x 4
ID Wheels Colour Top speed (km/h) Weight (kg)
1 4 Yellow 220 1,200
2 4 Red 180 950
3 2 Blue 260 230
4 2 Red 210 320
5 4 Yellow 160 870
6 4 Blue 170 750
7 4 Red 190 850
8 2 Yellow 140 140
9 2 Yellow 210 310
10 2 Red 240 280
33
Supervised learning: classification
• Each data point is provided with
a categorical label (class,
outcome variable)
• The goal is to predict the class
label for a new data point
ID Wheels Colour Top speed (km/h) Weight (kg) Label
1 4 Yellow 220 1,200 Car
2 4 Red 180 950 Car
3 2 Blue 260 230 Bike
4 2 Red 210 320 Bike
5 4 Yellow 160 870 Car
6 4 Blue 170 750 Car
7 4 Red 190 850 Car
8 2 Yellow 140 140 Bike
9 2 Yellow 210 310 Bike
10 2 Red 240 280 Bike
New data point: [4, Yellow, 260, 1100] -> predicted label: Car
We learned a model to classify a new
(unseen) vehicle as either a car or a bike
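To make the classification step concrete, here is a minimal sketch using a 1-nearest-neighbour rule on the labelled table above. The slides do not say which classifier was used, so 1-NN is an assumption chosen for brevity, as is dropping the non-numeric colour feature; the function name `classify_1nn` is illustrative.

```python
import math

# (wheels, top speed in km/h, weight in kg) -> label, copied from the table above.
# Colour is left out because it is not numeric (an assumption of this sketch).
VEHICLES = [
    ((4, 220, 1200), "Car"),
    ((4, 180, 950), "Car"),
    ((2, 260, 230), "Bike"),
    ((2, 210, 320), "Bike"),
    ((4, 160, 870), "Car"),
    ((4, 170, 750), "Car"),
    ((4, 190, 850), "Car"),
    ((2, 140, 140), "Bike"),
    ((2, 210, 310), "Bike"),
    ((2, 240, 280), "Bike"),
]


def classify_1nn(features, training=VEHICLES):
    """Return the label of the training point closest to `features`."""
    nearest = min(training, key=lambda item: math.dist(item[0], features))
    return nearest[1]


# The new, unseen vehicle from the slide (without its colour).
print(classify_1nn((4, 260, 1100)))  # prints "Car"
```

Because the features are not rescaled, weight dominates the distance here; in practice features are usually standardised before distance-based methods are applied.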
34
Supervised learning: regression
• The goal is to predict the
outcome value for a new
data point
New data point: [4, Yellow, 260, 1100] -> predicted value: 140,000
ID Wheels Colour Top speed (km/h) Weight (kg) Label (price)
1 4 Yellow 220 1,200 120,000
2 4 Red 180 950 40,000
3 2 Blue 260 230 63,000
4 2 Red 210 320 53,000
5 4 Yellow 160 870 21,000
6 4 Blue 170 750 37,000
7 4 Red 190 850 21,000
8 2 Yellow 140 140 26,000
9 2 Yellow 210 310 68,000
10 2 Red 240 280 75,000
We learned a model to predict the price of a
new (unseen) vehicle
35
Unsupervised learning example
SAME BUT DIFFERENT
36
Unsupervised learning: clustering
• We want an algorithm to group data
points into several groups based
on their similarity
ID Wheels Colour Top speed (km/h) Weight (kg) Group
1 4 Yellow 220 1,200 ?
2 4 Red 180 950 ?
3 2 Blue 260 230 ?
4 2 Red 210 320 ?
5 4 Yellow 160 870 ?
6 4 Blue 170 750 ?
7 4 Red 190 850 ?
8 2 Yellow 140 140 ?
9 2 Yellow 210 310 ?
10 2 Red 240 280 ?
New data point: [4, Yellow, 260, 1100] -> assigned group: 1
ID Wheels Colour Top speed (km/h) Weight (kg) Group
1 4 Yellow 220 1,200 1
2 4 Red 180 950 1
3 2 Blue 260 230 2
4 2 Red 210 320 2
5 4 Yellow 160 870 1
6 4 Blue 170 750 1
7 4 Red 190 850 1
8 2 Yellow 140 140 2
9 2 Yellow 210 310 2
10 2 Red 240 280 2
Interpretation of group meaning is up to the
researcher
1=?, 2=?
37
Unsupervised learning: clustering
• We want an algorithm to group data
points into several groups based
on their similarity
ID Wheels Colour Top speed (km/h) Weight (kg) Group
1 4 Yellow 220 1,200 ?
2 4 Red 180 950 ?
3 2 Blue 260 230 ?
4 2 Red 210 320 ?
5 4 Yellow 160 870 ?
6 4 Blue 170 750 ?
7 4 Red 190 850 ?
8 2 Yellow 140 140 ?
9 2 Yellow 210 310 ?
10 2 Red 240 280 ?
New data point: [4, Yellow, 260, 1100] -> assigned group: 2
ID Wheels Colour Top speed (km/h) Weight (kg) Group
1 4 Yellow 220 1,200 2
2 4 Red 180 950 1
3 2 Blue 260 230 2
4 2 Red 210 320 2
5 4 Yellow 160 870 1
6 4 Blue 170 750 1
7 4 Red 190 850 1
8 2 Yellow 140 140 1
9 2 Yellow 210 310 2
10 2 Red 240 280 2
Pick the grouping of data that is most useful
for your own purpose
38
Introduction to cluster analysis
WE NEED TO START SOMEWHERE
39
What is Cluster Analysis?
Unsupervised categorization:
no predefined classes
Figure labels: homogeneity, cluster, distance
40
History
• Anthropology
• Biology
• Computer Science
• Statistics
• Mathematics
• Medicine
• Psychology
• Engineering
41
Primary goals
• Gain insights
• Categorize
• Compress
42
Social Sciences
• Improve understanding of a domain
• Compress and summarize large datasets
• Within Learning Analytics:
• Profile learners based on their course engagement,
• Discover emerging topics in a corpus (student discussions, course materials)
• Group courses based on their characteristics
43
Example Applications (general)
44
Medicine and genetics
Clustering patients, symptoms, gene expressions
45
Marketing and customer analysis
Understanding customer populations
46
Newspapers and document analysis
Grouping related news articles and summarizing large collections of documents
47
Earthquake prediction
Identifying sources of earthquakes
48
Urban planning
Grouping buildings based on their properties
49
What is not clustering?
• Simple data partitioning
• Single property
• Predefined groups
• Data clustering
• Multiple properties
• Unforeseen groups
• Combinations of properties describe
groups
50
Important concepts
CLUSTERING IS TRICKY BUSINESS
51
Cluster ambiguity
• How many clusters?
52
Cluster ambiguity
• How many clusters?
Two-cluster solution
53
Cluster ambiguity
• How many clusters?
Four-cluster solution
54
Cluster ambiguity
• How many clusters?
Six-cluster solution
55
Which one?
56
Cluster separation and stability
57
Representing a cluster
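• Centroid – the geometrical centre of a cluster (the mean of its data points)
• Medoid – the most centrally located actual data point of the cluster (roughly, the data point closest to the centroid)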
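The difference between the two representatives is easiest to see on a cluster with an outlier. The sketch below assumes the usual definition of the medoid as the member with the smallest total distance to all other members; the function names and the toy cluster are illustrative.

```python
import math


def centroid(points):
    """Coordinate-wise mean; usually not one of the data points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def medoid(points):
    """The data point with the smallest total distance to all members."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


# Four nearby points plus one outlier at (9, 9).
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (9.0, 9.0)]
print(centroid(cluster))  # (3.0, 3.0): pulled towards the outlier
print(medoid(cluster))    # (2.0, 2.0): stays on an actual data point
```

The outlier drags the centroid away from the dense part of the cluster, while the medoid remains one of the original points.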
58
What is meant by similar?
• What is meant by “similar data points”?
• Geometry – More similar data points are closer to each other in N-dimensional feature space
• Yes, but:
• Close to the cluster “centre”
• Closeness to any other data point in a cluster
• Is it about distance between data points or their spatial density?
59
Any data point or centre
60
Types of clustering approaches
THERE ARE CLUSTERS OF CLUSTERING METHODS
61
Different types of clustering methods
• Membership strictness
• Hard clustering
• Each object either belongs to a cluster
or not
• Soft (fuzzy) clustering
• Each object belongs to each cluster to
some degree
62
Different types of clustering methods
• Membership exclusivity
• Strict partitioning clustering (e.g. K-means)
• Each object belongs to one and only one
cluster
• Strict partitioning clustering with outliers
• Each object belongs to zero or one
clusters
63
Different types of clustering methods
• Overlapping clustering
• Each object can belong to one or more
“hard” clusters
• Hierarchical clustering
• Objects belonging to a child cluster also
belong to the parent cluster
64
Different types of clustering methods
• Distance-based clustering
• Group objects based on distance
among them
• Density-based clustering
• Group objects based on area they
occupy
65
Special clustering approaches
MAAANY more approaches
• Model-based clustering:
• EM clustering
• Neural network approaches – Self-organising maps
• Grid-based approaches (e.g., STING)
• Clustering algorithms for large datasets
• Clustering of stream data in real time
• Clustering (partitioning) approaches for different types of data (e.g., graphs)
• Clustering approaches for categorical data
• Clustering approaches for freeform clusters (e.g., CURE)
• Clustering approaches for high-dimensional data (e.g., CLIQUE, PROCLUS)
• Constraint-based clustering
• Semi-supervised clustering
66
Multivariate methods
• N Data points have M features
• Find K clusters so that
• Each data point is associated to
each of the K clusters to a certain
degree (0 – none, 1.0 – fully)
• Each of the K clusters is
associated with all M features to
a certain degree
• Find K which maximizes the
likelihood of the observed data
67
Neural network approaches
• Network of connected nodes that propagate signals
• Edges have coefficients that alter signal propagation
• Traditionally supervised learning method
• Backpropagation method of learning coefficients
• Learning method and network structure altered to
support unsupervised learning
• Nodes can move!
• Eventually, the positions of the nodes indicate the locations of
clusters
68
Graph partitioning
• Partitioning network into subgraphs
• Goal to have highly dense subgraphs with
few connections between them
69
Popular distance metrics
• A way of calculating similarity between different data points.
• Important for methods based on distances (e.g., K-Means, hierarchical clustering)
• Has a significant effect on the final clustering results
Distance metric: Formula
Euclidean distance: $\|a - b\|_2 = \sqrt{\sum_i (a_i - b_i)^2}$
Squared Euclidean distance: $\|a - b\|_2^2 = \sum_i (a_i - b_i)^2$
Manhattan (city-block) distance: $\|a - b\|_1 = \sum_i |a_i - b_i|$
Maximum distance: $\|a - b\|_\infty = \max_i |a_i - b_i|$
70
Distance metrics example
For two points that differ by 4 along one axis and 3 along the other:
• Euclidean: $\sqrt{4^2 + 3^2} = 5$
• Squared Euclidean: $4^2 + 3^2 = 25$
• Manhattan: $4 + 3 = 7$
• Maximum: $\max(4, 3) = 4$
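The four metrics can be checked directly in a few lines of Python. This is a minimal sketch; the function names are illustrative, and the two points are chosen to differ by 4 along one axis and 3 along the other, as in the example above.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def maximum(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


a, b = (0, 0), (4, 3)
print(euclidean(a, b), squared_euclidean(a, b), manhattan(a, b), maximum(a, b))
# prints: 5.0 25 7 4
```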
71
Clustering for Learning Analytics
• Grouping of:
• Students
• Demographics
• Behavior
• Preferences
• Courses taken
• Academic performance
• Resources
• Reading materials
• Discussions
• Courses
• Course design improvement
72
Some examples
Kovanović, V., Joksimović, S., Gašević, D., Owers, J., Scott, A.-M., & Woodgate, A.
(2016). Profiling MOOC course returners: How does student behaviour change
between two course enrolments? In Proceedings of the Third ACM Conference
on Learning @ Scale (L@S’16) (pp. 269–272). New York, NY, USA: ACM.
https://doi.org/10.1145/2876034.2893431
73
Dataset
• 28 offerings of 11 different
Coursera MOOCs at the University
of Edinburgh
• 26,025 double course enrolment
records
• 52,050 course enrolment records
• K-means clustering
• Too large for clustering methods
that use pairwise distances (e.g.,
hierarchical clustering)
# Course Offerings
1 Artificial Intelligence Planning 1,2,3
2 Animal Behavior and Welfare 1,2
3 AstroTech: The Science and Technology behind Astronomical Discovery 1,2
4 Astrobiology and the Search for Extraterrestrial Life 1,2
5 The Clinical Psychology of Children and Young People 1,2
6 Critical Thinking in Global Challenges 1,2,3
7 E-learning and Digital Cultures 1,2,3
8 EDIVET: Do you have what it takes to be a veterinarian? 1,2
9 Equine Nutrition 1,2,3
10 Introduction to Philosophy 1,2,3,4
11 Warhol 1,2
74
Extracted features
• 9 different features extracted
Feature Description
Days No. of days active
Sub. No. of submitted assignments
Wiki No. of wiki page views
Disc. No. of discussion views
Posts No. of discussion messages written
Quiz. No. of quizzes attempted
Quiz.Uni. No. of different quizzes attempted
Vid.Uni. No. of different videos watched
Vid. No. of videos watched
75
Results
76
Results
Cluster label Students Students (%)
Enrol only (E) 22,932 44.1
Low engagement (LE) 21,776 41.8
Videos & Quizzes (VQ) 2,120 4.1
Videos (V) 5,128 9.9
Social (S) 94 0.2
77
Some examples
Kovanović, V., Gašević, D., Joksimović, S., Hatala, M., & Adesope, O. (2015).
Analytics of communities of inquiry: Effects of learning technology use on
cognitive presence in asynchronous online discussions. The Internet and Higher
Education, 27, 74–89. https://doi.org/10.1016/j.iheduc.2015.06.002
78
Clustering features
# Code Name Description
Clustering variables (content):
1 ULC UserLoginCount Total number of times the student logged into the system.
2 CVC CourseViewCount Total number of times the student viewed general course information.
3 AVT AssignmentViewTime Total time spent on all course assignments.
4 AVC AssignmentViewCount Total number of times the student opened one of the course assignments.
5 RVT ResourceViewTime Total time spent on reading the course resources.
6 RVC ResourceViewCount Total number of times the student opened one of the course resource materials.
Clustering variables (discussion):
7 FSC ForumSearchCount Total number of times the student used the search function on the discussion boards.
8 DVT DiscussionViewTime Total time spent on viewing the course’s online discussions.
9 DVC DiscussionViewCount Total number of times the student opened one of the course’s online discussions.
10 APT AddPostTime Total time spent on posting discussion board messages.
11 APC AddPostCount Total number of discussion board messages posted by the student.
12 UPT UpdatePostTime Total time spent on updating their discussion board messages.
13 UPC UpdatePostCount Total number of times the student updated one of their discussion board messages.
79
Results
80
Cluster interpretations
# Size Label: Description
1 21 Task-focused users: Overall below average activity, above average message posting activity
2 15 Content-focused users: Below average discussion-related activity, average content-related activity, emphasis on assignments
3 22 No-users: Overall below average activity, slightly higher in discussion-related activities
4 3 Highly intensive users: Significantly most active students, especially in content-related activities
5 6 Content-focused intensive users: Above average content-related activity, average discussion-related activity
6 14 Socially-focused intensive users: Above average discussion-related activity, average content-related activity
81
Some examples
Almeda, M. V., Scupelli, P., Baker, R. S., Weber, M., & Fisher, A. (2014). Clustering
of Design Decisions in Classroom Visual Displays. In Proceedings of the Fourth
International Conference on Learning Analytics and Knowledge (pp. 44–48). New
York, NY, USA: ACM. https://doi.org/10.1145/2567574.2567605
82
Clustering visual designs of classrooms
• 30 schools in northwestern USA
• Classroom Wall Coding System, CWaCS 1.0.
• Each classroom wall was photographed
• Units of analysis were marked with a box
• Coding scheme:
1. Academic
1. Academic topics (F1)
2. Academic organizational (F2)
2. Non-academic (F3)
3. Behavioural (F4)
• Adopted K-means to cluster classrooms
based on frequency of four features (F1-F4)
• Academic organizational
1. Goals for the day
2. Group assignments
3. Job charts
4. Labels
5. Schedule day/week
6. Yearly
7. Schedule
8. Skills
9. Homework
• Behavior materials
1. Behavior management
2. Progress charts
3. Rules
4. Other behaviour
• Academic topics
1. Behavior
2. Content specific
3. Procedures
4. Resources
5. Calendars/clocks
6. Other
• Non-academic
1. Motivational slogans
2. Decorations
3. Decorative frames
4. Student art
5. Other non-academic
83
Clusters
84
Some examples
Ferguson, R., Clow, D., Beale, R., Cooper, A. J., Morris, N., Bayne, S., & Woodgate,
A. (2015). Moving Through MOOCS: Pedagogy, Learning Design and Patterns of
Engagement. In Design for Teaching and Learning in a Networked World (pp. 70–
84). Springer International Publishing. https://doi.org/10.1007/978-3-319-
24258-3_6
85
Features
Possible combinations:
• 1 = Visited content only
• 2 = Posted comment but visited no new content
• 3 = Visited content and posted comment
• 4 = Submitted the assessment late
• 5 = Visited content and submitted assessment late
• 6 = Posted late assessment, saw no new content
• 7 = Visited content, posted, late assessment
• 8 = Submitted assessment early /on time
• 9 = Visited content, assessment early /on time
• 10 = Posted, assessment early /on time, no new content
• 11 = Visited, posted, assessment early /on time
For each course week, we assigned learners
an activity score:
• 1 if they viewed content
• 2 if they posted a comment
• 4 if they submitted their assessment in a
subsequent week
• 8 if they submitted it early or on time
• Adopted K-means
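The weekly activity score above is a sum of the weights 1, 2, 4 and 8, which is why the listed combinations run from 1 to 11. A minimal sketch of that scoring follows; the function name and argument names are illustrative, not taken from the paper.

```python
def activity_score(viewed_content, posted_comment, submitted_late, submitted_on_time):
    """Weekly activity score: 1 (viewed) + 2 (posted) + 4 (late) + 8 (early/on time)."""
    if submitted_late and submitted_on_time:
        raise ValueError("an assessment is either late or on time, not both")
    return (
        (1 if viewed_content else 0)
        + (2 if posted_comment else 0)
        + (4 if submitted_late else 0)
        + (8 if submitted_on_time else 0)
    )


# "Visited, posted, assessment early/on time" is combination 11 on the slide.
print(activity_score(True, True, False, True))  # prints 11
```

Each learner is then described by one such score per course week, and those weekly profiles are what K-means groups.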
86
1. Samplers
2. Strong Starters
3. Returners
4. Midway Dropouts
5. Nearly There
6. Late Completers
7. Keen Completers
87
Further examples
Lust, G., Elen, J., & Clarebout, G. (2013). Regulation of tool-use within a blended course: Student differences and performance effects. Computers & Education, 60(1), 385–395. https://doi.org/10.1016/j.compedu.2012.09.001
Wise, A. F., Speer, J., Marbouti, F., & Hsiao, Y.-T. (2013). Broadening the notion of participation in online discussions: examining patterns in learners’ online listening behaviors. Instructional Science, 41(2), 323–343. https://doi.org/10.1007/s11251-012-9230-9
Niemann, K., Schmitz, H.-C., Kirschenmann, U., Wolpers, M., Schmidt, A., & Krones, T. (2012). Clustering by Usage: Higher Order Co-occurrences of Learning Objects. In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge (pp. 238–247). New York, NY, USA: ACM. https://doi.org/10.1145/2330601.2330659
Cobo, G., García-Solórzano, D., Morán, J. A., Santamaría, E., Monzo, C., & Melenchón, J. (2012). Using Agglomerative Hierarchical Clustering to Model Learner Participation Profiles in Online Discussion Forums. In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge (pp. 248–251). New York, NY, USA: ACM. https://doi.org/10.1145/2330601.2330660
Crossley, S., Roscoe, R., & McNamara, D. S. (2014). What Is Successful Writing? An Investigation into the Multiple Ways Writers Can Write Successful Essays. Written Communication, 31(2), 184–214. https://doi.org/10.1177/0741088314526354
Hecking, T., Ziebarth, S., & Hoppe, H. U. (2014). Analysis of Dynamic Resource Access Patterns in Online Courses. Journal of Learning Analytics, 1(3), 34–60.
Li, N., Kidziński, Ł., Jermann, P., & Dillenbourg, P. (2015). MOOC Video Interaction Patterns: What Do They Tell Us? In Proceedings of the 10th European Conference on Technology Enhanced Learning (pp. 197–210). Springer International Publishing. https://doi.org/10.1007/978-3-319-24258-3_15
88
K-Means clustering
• The most widely used clustering algorithm
• Very simple, decent results
• Produces “circular” clusters
• Iterative algorithm
• Initial positions of cluster centroids are random
• Often run multiple times (e.g., 1,000 times) and the results combined
89
K-Means algorithm
1. Pick the number of clusters K
2. Pick K centroids in the N-dimensional feature space: $c_i \in \mathbb{R}^N$, $i \in \{1, \dots, K\}$
3. For each of the P data points $p_i \in \mathbb{R}^N$:
1. Calculate the distance to each of the K centroids
2. Assign it to its closest centroid
4. Recalculate centroid positions based on the assigned data points
5. Repeat steps 3–4 until centroid positions stabilize (i.e., there is no change in step 4)
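The steps above can be sketched in plain Python as follows. This is an illustrative implementation, not the one used by WEKA: it assumes Euclidean distance, draws the initial centroids from the data points themselves, and stops when the assignments no longer change (which means the centroids no longer move). The function name `kmeans` and the toy data are hypothetical.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Plain K-means; returns (centroids, cluster index for each point)."""
    rng = random.Random(seed)
    # Step 2: pick K initial centroids (here: K distinct random data points).
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 3: assign every point to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Step 5: stop once nothing changes any more.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 4: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

On this toy data the first three points end up in one cluster and the last three in the other; which cluster is numbered 0 depends on the random initialisation.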
90
K-Means interactive demo
91
K-Means characteristics
• The final solution depends a lot on the original random centroid positions
• The algorithm is often repeated (restarted) many times.
• Restart K-means R (e.g., 1,000) times.
• For each of the data points there will be R cluster assignments.
• For each data point, pick the cluster assignment which was most common among the
R assignments (cluster numbering is arbitrary, so clusters must first be matched across runs)
92
K-Means characteristics
• The algorithm is easy to implement
• Pretty fast, converges very quickly
• For N data points, requires calculation of N*K distances per iteration (which is $O(N)$ for a fixed K)
• Produces circular clusters – can be a problem in some domains
• Susceptible to outliers: Each data point will be assigned to one of the centroids and
can shift its centroid significantly “off side”
• The number of clusters must be provided
• Can get stuck in a local optimum (often addressed by multiple runs)
93
K-Means variants
• K-Means++
• “Smart” picking of the initial centroids (a.k.a. seeds)
• Seed selection algorithm:
• Pick the first seed uniformly at random from the data points
• Pick each next seed from the data points with a probability proportional to its squared distance from the closest already-chosen seed
• Effectively “spreads” the seed centroids across the feature space
• K-Medoids & its flavours (Partitioning Around Medoids - PAM)
• The solution to the outlier problem: instead of using the centroid, use the medoid.
• Instead of representing clusters with centres, use existing data points to represent
clusters
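The K-Means++ seed selection described above can be sketched as follows. It assumes the standard formulation, in which every seed is drawn from the data points and the sampling weight of a point is its squared Euclidean distance to the nearest seed chosen so far; the function name `kmeans_pp_seeds` and the toy data are illustrative.

```python
import math
import random


def kmeans_pp_seeds(points, k, seed=0):
    """Choose k initial centroids with K-Means++ style weighted sampling."""
    rng = random.Random(seed)
    seeds = [rng.choice(points)]  # first seed: uniform over the data points
    while len(seeds) < k:
        # Weight of each point: squared distance to its closest existing seed.
        # Points that are already seeds get weight 0 and cannot be drawn again.
        weights = [min(math.dist(p, s) for s in seeds) ** 2 for p in points]
        seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9), (1, 9), (2, 9)]
print(kmeans_pp_seeds(points, k=3))
```

Far-away points get large weights, so the seeds tend to land in different regions of the feature space; the seeds would then be passed to ordinary K-means as its starting centroids.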
94
PAM algorithm (Partitioning Around
Medoids)
1. One variant of K-Medoids
2. Pick the number of clusters K
3. Pick K data points in the N-dimensional feature space, $m_i \in \mathbb{R}^N$, $i \in \{1, \dots, K\}$, which will be the initial cluster
representatives
4. Assign each of the remaining M-K data points $p_i \in \mathbb{R}^N$ to the closest representative
5. For each representative point $o_j$:
1. Pick a random non-representative data point from its cluster, $o_{random}$
2. Check if swapping $o_j$ with $o_{random}$ produces clusters with smaller “errors” (the sum, over all clusters, of the
absolute differences between their data points and representatives)
3. If the new cost is smaller than the original cost, keep $o_{random}$ as a representative point
6. Repeat steps 4–5 until there are no changes in representative objects
95
K-Means variants
• X-Means
• Does not require number of clusters K to be specified
• Refines clustering solution by splitting existing clusters
• Keeps the clustering configuration which maximizes AIC (Akaike information
criterion) or BIC (Bayesian information criterion)
• Implemented in WEKA
• Cascading K-Means
• Restarts K-means with different K and picks the K that maximizes Calinski and
Harabasz criterion (F value in ANOVA)
• Implemented in WEKA
96
K-Means variants
• Large datasets variants: CLARA (Clustering LARge Applications) and CLARANS (Clustering Large
Applications upon RANdomized Search)
• CLARA: Use a sample of data points as potential candidate medoids and run PAM.
• CLARANS: Add randomization so the sample is not fixed at the start
• Fuzzy C-means
• Each data point can belong to multiple clusters with different membership degrees (summing to 100% across
all clusters)
• Also assesses the compactness of each cluster
• Compact clusters will have members with high probabilities
97
Running K-means & X-Means in WEKA
98
Hierarchical clustering
• Next to k-means, very popular method for cluster analysis
• Two key flavours
• agglomerative
• divisive
• Especially usable for small datasets
• Evaluate and pick the number of clusters visually
• The height of the merge/split indicates the distance
• Used extensively in Learning Analytics
• Many variants, using Linkage Functions
99
Agglomerative hierarchical clustering
• Build the clusters from the bottom up
• Algorithm:
• Build a singleton cluster for each data point
• Repeat until all data are in a single cluster:
• find the two closest clusters (based on the linkage function)
• merge these two together
• Run Interactive DEMO
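The agglomerative procedure above can be sketched in a few lines of Python. This is a deliberately naive version for illustration (it recomputes all cluster distances at every step) and uses single linkage with Euclidean distance as an assumed default; the function names and toy data are hypothetical.

```python
import math


def single_linkage(c1, c2):
    """Smallest distance between a point of c1 and a point of c2."""
    return min(math.dist(p, q) for p in c1 for q in c2)


def agglomerative(points, linkage=single_linkage):
    """Return the merge history as (cluster_a, cluster_b, merge_height) tuples."""
    clusters = [[p] for p in points]  # start: one singleton cluster per point
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters that is closest under the linkage function.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        height = linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges


points = [(0, 0), (0, 1), (5, 5), (5, 6)]
for a, b, height in agglomerative(points):
    print(a, "+", b, "at height", round(height, 2))
```

The merge heights are what a dendrogram draws on its vertical axis: the two tight pairs merge at height 1.0 and the final merge happens much higher, which suggests a two-cluster solution.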
100
Agglomerative hierarchical clustering
• Requires calculation of the distances between all cluster pairs
• At step 1 – this means calculating all pairwise distances among data points
• N data points – N(N−1)/2 (roughly N*N/2) distances
• Not feasible for large datasets
101
Divisive hierarchical clustering
• All data start in a single cluster, then we split one cluster at each step.
• More complex than agglomerative (how to split a cluster?)
• Less popular than agglomerative algorithms
• Can be faster as we do not need to go all the way to the bottom of the dendrogram
• Many approaches, often use “flat” algorithm as a partitioning method (e.g., K-means)
102
Example divisive clustering with K-means
• Start with all data in a single cluster
• Use K-means to create two initial clusters A2 and B2
• Use K-means to divide A2 into A2-1 and A2-2
• Use K-means to divide B2 into B2-1 and B2-2
• Pick between:
• A2-1, A2-2, B2
• A2, B2-1, B2-2
• Call the best combination A3, B3, and C3
• Repeat the division of each cluster into two clusters. Pick between:
• A3-1, A3-2, B3, C3
• A3, B3-1, B3-2, C3
• A3, B3, C3-1, C3-2
103
Linkage functions
• Key question for agglomerative clustering: how to pick the two clusters to merge
• What is meant by “closest”?
• Several different criteria. Most popular:
• Single-linkage: Minimal distance between any two data points, one from each cluster
• Complete-linkage: Maximal distance between any two data points, one from each cluster
• Average-linkage: Average distance over all pairs of data points, one from each cluster (the distance between cluster centroids is a separate criterion, centroid linkage)
• Ward’s method: pick the pair of clusters so that the new cluster has the minimal possible sum
of squares of distances. Minimizes the variation within the clusters.
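To compare the criteria side by side, the sketch below evaluates them for one pair of small clusters. It assumes Euclidean distance; for Ward's method it uses the standard expression for the increase in within-cluster sum of squares when two clusters are merged, n1*n2/(n1+n2) times the squared distance between their centroids. All function names and the toy clusters are illustrative.

```python
import math


def pairwise(c1, c2):
    return [math.dist(p, q) for p in c1 for q in c2]


def single_linkage(c1, c2):
    return min(pairwise(c1, c2))


def complete_linkage(c1, c2):
    return max(pairwise(c1, c2))


def average_linkage(c1, c2):
    d = pairwise(c1, c2)
    return sum(d) / len(d)


def mean_point(c):
    return tuple(sum(x) / len(c) for x in zip(*c))


def centroid_linkage(c1, c2):
    return math.dist(mean_point(c1), mean_point(c2))


def ward_increase(c1, c2):
    """Increase in within-cluster sum of squares caused by merging c1 and c2."""
    n1, n2 = len(c1), len(c2)
    return n1 * n2 / (n1 + n2) * math.dist(mean_point(c1), mean_point(c2)) ** 2


A = [(0, 0), (0, 2)]
B = [(3, 0), (3, 2)]
for f in (single_linkage, complete_linkage, average_linkage, centroid_linkage, ward_increase):
    print(f.__name__, round(f(A, B), 3))
```

For any pair of clusters the single-linkage value is never larger than the average-linkage value, which in turn is never larger than the complete-linkage value; this is one reason the criteria can rank candidate merges differently.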
104
Linkage functions visualisations
105
Linkage functions
• Different linkage functions can produce very different results
Panels: single linkage, complete linkage, Ward’s criterion
106
Hierarchical clustering in Weka
FUN PART
107
Short intro to WEKA
108
What is Weka?
Software “workbench”
Waikato Environment for Knowledge Analysis
(WEKA)
109
Installing Weka
• https://www.cs.waikato.ac.nz/ml/weka/index.html
• Very powerful, lots of resources available
• Good for fast prototyping, much faster than R/Python
• Can be used
• Through GUI, which is very quirky and has hidden “gems”
• From command line (useful for integrating with other tools/scripts)
• As a Java library
• Not the best designed UI, clearly done by the developers
• Great book about ML/DM/Weka
https://www.cs.waikato.ac.nz/ml/weka/book.html
• Many demo datasets included in Weka
https://www.cs.waikato.ac.nz/ml/weka/datasets.html
110
Weka Interfaces
• Explorer: will be used throughout the course
• Experimenter: performance comparisons
• KnowledgeFlow: graphical front end, an alternative to Explorer
• Workbench: unified interface
• Simple CLI: command-line interface
111
Weka Explorer
112
Weka Explorer
113
Attribute Relation File Format (ARFF)
Data.CSV
5.1, 3.5, 1.4, 0.2, Iris-setosa
4.9, 3.0, 1.4, 0.2, Iris-setosa
4.7, 3.2, 1.3, 0.2, Iris-setosa
4.6, 3.1, 1.5, 0.2, Iris-setosa
5.0, 3.6, 1.4, 0.2, Iris-setosa
Data.ARFF
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
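The CSV-to-ARFF mapping shown above can also be done programmatically. The sketch below is a hypothetical helper, not part of WEKA: it assumes a header-less CSV, takes the attribute names from the caller, declares a column as REAL if every value parses as a number, and otherwise declares it nominal with the values seen in the data.

```python
import csv
import io


def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation, attribute_names):
    """Build ARFF text from header-less CSV text (illustrative helper)."""
    rows = [[cell.strip() for cell in row] for row in csv.reader(io.StringIO(csv_text)) if row]
    lines = ["@RELATION " + relation]
    for col, name in enumerate(attribute_names):
        values = [row[col] for row in rows]
        if all(is_number(v) for v in values):
            lines.append("@ATTRIBUTE " + name + " REAL")
        else:
            # Nominal attribute: list the distinct values found in this column.
            lines.append("@ATTRIBUTE " + name + " {" + ",".join(sorted(set(values))) + "}")
    lines.append("@DATA")
    lines.extend(",".join(row) for row in rows)
    return "\n".join(lines)


sample = """5.1, 3.5, 1.4, 0.2, Iris-setosa
4.9, 3.0, 1.4, 0.2, Iris-setosa
5.0, 3.6, 1.4, 0.2, Iris-setosa"""
names = ["sepallength", "sepalwidth", "petallength", "petalwidth", "class"]
arff = csv_to_arff(sample, "iris", names)
print(arff)
```

Note that the nominal values are inferred from the rows provided, so this sample yields only `{Iris-setosa}`; the full iris file lists all three classes. WEKA's Explorer can also open CSV files directly and save them as ARFF.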
114
WEKA skills
• Loading/Saving data (CSV, ARFF)
• Filtering and pre-processing attributes
• Visualising data
• Clustering
• Saving clustering results
115
Let’s play with WEKA
116
Selecting the number of clusters
• Clustering is user-centric and subjective
• How to pick the number of clusters?
• Based on background knowledge (e.g., educational theory)
• Use an algorithm that calculates optimal number of clusters automatically (e.g., EM)
• Use an algorithm that provides a visual overview of all clustering configurations (e.g.,
hierarchical clustering)
• Use supervised clustering algorithm where clustering process is guided by the user
• Evaluate multiple values for K manually
• “Elbow” method: trade-off point between number of clusters and within cluster variance
• Silhouette method: test robustness of cluster membership
117
Elbow method
• As K increases, the average
diameter (variance) of clusters
gets smaller
• Find a “sweet spot” (the elbow) at which
the rate of decrease in variance
changes sharply
• Sometimes not so clear
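A minimal sketch of the elbow method, assuming a plain k-means with a deterministic farthest-point initialisation (this is not WEKA's SimpleKMeans, and the toy data and function names are made up for illustration). The "elbow" is picked here with a simple heuristic, the K at which the drop in within-cluster sum of squares slows down the most; other heuristics are possible.

```python
import math

def assign(points, centroids):
    """Group points by their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda j: math.dist(p, centroids[j]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, iterations=50):
    """Plain k-means; initial centroids chosen by farthest-point traversal."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        clusters = assign(points, centroids)
        updated = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else c
                   for c, cl in zip(centroids, clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

def within_cluster_ss(centroids, clusters):
    """Sum of squared distances of points to their cluster centroid."""
    return sum(math.dist(p, c) ** 2
               for c, cl in zip(centroids, clusters) for p in cl)

# Toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]

wss = {k: within_cluster_ss(*kmeans(points, k)) for k in range(1, 7)}
for k, value in wss.items():
    print(k, round(value, 2))

# Heuristic: the K where the decrease in WSS slows down the most.
elbow = max(range(2, 6),
            key=lambda k: (wss[k - 1] - wss[k]) - (wss[k] - wss[k + 1]))
print("elbow at K =", elbow)
```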
118
Silhouette method
• Visual method for determining the number of clusters
• $a(i)$ – the average distance of point $i$ to all other points in its cluster
• $b(i)$ – the smallest average distance of point $i$ to points in another cluster (distance to the
closest neighbouring cluster)
• $s(i) = \dfrac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$
• $s(i) = \begin{cases} \dfrac{b(i) - a(i)}{b(i)}, & \text{if } b(i) > a(i) \\ 0, & \text{if } b(i) = a(i) \\ \dfrac{b(i) - a(i)}{a(i)}, & \text{if } b(i) < a(i) \end{cases}$
119
Silhouette method
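• $s(i) = \begin{cases} 1 - \dfrac{a(i)}{b(i)}, & \text{if } b(i) > a(i) \\ 0, & \text{if } b(i) = a(i) \\ \dfrac{b(i)}{a(i)} - 1, & \text{if } b(i) < a(i) \end{cases}$
• $s(i) \approx \begin{cases} 1, & \text{if } b(i) \gg a(i) \\ 0, & \text{if } b(i) = a(i) \\ -1, & \text{if } b(i) \ll a(i) \end{cases}$
• $s(i) \approx \begin{cases} 1, & \text{good fit to the cluster} \\ 0, & \text{borderline fit} \\ -1, & \text{bad fit to the cluster} \end{cases}$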
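As a quick check of the formulas above, the sketch below computes s(i) both from the single-fraction definition and from the piecewise form, and confirms that they agree. The function names and the sample values of a(i) and b(i) are illustrative assumptions.

```python
def silhouette(a, b):
    """s(i) = (b - a) / max(a, b); taken as 0 when both distances are 0."""
    if max(a, b) == 0:
        return 0.0
    return (b - a) / max(a, b)

def silhouette_piecewise(a, b):
    """The same quantity written as the three-case formula."""
    if b > a:
        return 1 - a / b
    if b < a:
        return b / a - 1
    return 0.0

# Made-up (a, b) pairs: tight cluster, borderline point, misplaced point.
for a, b in [(0.5, 10.0), (2.0, 2.0), (10.0, 0.5)]:
    print(a, b, silhouette(a, b), silhouette_piecewise(a, b))
```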
120
K=2 example
121
K=3 example
122
K=4 example
123
K=5 example
124
K=6 example
125
Silhouette coefficient
• The average 𝑠(𝑖) for all data points
• Pick the number of clusters that
maximizes the average silhouette
coefficient
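A minimal sketch of comparing candidate clusterings by their average silhouette coefficient. The data and the two label assignments are made up; in practice the labels would come from running a clustering algorithm for each K. Points in single-member clusters are given s(i) = 0, which is a common convention and an assumption here, and at least two clusters are required.

```python
import math

def average_silhouette(points, labels):
    """Mean s(i) over all points; requires at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        # Sum and count of distances from p to the members of each cluster.
        sums, counts = {}, {}
        for j, q in enumerate(points):
            if i == j:
                continue
            sums[labels[j]] = sums.get(labels[j], 0.0) + math.dist(p, q)
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if own not in counts:  # p is alone in its cluster
            scores.append(0.0)
            continue
        a = sums[own] / counts[own]
        b = min(sums[c] / counts[c] for c in sums if c != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
candidates = {
    2: [0, 0, 0, 0, 1, 1, 1, 1],   # one cluster per group
    4: [0, 0, 1, 1, 2, 2, 3, 3],   # each group split in two
}
scores = {k: average_silhouette(points, labels)
          for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, "best K:", best_k)
```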
126
Silhouette coefficient and Elbow
method in WEKA
127
Challenges and solutions
• High dimensionality & feature (attribute) selection
• Categorical attributes
• “Weirdly-shaped” clusters
• Outliers
128
Curse of dimensionality
• Euclidean distance metric $\sqrt{\sum_i (a_i - b_i)^2}$
• In a high-dimensional space with $d$ dimensions (each feature scaled to 0.0–1.0):
• It is highly likely that for at least one feature $i$, the value $|a_i - b_i|$ will be close to 1.0
• This puts the lower bound on the distance at about 1.0
• However, the upper bound is $\sqrt{d}$
• Most pairs of data points are far from this upper bound
• Most pairs of data points have a distance close to the average distance
• Many irrelevant dimensions for most clusters
• The noise in irrelevant dimensions masks real differences between clusters
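The concentration of distances can be seen in a small simulation. This sketch assumes points drawn uniformly at random from the unit hypercube (a simplification of real data); the function name, sample size and seed are arbitrary choices.

```python
import math
import random

def distance_spread(d, n=60, seed=0):
    """Pairwise Euclidean distances between n random points in [0, 1]^d.

    Returns (min / mean, max / mean, max / sqrt(d)); sqrt(d) is the
    largest distance possible in the unit hypercube.
    """
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(d)] for _ in range(n)]
    dists = [math.dist(pts[i], pts[j])
             for i in range(n) for j in range(i + 1, n)]
    mean = sum(dists) / len(dists)
    return min(dists) / mean, max(dists) / mean, max(dists) / math.sqrt(d)

low = distance_spread(2)
high = distance_spread(1000)
print("d=2:   ", low)
print("d=1000:", high)
```

In two dimensions the closest pair is far closer than the average pair, while in 1000 dimensions the smallest and largest distances both sit near the average and well below the upper bound.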
129
Curse of dimensionality: some solutions
• Feature transformation methods (essentially compression)
• Creating smaller number of new, synthetic features based on the larger number of input
features which are then used for clustering
• Principal component analysis (PCA)
• Singular-value decomposition (SVD)
• Feature (attribute) selection methods
• Searching for a subset of features that are relevant for a given domain.
• Minimize entropy
• Idea that feature spaces that contain tight clusters have low entropy
• Subspace clustering – extension of attribute selection
130
Curse of dimensionality: some solutions
• Popular algorithms:
• CLIQUE: A Dimension-Growth Subspace Clustering Method
• Start with a single dimension and grow space by adding new dimensions
• PROCLUS: A Dimension-Reduction Subspace Clustering Method
• Starts with the complete high-dimensional space and assigns each dimension a weight for
every cluster; the weights are then used to regenerate the clusters
• Explores dense subspace regions
131
Categorical data
• Most clustering algorithms focus on clustering with continuous numerical attributes (ratio
variables)
• How to cluster categorical data? E.g., clustering students based on their demographic
characteristics:
• Gender
• Program
• Study level (postgraduate vs. undergraduate)
• Domestic/international
132
Categorical data: simple solution
• Ignore the problem, treat categorical data as numerical:
• Male: 1, Female: 2
• Domestic: 1, International: 2
• Often does not produce good results.
• Distance metric is not meaningful.
• Point A: (Male, Domestic)
• Point B: (Female, Domestic)
• Point C: (Female, International)
• Is point B closer to point A or point C?
• Depends on the information value of these two features
• “Localized method”
• If two distinct clusters have few points that are close, they might be merged together
incorrectly.
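A small sketch of why numeric codes give a distance that is not meaningful: with a three-valued attribute, swapping two arbitrary codes changes which student looks closer. The student records, attribute values and function names are hypothetical. For contrast, the last lines use a simple matching (Hamming) distance, which is a different technique from the numeric coding on the slide and does not depend on the codes.

```python
import math

# Hypothetical student records; the attribute values are illustrative only.
arts = {"program": "Arts", "residency": "Domestic"}
science = {"program": "Science", "residency": "Domestic"}
business = {"program": "Business", "residency": "Domestic"}

def encode(record, codes):
    """Replace each categorical value with its (arbitrary) numeric code."""
    return [codes[attribute][value] for attribute, value in record.items()]

def simple_matching_distance(r1, r2):
    """Number of attributes on which two records disagree."""
    return sum(r1[key] != r2[key] for key in r1)

residency_codes = {"Domestic": 1, "International": 2}
coding_a = {"program": {"Arts": 1, "Science": 2, "Business": 3},
            "residency": residency_codes}
coding_b = {"program": {"Arts": 1, "Business": 2, "Science": 3},
            "residency": residency_codes}

for name, coding in (("coding A", coding_a), ("coding B", coding_b)):
    to_science = math.dist(encode(arts, coding), encode(science, coding))
    to_business = math.dist(encode(arts, coding), encode(business, coding))
    print(name, "Arts-Science:", to_science, "Arts-Business:", to_business)

print("matching:", simple_matching_distance(arts, science),
      simple_matching_distance(arts, business))
```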
133
[Figure: points A (1, 1), B (2, 1) and C (2, 2) plotted with Gender on the horizontal axis and Dom/Intl on the vertical axis]
Categorical data: custom algorithms
• ROCK (RObust Clustering using linKs)
• A Hierarchical Clustering Algorithm for Categorical Attributes
• Two data points are similar if they have similar neighbours
• Typical example: market basket data
134
“Weirdly-shaped” clusters
• Most algorithms focus on distance
between data points
• However, often the connectedness of
data points is also important
• Different algorithms developed for
these situations
135
Different types of clustering methods
• Distance-based clustering
• Group objects based on distance
among them
• Density-based clustering
• Group objects based on the density
of the area they occupy
136
CURE
• Pick a subsample of data and cluster it using a
method such as hierarchical clustering
• Pick N representative points in each cluster that
are as far from each other as possible.
• Move the representative points a fixed fraction of
the way towards the cluster centroid.
• Merge two clusters whose representative
points are sufficiently close.
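The representative-point steps above can be sketched as follows. This covers only the selection, shrinking and merge test, not the initial sampling and clustering, and it is not a full CURE implementation. The greedy farthest-point selection, the function names, the `shrink` fraction and the merge `threshold` are assumptions made for illustration.

```python
import math

def centroid(points):
    return tuple(sum(xs) / len(xs) for xs in zip(*points))

def representative_points(cluster, n, shrink=0.2):
    """Pick up to n well-scattered points, then pull them towards the centroid."""
    c = centroid(cluster)
    # Start with the point farthest from the centroid, then keep adding
    # the point farthest from all representatives chosen so far.
    reps = [max(cluster, key=lambda p: math.dist(p, c))]
    while len(reps) < min(n, len(cluster)):
        reps.append(max((p for p in cluster if p not in reps),
                        key=lambda p: min(math.dist(p, r) for r in reps)))
    # Move each representative a fraction `shrink` of the way to the centroid.
    return [tuple(x + shrink * (cx - x) for x, cx in zip(p, c)) for p in reps]

def should_merge(reps_a, reps_b, threshold):
    """Merge when the closest pair of representatives is within threshold."""
    return min(math.dist(p, q) for p in reps_a for q in reps_b) <= threshold

cluster_a = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 2)]
cluster_b = [(5, 0), (9, 0), (5, 4), (9, 4), (7, 2)]
reps_a = representative_points(cluster_a, n=4)
reps_b = representative_points(cluster_b, n=4)
print(reps_a)
print(reps_b)
print(should_merge(reps_a, reps_b, threshold=2.0))
```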
137
DBSCAN
• DBSCAN (Density-based spatial clustering of
applications with noise)
• Density-based algorithm
• Searches for areas with large number of points
• Implemented in WEKA
• General idea:
• Each data point is either a core point, a reachable
point or an outlier
• Core points have at least minP (parameter) points
around them within the radius r (parameter)
• Reachable points are within radius r of a core point
• Every other data point is an outlier
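The three point types can be sketched in a few lines. This only labels points and does not assemble the clusters, so it is not a full DBSCAN. It assumes that a point is not counted among its own neighbours (some implementations do count it), and the function name and toy data are made up.

```python
import math

def classify_points(points, r, min_p):
    """Label each point as 'core', 'reachable' or 'outlier'."""
    # Indices of the other points lying within radius r of each point.
    neighbours = [[j for j, q in enumerate(points)
                   if j != i and math.dist(p, q) <= r]
                  for i, p in enumerate(points)]
    core = {i for i, nb in enumerate(neighbours) if len(nb) >= min_p}
    labels = []
    for i, nb in enumerate(neighbours):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("reachable")
        else:
            labels.append("outlier")
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),  # dense group
          (2, 1),                                       # on the edge
          (10, 10)]                                     # far away
labels = classify_points(points, r=1.1, min_p=3)
for point, label in zip(points, labels):
    print(point, label)
```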
138
[Figure: DBSCAN example with minP=4. Red: core points; yellow: reachable points; blue: outliers]
Self-organising maps (SOM)
• Special type of neural network
• Used to learn the contour of the underlying data
• Neurons are laid out in a grid structure, with each neuron
connected to its neighbours and to all input nodes
• For each data point, the neuron closest to it gets
adjusted, and the adjustment is propagated to
neighbouring neurons
• Over time, neurons will position themselves in the shape of
the data
• Dense areas with many neurons indicate clusters
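A minimal sketch of the update rule described above. To keep it short, the neurons are arranged in a one-dimensional chain rather than a two-dimensional grid, and the learning rate, neighbourhood radius, decay schedule and toy data are all assumptions.

```python
import math
import random

def train_som(data, n_neurons, epochs=200, lr=0.5, radius=1.0, seed=0):
    """Train a 1-D chain of neurons; returns the learned weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    neurons = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs  # shrink the step and radius over time
        for x in rng.sample(data, len(data)):
            # The winner is the neuron whose weights are closest to x.
            winner = min(range(n_neurons),
                         key=lambda k: math.dist(neurons[k], x))
            for k in range(n_neurons):
                # Influence falls off with chain distance from the winner.
                sigma = radius * decay + 1e-9
                influence = math.exp(-((k - winner) ** 2) / (2 * sigma ** 2))
                step = lr * decay * influence
                neurons[k] = [w + step * (xi - w)
                              for w, xi in zip(neurons[k], x)]
    return neurons

data = [(0.1, 0.1), (0.15, 0.2), (0.2, 0.1),
        (0.8, 0.9), (0.9, 0.85), (0.85, 0.8)]
neurons = train_som(data, n_neurons=6)
for weights in neurons:
    print([round(w, 2) for w in weights])
```

After training, the neuron positions can be inspected or plotted against the data to see where they have gathered.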
139
Self-organising maps
140
Expectation-maximization (EM) clustering
• Much more general than clustering
• Used to estimate hidden (latent) parameters
• Does not require the number of clusters to be specified in advance (e.g., WEKA's EM can select it by cross-validation)
• General idea:
• Pick number of clusters K
• Initialise K distributions over the clustering variables, with parameters P
• Estimate the likelihood of every data point being generated by each of the K distributions (expectation)
• For every data point, sum its likelihoods across the K distributions and divide each likelihood by this sum to obtain weights
• Combine the weights with the data to produce new estimates for the parameters P (maximization)
• Repeat until convergence is reached (no parameter change)
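The loop above can be sketched for the simplest case, a mixture of one-dimensional Gaussians with a fixed K. This is an illustrative sketch, not WEKA's EM implementation: the starting means, the initial unit variances, the small variance floor and the toy data are all assumptions.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, initial_means, iterations=100, tol=1e-9):
    """EM for a 1-D Gaussian mixture; K is the number of initial means."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # Expectation: how strongly each distribution explains each point.
        resp = []
        for x in xs:
            lik = [w * normal_pdf(x, m, v)
                   for w, m, v in zip(weights, means, variances)]
            total = sum(lik)
            resp.append([value / total for value in lik])
        # Maximisation: re-estimate the parameters from the weighted data.
        new_means = []
        for j in range(k):
            nj = sum(r[j] for r in resp)
            mean_j = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - mean_j) ** 2 for r, x in zip(resp, xs)) / nj
            weights[j] = nj / len(xs)
            variances[j] = max(var_j, 1e-6)  # floor avoids a zero variance
            new_means.append(mean_j)
        shift = max(abs(a - b) for a, b in zip(means, new_means))
        means = new_means
        if shift < tol:  # converged: the means no longer change
            break
    return means, variances, weights

xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_gmm_1d(xs, initial_means=[0.0, 6.0])
print(means, variances, weights)
```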
141
Expectation-maximization (EM) clustering
142
Expectation-maximization (EM) clustering
143
Analysis of cluster differences
144
Analysis of cluster differences
• We can check the differences between clusters with regards to
• Clustering variables (e.g., number of logins, number of discussion posts)
• Some additional variables (e.g., student grades, age, gender)
• We can examine differences
• One variable at a time (univariate differences)
• Across multiple variables simultaneously (multivariate differences)
• Takes into consideration the interaction among multiple variables
145
Univariate analysis of cluster differences
• For every variable we can use parametric and non-parametric univariate tests:
• Two clusters: t-test and Mann-Whitney
• Three or more clusters: One-way ANOVA and Kruskal-Wallis
• Requires p-value adjustment (e.g., Bonferroni, Holm-Bonferroni correction)
• Whether to use parametric or non-parametric primarily depends on the homogeneity
(equality) of variance assumption
• Can be tested with Levene’s test
• If Levene’s test shows p<.05, use Mann-Whitney and Kruskal-Wallis
• Significant ANOVA tests can be followed by pairwise tests (e.g., TukeyHSD)
• Significant Kruskal-Wallis tests can be followed by pairwise KW tests (also with p-value
correction)
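The p-value adjustments mentioned above are simple enough to sketch directly. The p-values below are made up for illustration; in practice they would come from the per-variable tests.

```python
def bonferroni(p_values):
    """Multiply every p-value by the number of tests (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def holm(p_values):
    """Holm-Bonferroni step-down adjustment (capped at 1)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

p_values = [0.01, 0.04, 0.03]  # hypothetical results of three tests
print("Bonferroni:", bonferroni(p_values))
print("Holm:      ", holm(p_values))
```

Holm's adjustment is never larger than Bonferroni's, so it keeps more power while still controlling the family-wise error rate.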
146
Multivariate analysis of cluster differences
• We can test differences across all variables at the same time
• More holistic than ANOVA/KW
• Instead of one dependent variable, we can have multiple variables
• Use meaningful groups of variables (e.g., behavioural variables)
• MANOVA: Multivariate analysis of variance
• Step “before” ANOVAs/KWs
• Has several statistics: Wilks’ Λ, Pillai’s Trace
• Assumption: Homogeneity of covariance
• Much trickier to test: Box’s M test is one method, but it is very sensitive (use p<.001)
• Use Levene’s tests on each of the variables (doesn’t guarantee homogeneity of
covariance but might help)
• If assumption is violated, still can be used but with more robust metric (Pillai’s Trace)
147
Example MANOVA
“For assessing the difference between student clusters a multivariate analysis of
variance (MANOVA) was used. To validate the difference between the discovered
clusters a MANOVA model with cluster assignment as a single independent variable and
thirteen clustering variables as the dependent measures was constructed…”
Kovanović, V., Gašević, D., Joksimović, S., Hatala, M., & Adesope, O. (2015). Analytics of communities of
inquiry: Effects of learning technology use on cognitive presence in asynchronous online discussions. The
Internet and Higher Education, 27, 74–89. doi:10.1016/j.iheduc.2015.06.002
148
Example MANOVA
“Before running MANOVAs, … the homogeneity of covariances assumption was checked using
Box’s M test and homogeneity of variances using Levine’s test. To protect from the assumption
violations, we log-transformed the data and used the Pillai’s trace statistic which is considered to
be a robust against assumption violations. As a final protection measure, obtained MANOVA
results were compared with the results of the robust rank-based variation of the MANOVA
analysis”
Kovanović, V., Gašević, D., Joksimović, S., Hatala, M., & Adesope, O. (2015). Analytics of communities of
inquiry: Effects of learning technology use on cognitive presence in asynchronous online discussions. The
Internet and Higher Education, 27, 74–89. doi:10.1016/j.iheduc.2015.06.002
149
Example MANOVA
“The assumption of homogeneity of covariances was tested using Box’s M test which was not
accepted. Thus, Pillai’s trace statistic was used, as it is more robust to the assumption violations
together with the Bonferroni correction method. A statistically significant MANOVA effect was
obtained, Pillai’s Trace = 1.62, F(39, 174) = 5.28, p < 10⁻¹⁴”
Kovanović, V., Gašević, D., Joksimović, S., Hatala, M., & Adesope, O. (2015). Analytics of communities of
inquiry: Effects of learning technology use on cognitive presence in asynchronous online discussions. The
Internet and Higher Education, 27, 74–89. doi:10.1016/j.iheduc.2015.06.002
150
MANOVA Follow-up analyses
• A significant MANOVA can be followed up with two types of analyses:
• Individual ANOVAs/KWs (with p-value correction)
• Which in turn can be followed with pairwise analyses: TukeyHSD/Pairwise KWs
• Discriminant Function Analysis (DFA)
• What combinations of variables differentiate between clusters
• DFA can be run alone (without MANOVA) but its significance then can’t be tested
151
152
Kovanović, V., Gašević, D., Joksimović, S., Hatala, M., & Adesope, O. (2015). Analytics of communities
of inquiry: Effects of learning technology use on cognitive presence in asynchronous online
discussions. The Internet and Higher Education, 27, 74–89. doi:10.1016/j.iheduc.2015.06.002
Example DFA analysis
153
Kovanović, V., Gašević, D., Joksimović, S., Hatala, M., & Adesope, O. (2015). Analytics of communities
of inquiry: Effects of learning technology use on cognitive presence in asynchronous online
discussions. The Internet and Higher Education, 27, 74–89. doi:10.1016/j.iheduc.2015.06.002
Data from
OU UK
154

Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 

Último (20)

INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 

Unsupervised Learning for Learning Analytics Researchers

  • 1. UNSUPERVISED MACHINE LEARNING. Vitomir Kovanović, University of South Australia (#vkovanovic, vitomir.kovanovic.info, Vitomir.Kovanovic@unisa.edu.au) and Srećko Joksimović, University of South Australia (#s_joksimovic, www.sjoksimovic.info, Srecko.Joksimovic@unisa.edu.au). Learning Analytics Summer Institute, Teachers College, Columbia University, June 11-13, 2018
  • 2. About me • Learning analytics researcher • Research Fellow, School of Education, UniSA • Data Scientist, Teaching Innovation Unit, UniSA • Member of the Centre for Change and Complexity in Learning (C3L) • Member of the SoLAR executive board • Computer science and information systems background • Used cluster analysis in several research projects
  • 3. About you • Introduce yourself • Name, affiliation, position • Experience with machine learning and clustering • Experience with Weka or some other ML/DM toolkit • Ideas for clustering in your own research/work
  • 4. Download data from Dropbox or USB: http://bit.ly/lak18ul
  • 5. Workshop outline 1. Three days, four sessions 2. Equally theoretical and practical 3. Use of Weka Machine Learning toolkit 4. Focus on practical use 5. Examples of clustering use in learning analytics
  • 6. Workshop topics • Introduction to machine learning & unsupervised methods • Introduction to cluster analysis • Overview of cluster analysis use in Learning Analytics • Introduction to WEKA toolkit • Overview of the tutorial dataset • K-means algorithm • K-means demo • Hierarchical clustering algorithms • Hierarchical clustering demo
  • 7. Tutorial topics • How to choose the number of clusters • How to interpret clustering results • Practical challenges • More advanced cluster analysis approaches • Statistical methods for comparing clusters • Clustering real-world data from OU UK • Discussing different cluster analysis methods
  • 9. What is machine learning?
  • 10. What is machine learning? A computing method for making sense of data
  • 11. Data is everywhere. Each minute: ● 3,600,000 Google searches ● 456,000 Twitter posts ● 46,740 Instagram photos ● 45,787 Uber trips ● 600 new Wikipedia edits ● 13 new Spotify songs. Source: Domo (2017). “Data Never Sleeps 5.0” https://www.domo.com
  • 12. What products should go on sale? Grouping of related products
  • 13. What movies to recommend? Grouping users based on their viewing preferences
  • 14. How to navigate streets? Processing multiple streams of information in real time
  • 15. Online store product recommendation
  • 16. Fields that influenced machine learning • Statistics • Operations research • Artificial intelligence • Data visualisation • Software engineering • Information systems management
  • 17. How does machine learning work? WILL IT TAKE MY JOB?
  • 18. Two key ideas in machine learning 1. Features 2. Models
  • 19. What is a feature? 1. A feature is a characteristic of a data point 2. Each data point is represented as a vector of features [f1, f2, f3 ... fM] 3. A whole dataset of N data points is represented as an N x M matrix (rows: data point 1 ... data point N; columns: feature 1 ... feature M)
  • 20. What is a feature? • The performance of machine learning algorithms depends in large part on the quality of the extracted features (how useful they are for a given ML task) • Expertise and prior knowledge come into play when deciding which features to extract
  • 21. What is a model? • Something that captures important patterns in the data • A model can be used to • Draw inferences • Understand the data • Learn hidden rules • Support decision making
  • 22. An example model: BMI calculator • Goal: Predicting a person’s body fat category (overweight, normal, or underweight) from their height (in m) and weight (in kg). • Model: • BMI = weight / height² • If BMI > 25: overweight • If BMI < 18.5: underweight • Otherwise: normal • An example: 1.75 m and 70 kg: BMI = 70 / (1.75 × 1.75) ≈ 22.86 -> normal category
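The BMI rule on this slide is small enough to write out directly. The sketch below is only an illustration of "a model as a function from features to a response"; the function name `bmi_category` is a hypothetical helper, not something from the workshop materials.

```python
def bmi_category(height_m: float, weight_kg: float) -> str:
    """Apply the rule-based BMI model to one data point (height, weight)."""
    bmi = weight_kg / height_m ** 2  # BMI = weight / height^2
    if bmi > 25:
        return "overweight"
    if bmi < 18.5:
        return "underweight"
    return "normal"


# The slide's example: 1.75 m and 70 kg.
print(round(70 / 1.75 ** 2, 2))   # BMI value, rounded to two decimals
print(bmi_category(1.75, 70))     # category predicted by the model
```

Here the "features" are the two inputs and the thresholds play the role of the learned model; in machine learning the thresholds would be estimated from data rather than fixed by hand.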
  • 23. How does machine learning work? • Model development (slow and hard): N data points -> feature extraction -> N x M feature matrix -> model building -> ML model • Model use (fast and easy): a new data point -> feature extraction -> feature vector of length M -> ML model -> response (prediction)
  • 24. Two types of errors • Bias: The error from erroneous assumptions of the model. • High bias: missing the relevant relationships between variables (underfitting). • Variance: The error from sensitivity to small fluctuations in the data. • High variance: modelling the random noise in the data, rather than real relationships (overfitting).
  • 25. Two types of errors • We always work with samples • Samples always have noise • The trick is to develop models that fit not only the training data, but also new, future data
  • 26. Two types of errors [Figure: an example of high bias and an example of high variance]
  • 27. The trick is to find optimal model complexity
  • 28. Key machine learning approaches 1. Supervised machine learning (a) Predicting a categorical value: Classification (b) Predicting a continuous value: Regression 2. Unsupervised machine learning (a) Grouping data points (rows): Cluster analysis (b) Grouping features (columns): Principal Component Analysis (PCA), Factor Analysis (FA), Latent Semantic Analysis (LSA), Singular Value Decomposition (SVD)
  • 29. Many more approaches • Models that blur the division between supervised and unsupervised • Reinforcement learning: learning the class label after making a prediction • Neural networks (can be supervised and unsupervised) • Online learning models: learning as data arrives • Feature processing methods: association rule mining
  • 31. 10 data points [Figure: data point 1 to data point 10]
  • 32. How does machine learning work? • Model development (slow and hard): N data points -> feature extraction -> N x M feature matrix -> model building -> ML model • Model use (fast and easy): a new data point -> feature extraction -> feature vector of length M -> ML model -> response (prediction)
  • 33. First step: feature extraction • From each data point we extracted four features: number of wheels, colour, top speed (in km/h), weight (in kg) • Our feature matrix is 10 x 4:
      ID  Wheels  Color   Top speed (km/h)  Weight (kg)
      1   4       Yellow  220               1,200
      2   4       Red     180               950
      3   2       Blue    260               230
      4   2       Red     210               320
      5   4       Yellow  160               870
      6   4       Blue    170               750
      7   4       Red     190               850
      8   2       Yellow  140               140
      9   2       Yellow  210               310
      10  2       Red     240               280
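A minimal sketch of how the table above might be turned into a numeric feature matrix. The slides do not say how colour is encoded; one-hot encoding is an assumption made here for illustration, which is why the resulting matrix has 6 columns rather than 4. The names `rows`, `encode` and `X` are hypothetical.

```python
# The ten vehicles from the slide: (wheels, colour, top speed km/h, weight kg).
rows = [
    (4, "Yellow", 220, 1200), (4, "Red", 180, 950), (2, "Blue", 260, 230),
    (2, "Red", 210, 320), (4, "Yellow", 160, 870), (4, "Blue", 170, 750),
    (4, "Red", 190, 850), (2, "Yellow", 140, 140), (2, "Yellow", 210, 310),
    (2, "Red", 240, 280),
]

# Assumption: colour is one-hot encoded, one column per distinct colour.
colours = sorted({colour for _, colour, _, _ in rows})  # ['Blue', 'Red', 'Yellow']


def encode(row):
    """Turn one data point into a purely numeric feature vector."""
    wheels, colour, speed, weight = row
    one_hot = [1.0 if colour == c else 0.0 for c in colours]
    return [float(wheels), *one_hot, float(speed), float(weight)]


X = [encode(r) for r in rows]  # N x M feature matrix as a list of lists
print(len(X), "x", len(X[0]))
print(X[0])
```

Distance-based methods such as K-means need every feature to be numeric, which is the reason for encoding the categorical colour column before clustering.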
  • 34. Supervised learning: classification • Each data point is provided with a categorical label (outcome variable) • The goal is to predict the class label for a new data point
      ID  Wheels  Color   Top speed (km/h)  Weight (kg)  Label
      1   4       Yellow  220               1,200        Car
      2   4       Red     180               950          Car
      3   2       Blue    260               230          Bike
      4   2       Red     210               320          Bike
      5   4       Yellow  160               870          Car
      6   4       Blue    170               750          Car
      7   4       Red     190               850          Car
      8   2       Yellow  140               140          Bike
      9   2       Yellow  210               310          Bike
      10  2       Red     240               280          Bike
    New data point [4, Yellow, 260, 1100] -> predicted label: Car. We learned a model to classify a new (unseen) vehicle as either a car or a bike
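The slide does not name the classifier that was used. As an illustration only, the sketch below swaps in a 1-nearest-neighbour rule on the three numeric features (colour is dropped and features are left unscaled, both simplifying assumptions). The names `training` and `predict` are hypothetical.

```python
import math

# Labelled training data from the slide: (wheels, top speed, weight) -> label.
training = [
    ((4, 220, 1200), "Car"), ((4, 180, 950), "Car"), ((2, 260, 230), "Bike"),
    ((2, 210, 320), "Bike"), ((4, 160, 870), "Car"), ((4, 170, 750), "Car"),
    ((4, 190, 850), "Car"), ((2, 140, 140), "Bike"), ((2, 210, 310), "Bike"),
    ((2, 240, 280), "Bike"),
]


def predict(query):
    """1-nearest-neighbour: return the label of the closest training point."""
    features, label = min(training, key=lambda item: math.dist(item[0], query))
    return label


# The unseen vehicle from the slide, without its colour.
print(predict((4, 260, 1100)))
```

Because weight has by far the largest numeric range, it dominates the unscaled distance here; in practice features are usually standardised first.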
  • 35. Supervised learning: regression • The goal is to predict the outcome value for a new data point
      ID  Wheels  Color   Top speed (km/h)  Weight (kg)  Label
      1   4       Yellow  220               1,200        120,000
      2   4       Red     180               950          40,000
      3   2       Blue    260               230          63,000
      4   2       Red     210               320          53,000
      5   4       Yellow  160               870          21,000
      6   4       Blue    170               750          37,000
      7   4       Red     190               850          21,000
      8   2       Yellow  140               140          26,000
      9   2       Yellow  210               310          68,000
      10  2       Red     240               280          75,000
    New data point [4, Yellow, 260, 1100] -> predicted value: 140,000. We learned a model to predict the price of a new (unseen) vehicle
  • 37. Unsupervised learning: clustering • We want the algorithm to group data points into several groups based on their similarity; the group of each data point is initially unknown (?)
      ID  Wheels  Color   Top speed (km/h)  Weight (kg)  Group
      1   4       Yellow  220               1,200        1
      2   4       Red     180               950          1
      3   2       Blue    260               230          2
      4   2       Red     210               320          2
      5   4       Yellow  160               870          1
      6   4       Blue    170               750          1
      7   4       Red     190               850          1
      8   2       Yellow  140               140          2
      9   2       Yellow  210               310          2
      10  2       Red     240               280          2
    New data point [4, Yellow, 260, 1100] -> group 1. Interpretation of group meaning is up to the researcher: 1 = ?, 2 = ?
  • 38. Unsupervised learning: clustering • The same data can be grouped differently
      ID  Wheels  Color   Top speed (km/h)  Weight (kg)  Group
      1   4       Yellow  220               1,200        2
      2   4       Red     180               950          1
      3   2       Blue    260               230          2
      4   2       Red     210               320          2
      5   4       Yellow  160               870          1
      6   4       Blue    170               750          1
      7   4       Red     190               850          1
      8   2       Yellow  140               140          1
      9   2       Yellow  210               310          2
      10  2       Red     240               280          2
    New data point [4, Yellow, 260, 1100] -> group 2. Pick the grouping of data that is most useful for your own purpose
  • 39. Introduction to cluster analysis: WE NEED TO START SOMEWHERE
  • 40. What is Cluster Analysis? Unsupervised categorization: no predefined classes [Figure labels: homogeneity, cluster, distance]
  • 41. History • Anthropology • Biology • Computer Science • Statistics • Mathematics • Medicine • Psychology • Engineering
  • 42. Primary goals: gain insights, categorize, compress
  • 43. Social Sciences • Improve understanding of a domain • Compress and summarize large datasets • Within Learning Analytics: • Profile learners based on their course engagement • Discover emerging topics in a corpus (student discussions, course materials) • Group courses based on their characteristics
  • 45. Medicine and genetics: clustering patients, symptoms, gene expressions
  • 46. Marketing and customer analysis: understanding customer populations
  • 47. Newspapers and document analysis: grouping related news articles and summarizing large collections of documents
  • 49. Urban planning: grouping buildings based on their properties
  • 50. What is not clustering? • Simple data partitioning (not clustering): a single property, predefined groups • Data clustering: multiple properties, unforeseen groups, combinations of properties describe groups
  • 51. Important concepts: CLUSTERING IS TRICKY BUSINESS
  • 52. Cluster ambiguity • How many clusters?
  • 53. Cluster ambiguity • How many clusters? Two-cluster solution
  • 54. Cluster ambiguity • How many clusters? Four-cluster solution
  • 55. Cluster ambiguity • How many clusters? Six-cluster solution
  • 57. Cluster separation and stability
  • 58. Representing a cluster • Centroid – a geometrical centre of a cluster • Medoid – data point closest to the centroid 58
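A small sketch of the two cluster representatives, following the definitions on this slide (the medoid is taken as the data point closest to the centroid; a common formal alternative is the point with the smallest total distance to all other members, which usually but not always selects the same point). The helper names and the toy `cluster` are hypothetical.

```python
import math


def centroid(points):
    """Geometrical centre: the per-dimension mean of all points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def medoid(points):
    """Data point closest to the centroid (the definition used on the slide)."""
    centre = centroid(points)
    return min(points, key=lambda p: math.dist(p, centre))


# Toy cluster: four nearby points plus one outlier.
cluster = [(1, 1), (2, 1), (1, 2), (2, 2), (10, 10)]
print(centroid(cluster))  # pulled towards the outlier; not an actual data point
print(medoid(cluster))    # always one of the original data points
```

The outlier drags the centroid away from the dense group, while the medoid stays on a real observation, which is the motivation for medoid-based methods.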
  • 59. What is meant by similar? • What is meant by “similar data points”? • Geometry – more similar data points are closer to each other in the N-dimensional feature space • Yes, but: • Closeness to the cluster “centre”? • Closeness to any other data point in a cluster? • Is it about the distance between data points or their spatial density?
  • 60. Any data point or centre
  • 61. Types of clustering approaches: THERE ARE CLUSTERS OF CLUSTERING METHODS
  • 62. Different types of clustering methods • Membership strictness • Hard clustering • Each object either belongs to a cluster or not • Soft (fuzzy) clustering • Each object belongs to each cluster to some degree
  • 63. Different types of clustering methods • Membership exclusivity • Strict partitioning clustering (e.g. K-means) • Each object belongs to one and only one cluster • Strict partitioning clustering with outliers • Each object belongs to zero or one cluster
  • 64. Different types of clustering methods • Overlapping clustering • Each object can belong to one or more “hard” clusters • Hierarchical clustering • Objects belonging to a child cluster also belong to the parent cluster
  • 65. Different types of clustering methods • Distance-based clustering • Group objects based on the distance among them • Density-based clustering • Group objects based on the area they occupy
  • 66. Special clustering approaches: MAAANY more approaches • Model-based clustering: • EM clustering • Neural network approaches – Self-organising maps • Grid-based approaches (e.g., STING) • Clustering algorithms for large datasets • Clustering of stream data in real time • Clustering (partitioning) approaches for different types of data (e.g., graphs) • Clustering approaches for categorical data • Clustering approaches for freeform clusters (e.g., CURE) • Clustering approaches for high-dimensional data (e.g., CLIQUE, PROCLUS) • Constraint-based clustering • Semi-supervised clustering
  • 67. Multivariate methods • N data points have M features • Find K clusters so that • Each data point is associated with each of the K clusters to a certain degree (0 – none, 1.0 – fully) • Each of the K clusters is associated with all M features to a certain degree • Find the K which maximizes the likelihood of the observed data
  • 68. Neural network approaches • Network of connected nodes that propagate signals • Edges have coefficients that alter signal propagation • Traditionally a supervised learning method • Backpropagation method of learning coefficients • Learning method and network structure altered to support unsupervised learning • Nodes can move! • Eventually the positions of nodes indicate the locations of clusters
  • 69. Graph partitioning • Partitioning a network into subgraphs • The goal is to have highly dense subgraphs with few connections between them
  • 70. Popular distance metrics • A way of calculating similarity between different data points • Important for methods based on distances (e.g., K-Means, hierarchical clustering) • Have a significant effect on the final clustering results
      Euclidean distance:           $\|a - b\|_2 = \sqrt{\sum_i (a_i - b_i)^2}$
      Squared Euclidean distance:   $\|a - b\|_2^2 = \sum_i (a_i - b_i)^2$
      Manhattan (city-block) distance: $\|a - b\|_1 = \sum_i |a_i - b_i|$
      Maximum distance:             $\|a - b\|_\infty = \max_i |a_i - b_i|$
  • 71. Distance metrics example (two points that differ by 4 along one axis and 3 along the other) • Euclidean: $\sqrt{4^2 + 3^2} = 5$ • Squared Euclidean: $4^2 + 3^2 = 25$ • Manhattan: $4 + 3 = 7$ • Maximum: $\max(4, 3) = 4$
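The four metrics can be checked against the worked example with a few lines of code. This is a plain-Python sketch; the function names are hypothetical and the two points `a` and `b` are chosen so that they differ by 4 and 3 along the two axes.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def maximum(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


a, b = (0, 0), (4, 3)
for metric in (euclidean, squared_euclidean, manhattan, maximum):
    print(metric.__name__, metric(a, b))
```

The same pair of points gets four different distances, which is why the choice of metric can change which points end up in the same cluster.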
  • 72. Clustering for Learning Analytics • Grouping of: • Students (demographics, behavior, preferences, courses taken, academic performance) • Resources (reading materials, discussions) • Courses (course design improvement)
  • 73. Some examples: Kovanović, V., Joksimović, S., Gašević, D., Owers, J., Scott, A.-M., & Woodgate, A. (2016). Profiling MOOC course returners: How does student behaviour change between two course enrolments? In Proceedings of the Third ACM Conference on Learning @ Scale (L@S’16) (pp. 269–272). New York, NY, USA: ACM. https://doi.org/10.1145/2876034.2893431
  • 74. Dataset • 28 offerings of 11 different Coursera MOOCs at the University of Edinburgh • 26,025 double course enrolment records • 52,050 course enrolment records • K-means clustering • Too large for clustering methods that use pairwise distances (e.g., hierarchical clustering)
      #   Course (offerings)
      1   Artificial Intelligence Planning (1, 2, 3)
      2   Animal Behavior and Welfare (1, 2)
      3   AstroTech: The Science and Technology behind Astronomical Discovery (1, 2)
      4   Astrobiology and the Search for Extraterrestrial Life (1, 2)
      5   The Clinical Psychology of Children and Young People (1, 2)
      6   Critical Thinking in Global Challenges (1, 2, 3)
      7   E-learning and Digital Cultures (1, 2, 3)
      8   EDIVET: Do you have what it takes to be a veterinarian? (1, 2)
      9   Equine Nutrition (1, 2, 3)
      10  Introduction to Philosophy (1, 2, 3, 4)
      11  Warhol (1, 2)
  • 75. Extracted features • 9 different features extracted
      Feature    Description
      Days       No. of days active
      Sub.       No. of submitted assignments
      Wiki       No. of wiki page views
      Disc.      No. of discussion views
      Posts      No. of discussion messages written
      Quiz.      No. of quizzes attempted
      Quiz.Uni.  No. of different quizzes attempted
      Vid.Uni.   No. of different videos watched
      Vid.       No. of videos watched
  • 77. Results
      Cluster label          Students  Students %
      Enrol only (E)         22,932    44.1
      Low engagement (LE)    21,776    41.8
      Videos & Quizzes (VQ)  2,120     4.1
      Videos (V)             5,128     9.9
      Social (S)             94        0.2
  • 78. Some examples: Kovanović, V., Gašević, D., Joksimović, S., Hatala, M., & Adesope, O. (2015). Analytics of communities of inquiry: Effects of learning technology use on cognitive presence in asynchronous online discussions. The Internet and Higher Education, 27, 74–89. https://doi.org/10.1016/j.iheduc.2015.06.002
  • 79. Clustering features
    Clustering variables (content):
      1   ULC  UserLoginCount        Total number of times the student logged into the system.
      2   CVC  CourseViewCount       Total number of times the student viewed general course information.
      3   AVT  AssignmentViewTime    Total time spent on all course assignments.
      4   AVC  AssignmentViewCount   Total number of times the student opened one of the course assignments.
      5   RVT  ResourceViewTime      Total time spent on reading the course resources.
      6   RVC  ResourceViewCount     Total number of times the student opened one of the course resource materials.
    Clustering variables (discussion):
      7   FSC  ForumSearchCount      Total number of times the student used the search function on the discussion boards.
      8   DVT  DiscussionViewTime    Total time spent on viewing the course’s online discussions.
      9   DVC  DiscussionViewCount   Total number of times the student opened one of the course’s online discussions.
      10  APT  AddPostTime           Total time spent on posting discussion board messages.
      11  APC  AddPostCount          Total number of discussion board messages posted by the student.
      12  UPT  UpdatePostTime        Total time spent on updating one of their discussion board messages.
      13  UPC  UpdatePostCount       Total number of times the student updated one of their discussion board messages.
  • 81. Cluster interpretations
      #  Size  Label                              Description
      1  21    Task-focused users                 Overall below average activity, above average message posting activity
      2  15    Content-focused users              Below average discussion-related activity, average content-related activity, emphasis on assignments
      3  22    No-users                           Overall below average activity, slightly higher in discussion-related activities
      4  3     Highly intensive users             Significantly most active students, especially in content-related activities
      5  6     Content-focused intensive users    Above average content-related activity, average discussion-related activity
      6  14    Socially-focused intensive users   Above average discussion-related activity, average content-related activity
  • 82. Some examples: Almeda, M. V., Scupelli, P., Baker, R. S., Weber, M., & Fisher, A. (2014). Clustering of Design Decisions in Classroom Visual Displays. In Proceedings of the Fourth International Conference on Learning Analytics and Knowledge (pp. 44–48). New York, NY, USA: ACM. https://doi.org/10.1145/2567574.2567605
  • 83. Clustering visual designs of classrooms • 30 schools in northwestern USA • Classroom Wall Coding System, CWaCS 1.0. • Each classroom wall was photographed • Units of analysis were marked with a box • Coding scheme: 1. Academic: (a) Academic topics (F1), (b) Academic organizational (F2) 2. Non-academic (F3) 3. Behavioural (F4) • Adopted K-means to cluster classrooms based on the frequency of the four features (F1-F4)
    Coding categories:
      Academic topics: 1. Behavior 2. Content specific 3. Procedures 4. Resources 5. Calendars/clocks 6. Other
      Academic organizational: 1. Goals for the day 2. Group assignments 3. Job charts 4. Labels 5. Schedule day/week 6. Yearly 7. Schedule 8. Skills 9. Homework
      Non-academic: 1. Motivational slogans 2. Decorations 3. Decorative frames 4. Student art 5. Other non-academic
      Behavior materials: 1. Behavior management 2. Progress charts 3. Rules 4. Other behaviour
  • 85. Some examples: Ferguson, R., Clow, D., Beale, R., Cooper, A. J., Morris, N., Bayne, S., & Woodgate, A. (2015). Moving Through MOOCS: Pedagogy, Learning Design and Patterns of Engagement. In Design for Teaching and Learning in a Networked World (pp. 70–84). Springer International Publishing. https://doi.org/10.1007/978-3-319-24258-3_6
  • 86. Features • For each course week, we assigned learners an activity score: • 1 if they viewed content • 2 if they posted a comment • 4 if they submitted their assessment in a subsequent week (late) • 8 if they submitted it early or on time • The component scores are summed, giving the possible combinations: • 1 = Visited content only • 2 = Posted comment but visited no new content • 3 = Visited content and posted comment • 4 = Submitted the assessment late • 5 = Visited content and submitted assessment late • 6 = Posted, late assessment, saw no new content • 7 = Visited content, posted, late assessment • 8 = Submitted assessment early/on time • 9 = Visited content, assessment early/on time • 10 = Posted, assessment early/on time, no new content • 11 = Visited, posted, assessment early/on time • Adopted K-means
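The weekly activity score behaves like a set of bit flags (1, 2, 4, 8) that are added together. The sketch below reproduces that encoding; the constant and function names are hypothetical, and the rule that an assessment cannot be both late and on time is an assumption inferred from the list of combinations stopping at 11.

```python
VISITED, POSTED, LATE, ON_TIME = 1, 2, 4, 8


def activity_score(visited, posted, submitted_late, submitted_on_time):
    """Weekly activity score: the sum of the flags for what the learner did."""
    if submitted_late and submitted_on_time:
        # Assumption: one assessment per week, so it is either late or on time.
        raise ValueError("an assessment cannot be both late and on time")
    return (VISITED * visited + POSTED * posted
            + LATE * submitted_late + ON_TIME * submitted_on_time)


def describe(score):
    """Decode a score back into the list of activities it represents."""
    names = {VISITED: "visited content", POSTED: "posted comment",
             LATE: "late assessment", ON_TIME: "assessment early/on time"}
    return [name for flag, name in names.items() if score & flag]


print(activity_score(True, True, False, True))  # visited + posted + on time
print(describe(5))                              # which activities make up score 5
```

A learner is then represented by a vector of such scores, one per course week, and those vectors are what K-means groups.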
  • 87. Clusters identified: 1. Samplers 2. Strong Starters 3. Returners 4. Midway Dropouts 5. Nearly There 6. Late Completers 7. Keen Completers
  • 88. Further examples
    Lust, G., Elen, J., & Clarebout, G. (2013). Regulation of tool-use within a blended course: Student differences and performance effects. Computers & Education, 60(1), 385–395. https://doi.org/10.1016/j.compedu.2012.09.001
    Wise, A. F., Speer, J., Marbouti, F., & Hsiao, Y.-T. (2013). Broadening the notion of participation in online discussions: examining patterns in learners’ online listening behaviors. Instructional Science, 41(2), 323–343. https://doi.org/10.1007/s11251-012-9230-9
    Niemann, K., Schmitz, H.-C., Kirschenmann, U., Wolpers, M., Schmidt, A., & Krones, T. (2012). Clustering by Usage: Higher Order Co-occurrences of Learning Objects. In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge (pp. 238–247). New York, NY, USA: ACM. https://doi.org/10.1145/2330601.2330659
    Cobo, G., García-Solórzano, D., Morán, J. A., Santamaría, E., Monzo, C., & Melenchón, J. (2012). Using Agglomerative Hierarchical Clustering to Model Learner Participation Profiles in Online Discussion Forums. In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge (pp. 248–251). New York, NY, USA: ACM. https://doi.org/10.1145/2330601.2330660
    Crossley, S., Roscoe, R., & McNamara, D. S. (2014). What Is Successful Writing? An Investigation into the Multiple Ways Writers Can Write Successful Essays. Written Communication, 31(2), 184–214. https://doi.org/10.1177/0741088314526354
    Hecking, T., Ziebarth, S., & Hoppe, H. U. (2014). Analysis of Dynamic Resource Access Patterns in Online Courses. Journal of Learning Analytics, 1(3), 34–60.
    Li, N., Kidziński, Ł., Jermann, P., & Dillenbourg, P. (2015). MOOC Video Interaction Patterns: What Do They Tell Us? In Proceedings of the 10th European Conference on Technology Enhanced Learning (pp. 197–210). Springer International Publishing. https://doi.org/10.1007/978-3-319-24258-3_15
  • 89. K-Means clustering • The most widely used clustering algorithm • Very simple, decent results • Produces “circular” clusters • Iterative algorithm • Initial positions of the cluster centroids are random • Often run multiple times with the results averaged out (e.g., 1,000 times)
  • 90. K-Means algorithm 1. Pick the number of clusters K 2. Pick K centroids in the N-dimensional feature space: $c_i \in \mathbb{R}^N$, $i \in \{1, \dots, K\}$ 3. For each of the P data points $p_i \in \mathbb{R}^N$: (a) calculate the distance to each of the K centroids, (b) assign it to its closest centroid 4. Recalculate centroid positions based on the assigned data points 5. Repeat steps 3–4 until centroid positions stabilize (i.e., there is no change in step 4)
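The steps above can be sketched in plain Python. This is a teaching sketch rather than the WEKA implementation: the initial centroids are drawn from the data points (one common choice, assumed here), the distance is Euclidean, and the names `kmeans`, `points` and `seed` are hypothetical.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Basic K-means: returns (centroids, assignment of each point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 2: K initial centroids
    assignment = None
    for _ in range(max_iter):
        # Step 3: assign every point to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:  # step 5: nothing changed, stop
            break
        assignment = new_assignment
        # Step 4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment


# Two well-separated toy groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
centroids, assignment = kmeans(points, k=2)
print(centroids)
print(assignment)
```

The cluster numbers themselves are arbitrary (which group is called 0 and which 1 depends on the random start), so only the grouping should be interpreted.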
  • 92. K-Means characteristics • The final solution depends a lot on the original random centroid positions • The algorithm is often repeated (restarted) many times. • Restart K-means R (e.g., 1,000) times. • For each of the data points there will be R cluster assignments. • For each data point, pick the cluster assignment which was most common among the R assignments
  • 93. K-Means characteristics • The algorithm is easy to implement • Pretty fast, converges very quickly • For N data points, requires the calculation of N*K distances per iteration (which is $O(N)$ for a fixed K) • Produces circular clusters – can be a problem in some domains • Susceptible to outliers: every data point, including an outlier, is assigned to one of the centroids and can shift that centroid significantly off centre • The number of clusters must be provided • Can get stuck in a local optimum (often solved by multiple runs)
  • 94. K-Means variants • K-Means++ • “Smart” picking of the initial centroids (a.k.a. seeds) • Seed selection algorithm: • Pick the first seed uniformly at random from the data points • Pick each next seed from the data points with a probability proportional to its squared distance from the closest already-chosen seed • Effectively “spreads” the seed centroids across the feature space • K-Medoids & its flavours (Partitioning Around Medoids - PAM) • The solution to the outlier problem: instead of using a centroid, use a medoid. • Instead of representing clusters with centres, use existing data points to represent clusters
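A minimal sketch of K-Means++ seed selection as it is usually described: the first seed is a uniformly random data point, and each further seed is drawn with probability proportional to its squared distance from the nearest seed chosen so far. The function name `kmeans_pp_seeds` is hypothetical, and the sketch assumes the data contain at least K distinct points.

```python
import math
import random


def kmeans_pp_seeds(points, k, seed=0):
    """Pick k initial centroids ("seeds") with the K-Means++ rule."""
    rng = random.Random(seed)
    seeds = [rng.choice(points)]  # first seed: uniform over the data points
    while len(seeds) < k:
        # Squared distance from each point to its closest existing seed.
        d2 = [min(math.dist(p, s) ** 2 for s in seeds) for p in points]
        # Far-away points are more likely to be chosen; existing seeds have
        # weight 0 and therefore cannot be chosen again.
        seeds.append(rng.choices(points, weights=d2, k=1)[0])
    return seeds


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
print(kmeans_pp_seeds(points, k=2))
```

With two well-separated groups, the second seed almost always lands in the group that does not contain the first one, which is the "spreading" effect mentioned above.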
  • 95. PAM algorithm (Partitioning Around Medoids) 1. One variant of K-Medoids 2. Pick the number of clusters K 3. Pick K data points in the N-dimensional feature space, $m_i \in \mathbb{R}^N$, $i \in \{1, \dots, K\}$, which will be the initial cluster representatives 4. Assign each of the remaining M−K data points $p_i \in \mathbb{R}^N$ (M being the total number of data points) to the closest representative 5. For each representative point $o_j$: (a) Pick a random non-representative data point $o_{\mathrm{random}}$ from its cluster (b) Check if swapping $o_j$ with $o_{\mathrm{random}}$ produces clusters with smaller “errors” (the sum, over all clusters, of the absolute differences between their data points and representatives) (c) If the new cost is smaller than the original cost, keep $o_{\mathrm{random}}$ as a representative point 6. Repeat steps 4–5 until there are no changes in representative objects 95
  • 96. K-Means variants • X-Means • Does not require number of clusters K to be specified • Refines clustering solution by splitting existing clusters • Keeps the clustering configuration which maximizes AIC (Akaike information criterion) or BIC (Bayesian information criterion) • Implemented in WEKA • Cascading K-Means • Restarts K-means with different K and picks the K that maximizes Calinski and Harabasz criterion (F value in ANOVA) • Implemented in WEKA 96
  • 97. K-Means variants • Large datasets variants: CLARA (Clustering LARge Applications) and CLARANS (Clustering Large Applications upon RANdomized Search) • CLARA: Use a sample of data points as potential candidate medoids and run PAM. • CLARANS: Adds randomization so the sample is not fixed at the start • Fuzzy C-means • Each data point can belong to multiple clusters with different probabilities (adding up to 100% across all clusters) • Also assesses the compactness of each cluster • Compact clusters will have members with high probabilities 97
  • 98. Running K-means & X-Means in WEKA 98
  • 99. Hierarchical clustering • Next to k-means, very popular method for cluster analysis • Two key flavours • agglomerative • divisive • Especially usable for small datasets • Evaluate and pick the number of clusters visually • The height of the merge/split indicates the distance • Used extensively in Learning Analytics • Many variants, using Linkage Functions 99
  • 100. Agglomerative hierarchical clustering • Build the clusters from bottom-up • Algorithm: • Build a singleton cluster for each data point • Repeat until all data in a single cluster: • find two closest clusters (based on linkage function) • merge these two together • Run Interactive DEMO 100
  • 101. Agglomerative hierarchical clustering • Requires calculation of the distances between all cluster pairs • At step 1 – this means calculating all pairwise distances among data points • N data points – N*(N-1)/2 distances (roughly N*N/2) • Not feasible for large datasets 101
  • 102. Divisive hierarchical clustering • All data start in a single cluster, then we split one cluster at each step. • More complex than agglomerative (how to split a cluster?) • Less popular than agglomerative algorithms • Can be faster as we do not need to go all the way to the bottom of the dendrogram • Many approaches, often use “flat” algorithm as a partitioning method (e.g., K-means) 102
  • 103. Example divisive clustering with K-means • Start with all data in a single cluster • Use K-means to create two initial clusters A2 and B2 • Use K-means to divide A2 into A2-1 and A2-2 • Use K-means to divide B2 into B2-1 and B2-2 • Pick between: • A2-1, A2-2, B2 • A2, B2-1, B2-2 • Call the best combination A3, B3, and C3 • Repeat the division of each cluster into two clusters. Pick between: • A3-1, A3-2, B3, C3 • A3, B3-1, B3-2, C3 • A3, B3, C3-1, C3-2 103
  • 104. Linkage functions • Key question for agglomerative clustering: how to pick the two clusters to merge • What is meant by “closest”? • Several different criteria. Most popular: • Single-linkage: Minimal distance between any two data points, one from each cluster • Complete-linkage: Maximal distance between any two data points, one from each cluster • Average-linkage: Average distance over all pairs of data points, one from each cluster (using the distance between cluster centroids instead is known as centroid linkage) • Ward’s method: pick the pair of clusters so that the new cluster has minimal possible sum of squares of distances. Minimizes the variation within the clusters. 104
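A naive sketch of agglomerative clustering with pluggable linkage functions is shown below. It is meant only to illustrate the merge loop; the names `single`, `complete`, `average`, and `agglomerate` are made up for this example, average linkage is taken here as the mean of all between-cluster pairwise distances, and Ward's method is left out for brevity.

```python
import math
from itertools import combinations

def single(a, b):
    """Smallest distance between a point of a and a point of b."""
    return min(math.dist(p, q) for p in a for q in b)

def complete(a, b):
    """Largest distance between a point of a and a point of b."""
    return max(math.dist(p, q) for p in a for q in b)

def average(a, b):
    """Mean distance over all pairs with one point from each cluster."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def agglomerate(points, linkage, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # start with singleton clusters
    merges = []                       # (cluster, cluster, merge height)
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        height = linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges
```

The recorded merge heights are what a dendrogram plots on its vertical axis. Swapping the `linkage` argument on the same data is a quick way to see how the merge order and heights change.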
  • 106. Linkage functions • Different linkage functions can produce very different results (Figure panels: single linkage, complete linkage, Ward’s criterion) 106
  • 107. Hierarchical clustering in Weka FUN PART 107
  • 108. Short intro to WEKA 108
  • 109. What is Weka? Software “workbench” Waikato Environment for Knowledge Analysis (WEKA) 109
  • 110. Installing Weka • https://www.cs.waikato.ac.nz/ml/weka/index.html • Very powerful, lot of resources available • Good for fast prototyping, much faster than R/Python • Can be used • Through GUI, which is very quirky and has hidden “gems” • From command line (useful for integrating with other tools/scripts) • As a Java library • Not the best designed UI, clearly done by the developers • Great book about ML/DM/Weka https://www.cs.waikato.ac.nz/ml/weka/book.html • Many demo datasets included in Weka https://www.cs.waikato.ac.nz/ml/weka/datasets.html 110
  • 111. Weka Interfaces • Explorer: will be used throughout the course • Experimenter: performance comparisons • KnowledgeFlow: graphical front end, alternative to Explorer • Workbench: unified interface • Simple CLI: command-line interface 111
  • 114. Attribute Relation File Format (ARFF)
    Data.CSV:
      5.1, 3.5, 1.4, 0.2, Iris-setosa
      4.9, 3.0, 1.4, 0.2, Iris-setosa
      4.7, 3.2, 1.3, 0.2, Iris-setosa
      4.6, 3.1, 1.5, 0.2, Iris-setosa
      5.0, 3.6, 1.4, 0.2, Iris-setosa
    Data.ARFF:
      @RELATION iris
      @ATTRIBUTE sepallength REAL
      @ATTRIBUTE sepalwidth REAL
      @ATTRIBUTE petallength REAL
      @ATTRIBUTE petalwidth REAL
      @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
      @DATA
      5.1,3.5,1.4,0.2,Iris-setosa
      4.9,3.0,1.4,0.2,Iris-setosa
      4.7,3.2,1.3,0.2,Iris-setosa
      4.6,3.1,1.5,0.2,Iris-setosa
      5.0,3.6,1.4,0.2,Iris-setosa
    114
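To make the relationship between the two formats concrete, here is a small sketch that builds ARFF text from CSV-like rows. The helper `to_arff` and its parameters are hypothetical and cover only numeric attributes plus one nominal class attribute; WEKA's own loaders and converters handle the general case.

```python
def to_arff(relation, numeric_attrs, class_attr, class_values, rows):
    """Build ARFF text from rows of numeric values followed by a class label."""
    lines = [f"@RELATION {relation}", ""]
    # One REAL attribute declaration per numeric column.
    lines += [f"@ATTRIBUTE {name} REAL" for name in numeric_attrs]
    # Nominal attributes list their allowed values in curly braces.
    lines.append(f"@ATTRIBUTE {class_attr} {{{','.join(class_values)}}}")
    lines += ["", "@DATA"]
    lines += [",".join(str(v) for v in row) for row in rows]
    return "\n".join(lines)

rows = [
    (5.1, 3.5, 1.4, 0.2, "Iris-setosa"),
    (4.9, 3.0, 1.4, 0.2, "Iris-setosa"),
]
arff_text = to_arff(
    "iris",
    ["sepallength", "sepalwidth", "petallength", "petalwidth"],
    "class",
    ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
    rows,
)
print(arff_text)
```

Note that the header declares all allowed class values, even those that do not occur in the data rows.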
  • 115. WEKA skills • Loading/Saving data (CSV, ARFF) • Filtering and pre-processing attributes • Visualising data • Clustering • Saving clustering results 115
  • 116. Let’s play with WEKA 116
  • 117. Selecting the number of clusters • Clustering is user-centric and subjective • How to pick the number of clusters? • Based on background knowledge (e.g., educational theory) • Use an algorithm that calculates optimal number of clusters automatically (e.g., EM) • Use an algorithm that provides a visual overview of all clustering configurations (e.g., hierarchical clustering) • Use supervised clustering algorithm where clustering process is guided by the user • Evaluate multiple values for K manually • “Elbow” method: trade-off point between number of clusters and within cluster variance • Silhouette method: test robustness of cluster membership 117
  • 118. Elbow method • As K increases, the average diameter (variance) of clusters gets smaller • Find a “sweet spot” after which adding more clusters reduces the variance much more slowly • Sometimes not so clear 118
  • 119. Silhouette method • Visual method for determining the number of clusters • $a(i)$ – the average distance of point $i$ to all other points in its cluster • $b(i)$ – the smallest average distance of point $i$ to points in another cluster (distance to the closest neighbouring cluster) • $s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$ • $s(i) = \begin{cases} \frac{b(i) - a(i)}{b(i)}, & \text{if } b(i) > a(i) \\ 0, & \text{if } b(i) = a(i) \\ \frac{b(i) - a(i)}{a(i)}, & \text{if } b(i) < a(i) \end{cases}$ 119
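  • 120. Silhouette method • $s(i) = \begin{cases} 1 - \frac{a(i)}{b(i)}, & \text{if } b(i) > a(i) \\ 0, & \text{if } b(i) = a(i) \\ \frac{b(i)}{a(i)} - 1, & \text{if } b(i) < a(i) \end{cases}$ • $s(i) \approx \begin{cases} 1, & \text{if } b(i) \gg a(i) \\ 0, & \text{if } b(i) = a(i) \\ -1, & \text{if } b(i) \ll a(i) \end{cases}$ • $s(i) \approx \begin{cases} 1, & \text{good fit to the cluster} \\ 0, & \text{borderline fit} \\ -1, & \text{bad fit to the cluster} \end{cases}$ 120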
  • 126. Silhouette coefficient • The average 𝑠(𝑖) for all data points • Pick the number of clusters that maximizes the average silhouette coefficient 126
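A direct, unoptimised sketch of the silhouette computation follows. The function names are chosen for this example, and scoring singleton clusters as 0 is a common convention that is assumed here rather than taken from the slides.

```python
import math

def silhouette_scores(points, labels):
    """Return s(i) for every point, given one cluster label per point."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1 or len(clusters) < 2:
            scores.append(0.0)  # assumed convention for degenerate cases
            continue
        # a(i): mean distance to the other members of the point's own cluster
        # (the distance of the point to itself contributes 0 to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

def silhouette_coefficient(points, labels):
    """Average s(i) over all data points."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)
```

To choose K, one would compute `silhouette_coefficient` for the cluster assignments obtained with each candidate K and keep the K with the highest value.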
  • 127. Silhouette coefficient and Elbow method in WEKA 127
  • 128. Challenges and solutions • High dimensionality & feature (attribute) selection • Categorical attributes • “Weirdly-shaped” clusters • Outliers 128
  • 129. Curse of dimensionality • Euclidean distance metric: $\sqrt{\sum_i (a_i - b_i)^2}$ • In a high-dimensional space with $d$ dimensions (each feature in the range 0.0–1.0): • Highly likely that for at least one feature $i$, the value $|a_i - b_i|$ will be close to 1.0 • This puts the lower bound on the distance between almost any two points close to 1.0 • However, the upper bound is $\sqrt{d}$ • Most pairs of data points are far from this upper bound • Most data points have distance close to average distance • Many irrelevant dimensions for most clusters • The noise in irrelevant dimensions masks real differences between clusters 129
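The concentration of distances can be observed with a small simulation. This is an illustrative sketch on uniformly random points; the function name `distance_spread`, the sample size, and the seed are arbitrary choices.

```python
import math
import random
import statistics

def distance_spread(d, n_points=60, seed=0):
    """Mean pairwise distance and (max - min) / mean for random points in [0, 1]^d."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(d)] for _ in range(n_points)]
    dists = [
        math.dist(pts[i], pts[j])
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    mean = statistics.fmean(dists)
    return mean, (max(dists) - min(dists)) / mean

for d in (2, 10, 100, 1000):
    mean, spread = distance_spread(d)
    print(f"d={d:5d}  mean distance={mean:6.2f}  relative spread={spread:.2f}")
```

As the number of dimensions grows, the relative spread shrinks: nearly all pairs of points end up at roughly the same distance from each other, which is what makes distance-based clustering harder.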
  • 130. Curse of dimensionality: some solutions • Feature transformation methods (essentially compression) • Creating smaller number of new, synthetic features based on the larger number of input features which are then used for clustering • Principal component analysis (PCA) • Singular-value decomposition (SVD) • Feature (attribute) selection methods • Searching for a subset of features that are relevant for a given domain. • Minimize entropy • Idea that feature spaces that contain tight clusters have low entropy • Subspace clustering – extension of attribute selection 130
  • 131. Curse of dimensionality: some solutions • Popular algorithms: • CLIQUE: A Dimension-Growth Subspace Clustering Method • Starts with a single dimension and grows the space by adding new dimensions • PROCLUS: A Dimension-Reduction Subspace Clustering Method • Starts with the complete high-dimensional space and assigns a weight to each dimension for every cluster; these weights are then used to regenerate the clusters • Explores dense subspace regions 131
  • 132. Categorical data • Most clustering algorithms focus on clustering with continuous numerical attributes (ratio variables) • How to cluster categorical data? E.g., clustering students based on their demographic characteristics: • Gender • Program • Study level (postgraduate vs. undergraduate) • Domestic/international 132
  • 133. Categorical data: simple solution • Ignore the problem, treat categorical data as numerical: • Male: 1, Female: 2 • Domestic: 1, International: 2 • Often does not produce good results. • Distance metric is not meaningful. • Point A: (Male, Domestic) • Point B: (Female, Domestic) • Point C: (Female, International) • Is point B closer to point A or point C? • Depends on the information value of these two features • “Localized method” • If two distinct clusters have a few points that are close, they might be merged together incorrectly. 133 (Figure: points A, B, and C plotted on Gender vs. Dom/Intl axes, each coded 1–2)
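The A/B/C example can be made concrete with a simple mismatch count. This is a sketch only: `mismatch_distance` and the feature weights below are hypothetical and stand in for whatever "information value" one assigns to each feature.

```python
def mismatch_distance(a, b, weights=None):
    """Sum the weights of the categorical features on which a and b differ."""
    if weights is None:
        weights = [1.0] * len(a)  # unweighted: every mismatch counts as 1
    return sum(w for x, y, w in zip(a, b, weights) if x != y)

A = ("Male", "Domestic")
B = ("Female", "Domestic")
C = ("Female", "International")

# Unweighted, B differs from A and from C on exactly one feature each.
print(mismatch_distance(A, B), mismatch_distance(B, C))

# If gender is assumed to carry little information (weight 0.2),
# B moves much closer to A than to C.
w = [0.2, 1.0]
print(mismatch_distance(A, B, w), mismatch_distance(B, C, w))
```

The answer to "is B closer to A or C?" therefore changes with the weights, which is why a plain numeric coding of categories gives a distance that is hard to interpret.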
  • 134. Categorical data: custom algorithms • ROCK (RObust Clustering using linKs) • A Hierarchical Clustering Algorithm for Categorical Attributes • Two data points are similar if they have similar neighbours • Typical example: market basket data 134
  • 135. “Weirdly-shaped” clusters • Most algorithms focus on distance between data points • However, often the connectedness of data points is also important • Different algorithms developed for these situations 135
  • 136. Different types of clustering methods • Distance-based clustering • Group objects based on distance among them • Density-based clustering • Group objects based on area they occupy 136
  • 137. CURE • Pick a subsample of data and cluster it using a method such as hierarchical clustering • For each cluster, pick N characteristic (representative) points that are as distant from each other as possible. • Move the representative points a fraction of the way towards the cluster centroid. • Merge two clusters whose representative points are sufficiently close. 137
  • 138. DBSCAN • DBSCAN (Density-based spatial clustering of applications with noise) • Density-based algorithm • Searches for areas with a large number of points • Implemented in WEKA • General idea: • Each data point is either a core point, a reachable point, or an outlier • Core points have at least minP (parameter) points around them within the radius r (parameter) • Reachable points are not core points themselves but lie within radius r of a core point • Every other data point is an outlier 138 (Figure legend: minP=4; red: core, yellow: reachable, blue: outliers)
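The core/reachable/outlier idea can be sketched as follows. This is a simple quadratic-time illustration, not WEKA's implementation; the name `dbscan` is chosen for the example, and it assumes that the minP count includes the point itself, which is one common convention.

```python
import math

def dbscan(points, r, min_p):
    """Return (labels, core): a cluster index or None (outlier) per point."""
    n = len(points)
    # Neighbourhood of each point: all points within radius r (itself included).
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= r]
        for i in range(n)
    ]
    core = [len(nb) >= min_p for nb in neighbours]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] is not None:
            continue
        # Grow a new cluster outwards from an unvisited core point.
        labels[i] = cluster
        frontier = [i]
        while frontier:
            cur = frontier.pop()
            for j in neighbours[cur]:
                if labels[j] is None:
                    labels[j] = cluster
                    if core[j]:  # only core points keep expanding the cluster
                        frontier.append(j)
        cluster += 1
    return labels, core
```

Points that still have the label `None` at the end are the outliers; points with a cluster label but `core` set to `False` are the reachable points.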
  • 139. Self-organising maps (SOM) • Special type of neural network • Used to learn the contour of the underlying data • Neurons are laid out in a grid structure, each neuron connected with its neighbours and all input nodes • For each data point, the neuron which is closest to it gets adjusted, with adjustments being propagated to neighbouring neurons • Over time, neurons will position themselves in the shape of the data • Dense areas with many neurons indicate clusters 139
  • 141. Expectation-maximization (EM) clustering • Much more general than clustering • Used to estimate hidden (latent) parameters • Does not necessarily require the number of clusters to be specified in advance (e.g., WEKA’s EM can select it through cross-validation) • General idea for a given number of clusters: • Pick number of clusters K • Fit K distributions over clustering variables with their parameters P • Estimate the likelihood of each data point being generated by each of the K distributions (expectation) • For every data point, sum the likelihoods across the K distributions and divide each likelihood by this sum to obtain the point’s weight for each cluster • Combine weights with the data to produce new estimates for parameters P (maximization) • Repeat until convergence is reached (no parameter change) 141
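For one-dimensional data and Gaussian clusters, the loop above can be sketched as below. This is a minimal illustration under stated assumptions: the function `em_1d`, the fixed number of iterations instead of a convergence check, and the small floor on the standard deviation are all choices made for the example.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_1d(data, mus, sigmas, weights, n_iter=50):
    """EM for a 1-D Gaussian mixture, starting from the given parameters."""
    mus, sigmas, weights = list(mus), list(sigmas), list(weights)
    k = len(mus)
    for _ in range(n_iter):
        # Expectation: weight of each cluster for each data point.
        resp = []
        for x in data:
            lik = [weights[c] * normal_pdf(x, mus[c], sigmas[c]) for c in range(k)]
            total = sum(lik)
            resp.append([l / total for l in lik])
        # Maximization: re-estimate parameters from the weighted data.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            mus[c] = sum(r[c] * x for r, x in zip(resp, data)) / nc
            var = sum(r[c] * (x - mus[c]) ** 2 for r, x in zip(resp, data)) / nc
            sigmas[c] = max(math.sqrt(var), 1e-6)  # floor avoids a zero-width cluster
            weights[c] = nc / len(data)
    return mus, sigmas, weights
```

Unlike K-means, every data point contributes to every cluster, in proportion to its weight for that cluster.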
  • 144. Analysis of cluster differences 144
  • 145. Analysis of cluster differences • We can check the differences between clusters with regard to • Clustering variables (e.g., number of logins, number of discussion posts) • Some additional variables (e.g., student grades, age, gender) • We can examine differences • One variable at a time (univariate differences) • Across multiple variables simultaneously (multivariate differences) • Takes into consideration the interaction among multiple variables 145
  • 146. Univariate analysis of cluster differences • For every variable we can use parametric and non-parametric univariate tests: • Two clusters: t-test and Mann-Whitney • Three or more clusters: One-way ANOVA and Kruskal-Wallis • Requires p-value adjustment (e.g., Bonferroni, Holm-Bonferroni correction) • Whether to use parametric or non-parametric tests primarily depends on the homogeneity (equality) of variance assumption • Can be tested with Levene’s test • If Levene’s test shows p<.05, use Mann-Whitney and Kruskal-Wallis • Significant ANOVA tests can be followed by pairwise tests (e.g., TukeyHSD) • Significant Kruskal-Wallis tests can be followed by pairwise KW tests (also with p-value correction) 146
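The two p-value adjustments mentioned above are simple enough to write out by hand. The sketch below assumes a list of raw p-values, one per tested variable; the function names are chosen for this example.

```python
def bonferroni(pvals):
    """Multiply every p-value by the number of tests (capped at 1)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm-Bonferroni step-down adjustment, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest p-value first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        adj = min(1.0, (m - rank) * pvals[i])
        # Adjusted p-values must not decrease as the raw p-values increase.
        running_max = max(running_max, adj)
        adjusted[i] = running_max
    return adjusted
```

For raw p-values 0.01, 0.04, and 0.03, Bonferroni gives approximately 0.03, 0.12, and 0.09, while Holm gives approximately 0.03, 0.06, and 0.06, which shows that Holm is the less conservative of the two.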
  • 147. Multivariate analysis of cluster differences • We can test differences across all variables at the same time • More holistic than ANOVA/KW • Instead of one dependent variable, we can have multiple variables • Use meaningful groups of variables (e.g., behavioural variables) • MANOVA: Multivariate analysis of variance • Step “before” ANOVAs/KWs • Has several statistics: Wilks’ Λ, Pillai’s Trace • Assumption: Homogeneity of covariance • Much trickier to test: Box’s M is one method, but it is very sensitive (use p<.001) • Use Levene’s tests on each of the variables (doesn’t guarantee homogeneity of covariance but might help) • If the assumption is violated, MANOVA can still be used but with a more robust statistic (Pillai’s Trace) 147
  • 148. Example MANOVA “For assessing the difference between student clusters a multivariate analysis of variance (MANOVA) was used. To validate the difference between the discovered clusters a MANOVA model with cluster assignment as a single independent variable and thirteen clustering variables as the dependent measures was constructed…” Kovanović, V., Gašević, D., Joksimović, S., Hatala, M., & Adesope, O. (2015). Analytics of communities of inquiry: Effects of learning technology use on cognitive presence in asynchronous online discussions. The Internet and Higher Education, 27, 74–89. doi:10.1016/j.iheduc.2015.06.002 148
  • 149. Example MANOVA “Before running MANOVAs, … the homogeneity of covariances assumption was checked using Box’s M test and homogeneity of variances using Levine’s test. To protect from the assumption violations, we log-transformed the data and used the Pillai’s trace statistic which is considered to be a robust against assumption violations. As a final protection measure, obtained MANOVA results were compared with the results of the robust rank-based variation of the MANOVA analysis” Kovanović, V., Gašević, D., Joksimović, S., Hatala, M., & Adesope, O. (2015). Analytics of communities of inquiry: Effects of learning technology use on cognitive presence in asynchronous online discussions. The Internet and Higher Education, 27, 74–89. doi:10.1016/j.iheduc.2015.06.002 149
  • 150. Example MANOVA “The assumption of homogeneity of covariances was tested using Box’s M test which was not accepted. Thus, Pillai’s trace statistic was used, as it is more robust to the assumption violations together with the Bonferroni correction method. A statistically significant MANOVA effect was obtained, Pillai’s Trace = 1.62, F(39, 174) = 5.28, p < $10^{-14}$” Kovanović, V., Gašević, D., Joksimović, S., Hatala, M., & Adesope, O. (2015). Analytics of communities of inquiry: Effects of learning technology use on cognitive presence in asynchronous online discussions. The Internet and Higher Education, 27, 74–89. doi:10.1016/j.iheduc.2015.06.002 150
  • 151. MANOVA Follow-up analyses • Significant MANOVA can be followed up with two types of analyses: • Individual ANOVAs/KWs (with p-value correction) • Which in turn can be followed with pairwise analyses: TukeyHSD/Pairwise KWs • Discriminant Function Analysis (DFA) • What combinations of variables differentiate between clusters • DFA can be run alone (without MANOVA) but its significance then can’t be tested 151
  • 152. 152 Kovanović, V., Gašević, D., Joksimović, S., Hatala, M., & Adesope, O. (2015). Analytics of communities of inquiry: Effects of learning technology use on cognitive presence in asynchronous online discussions. The Internet and Higher Education, 27, 74–89. doi:10.1016/j.iheduc.2015.06.002
  • 153. Example DFA analysis 153 Kovanović, V., Gašević, D., Joksimović, S., Hatala, M., & Adesope, O. (2015). Analytics of communities of inquiry: Effects of learning technology use on cognitive presence in asynchronous online discussions. The Internet and Higher Education, 27, 74–89. doi:10.1016/j.iheduc.2015.06.002