K means

Is k-means supervised or unsupervised learning?
Unsupervised: it groups unlabeled data.

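1. Randomly pick k data points from the dataset as the initial cluster centers u1 to uk.

2. For each data point xi, find the cluster center at the shortest distance.
(Fix ui, solve for the cluster assignment ci.)

3. Recompute the cluster centers from the current assignment.
(Fix ci, solve for the cluster centers ui.)

4. Repeat steps 2 and 3 until convergence.
(The maximum number of iterations is reached, or the cluster centers move only a very small distance.)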
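The four steps above can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the parameters `max_iter`, `tol` and `seed`, and the choice to keep the old center when a cluster ends up empty are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    """Plain k-means on an (m, n) array X; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centers u1..uk.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Step 2: fix the centers, assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: fix the assignment, move each center to the mean of its points.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # assumption: an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        # Step 4: stop once the centers have (almost) stopped moving.
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, labels
```

Calling `kmeans(X, k=2)` on a small array with two well-separated groups of points should return one center near each group and a label per row of `X`.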

K-means algorithm
input:
K (number of clusters)
Training set {x(1), x(2), …, x(m)}

K-means algorithm
cluster assignment step
move centroid step

Random initialization
Depending on the initial centers, the result can be bad or good.

How can we choose the better clustering?

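cost function J
J is the sum of squared errors: the squared distance between each x(i) and its assigned cluster center u_c(i).
J = sum over i = 1..m of ||x(i) - u_c(i)||^2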
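As a rough sketch of how J could be computed, assuming the points are rows of a NumPy array `X`, the centers are rows of `centers`, and `labels[i]` holds the index c(i) of the center assigned to x(i) (all three names are hypothetical):

```python
import numpy as np

def cost_J(X, centers, labels):
    """Sum of squared distances between each x(i) and its assigned center."""
    diffs = X - centers[labels]  # row i is x(i) - u_c(i)
    return float(np.sum(diffs ** 2))
```

Some presentations divide this sum by m. That only rescales J and does not change which clustering minimizes it.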

cost function J
So we have to find the clustering that minimizes J: min(J).

Different initializations give different values of J.
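One common way to act on this is to run k-means several times from different random initializations and keep the run with the lowest J. The sketch below assumes NumPy; `kmeans_once`, `best_of` and `n_init` are illustrative names, not part of the original slides.

```python
import numpy as np

def assign(X, centers):
    """Index of the nearest center for every row of X."""
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sq.argmin(axis=1)

def kmeans_once(X, k, rng, max_iter=100):
    """One k-means run from a random start; returns (J, centers, labels)."""
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        labels = assign(X, centers)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = assign(X, centers)  # make labels match the final centers
    J = float(((X - centers[labels]) ** 2).sum())
    return J, centers, labels

def best_of(X, k, n_init=10, seed=0):
    """Run k-means n_init times and keep the run with the smallest J."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(X, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[0])
```

The result is only the best of the runs that were tried, so a larger `n_init` trades more computation for a better chance of avoiding a poor local minimum.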

If we use k-means, we have to choose k, the number of clusters.
But what is the right value of k?

k depends on your target
(Figure: the same data grouped into sizes S, M, L, or into XS, S, M, L, XL.)

bisecting k-means
1. Pick a cluster to split.
2. Find 2 sub-clusters using the basic K-means
algorithm. (Bisecting step)
3. Repeat step 2, the bisecting step, for ITER
times and take the split that produces the
clustering with the highest overall similarity.
4. Repeat steps 1, 2 and 3 until the desired
number of clusters is reached.

bisecting k-means
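algorithm:
1. Put all the data into the cluster list as a single cluster
2. Repeat
3.   Take a cluster with a relatively large cost function (J) out of the cluster list
4.   for i = 1 to the preset number of iterations
5.     Use k-means to split the chosen cluster into two sub-clusters
6.     Compute J for the two sub-clusters
7.   end for
8.   Add the pair of sub-clusters with the smallest J found in the for loop to the cluster list
9. until the cluster list contains k clusters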
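The pseudocode above could be sketched in Python with NumPy as follows. This is one possible reading, under several assumptions: "a relatively large J" is taken to mean the cluster with the largest J, J of a cluster is its sum of squared distances to its own mean, and the names `two_means`, `sse`, `bisecting_kmeans` and `n_trials` are hypothetical.

```python
import numpy as np

def two_means(X, rng, max_iter=50):
    """Basic k-means with k=2; returns a boolean mask (True = second sub-cluster)."""
    centers = X[rng.choice(len(X), size=2, replace=False)].astype(float)
    mask = np.zeros(len(X), dtype=bool)
    for _ in range(max_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        mask = sq[:, 1] < sq[:, 0]
        if mask.all() or not mask.any():  # degenerate split, give up on this trial
            break
        new_centers = np.array([X[~mask].mean(axis=0), X[mask].mean(axis=0)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return mask

def sse(X):
    """Cost J of a single cluster: squared distances to its own mean."""
    return float(((X - X.mean(axis=0)) ** 2).sum())

def bisecting_kmeans(X, k, n_trials=5, seed=0):
    """Returns a list of index arrays, one per cluster."""
    rng = np.random.default_rng(seed)
    clusters = [np.arange(len(X))]               # line 1: everything in one cluster
    while len(clusters) < k:                     # lines 2 and 9
        # line 3: take out the cluster with the largest J
        worst = max(range(len(clusters)), key=lambda i: sse(X[clusters[i]]))
        idx = clusters.pop(worst)
        best = None
        if len(idx) >= 2:
            for _ in range(n_trials):            # lines 4 to 7
                mask = two_means(X[idx], rng)
                if mask.all() or not mask.any():
                    continue
                J = sse(X[idx][mask]) + sse(X[idx][~mask])
                if best is None or J < best[0]:
                    best = (J, idx[mask], idx[~mask])
        if best is None:                         # nothing left that can be split
            clusters.append(idx)
            break
        clusters.extend(best[1:])                # line 8: keep the best pair
    return clusters
```

Returning index arrays rather than copies of the data keeps it easy to map each cluster back to the original rows of `X`.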

reference
coursera stanford machine learning
bisecting k-means
K-means algorithm
