#lannister/machinelearning
The k-means algorithm takes k as a parameter and partitions n objects into k clusters, such that similarity within a cluster is high while similarity between clusters is low. The procedure is as follows:
1. Randomly select k points as the initial cluster centers;
2. Assign each remaining point to the nearest cluster, according to its distance from each cluster center;
3. For each cluster, compute the mean of all its points and use it as the new cluster center;
4. Repeat steps 2 and 3 until the cluster centers no longer change.
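Steps 2 and 3 can be read as alternately minimizing the distortion, i.e. the average squared distance between each example $x^{(i)}$ and the centroid $\mu_{c^{(i)}}$ of the cluster $c^{(i)}$ it is assigned to:

$$J\big(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K\big) = \frac{1}{m}\sum_{i=1}^{m}\big\lVert x^{(i)} - \mu_{c^{(i)}}\big\rVert^2$$

Step 2 minimizes J over the assignments with the centroids held fixed; step 3 minimizes J over the centroids with the assignments held fixed.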
Clustering finds the relations/connections between data points without labels. K-means is one of the most widely used clustering algorithms, and it also works on non-separated clusters (e.g., T-shirt sizing).
- Find closest centroids
K = size(centroids, 1);
distance = zeros(K, 1);    % distances from one example to every centroid
idx = zeros(size(X, 1), 1);
for i = 1:size(X, 1)
    for k = 1:K
        distance(k) = sqrt(sum((X(i,:) - centroids(k,:)).^2));
    end
    % Assign example i to the centroid with the minimum distance
    [mini, index] = min(distance);
    idx(i) = index;
end
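Assuming the snippet above is saved as the body of findClosestCentroids(X, centroids), the name used by the main loop further down, a toy sanity check could look like this (the data values are made up for illustration):

% Hypothetical toy data: 3 examples in 2-D, 2 centroids
X = [1 1; 5 5; 1 2];
centroids = [1 1; 5 5];
idx = findClosestCentroids(X, centroids);
% idx should come back as [1; 2; 1]: points 1 and 3 are closest to centroid 1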
- Compute Means
[m, n] = size(X);
centroids = zeros(K, n);
for k = 1:K
    logic = (idx == k);    % logical mask of the examples assigned to cluster k
    % sum(logic) is the number of examples assigned to the kth centroid
    centroids(k,:) = sum(X .* logic) / sum(logic);
end
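Note that X .* logic relies on automatic broadcasting of the m-by-1 logical mask across the columns of X (supported by Octave, and by MATLAB R2016b+). As a quick check, assuming the body above is wrapped as computeCentroids(X, idx, K), with made-up data:

X = [1 1; 5 5; 1 3];
idx = [1; 2; 1];    % assignments as returned by findClosestCentroids
K = 2;
centroids = computeCentroids(X, idx, K);
% centroids should be [1 2; 5 5]: row 1 is the mean of [1 1] and [1 3]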
- Randomly initialize cluster centroids
centroids = zeros(K, size(X, 2));
% Randomly reorder the indices of the examples
randidx = randperm(size(X, 1));
% Take the first K examples as centroids
centroids = X(randidx(1:K), :);
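Because the initialization is random, K-means can settle into a poor local optimum. A common companion technique, not in the original notes and sketched here with assumed function names (kMeansInitCentroids wrapping the snippet just shown), is to run the algorithm several times and keep the solution with the lowest distortion:

best_cost = Inf;
for t = 1:10    % 10 random restarts
    centroids = kMeansInitCentroids(X, K);
    for i = 1:max_iters
        idx = findClosestCentroids(X, centroids);
        centroids = computeCentroids(X, idx, K);
    end
    cost = mean(sum((X - centroids(idx,:)).^2, 2));    % distortion J
    if cost < best_cost
        best_cost = cost;
        best_centroids = centroids;
        best_idx = idx;
    end
end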
- K-Means Clustering on Pixels
% Run K-Means
for i = 1:max_iters
    % Output progress
    fprintf('K-Means iteration %d/%d...\n', i, max_iters);
    if exist('OCTAVE_VERSION')
        fflush(stdout);
    end
    % For each example in X, assign it to the closest centroid
    idx = findClosestCentroids(X, centroids);
    % Given the memberships, compute new centroids
    centroids = computeCentroids(X, idx, K);
end
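For the pixel-clustering use case, X is built by flattening an image into one RGB row per pixel. A minimal sketch of the surrounding setup, where the filename and K = 16 are assumptions for illustration:

A = double(imread('bird_small.png')) / 255;    % hypothetical image, scaled to [0, 1]
img_size = size(A);
X = reshape(A, img_size(1) * img_size(2), 3);    % one row per pixel: [R G B]
K = 16;    % compress the image down to 16 colors
max_iters = 10;
centroids = kMeansInitCentroids(X, K);
% ...run the K-Means loop above, then replace each pixel with its centroid's color:
idx = findClosestCentroids(X, centroids);
X_recovered = reshape(centroids(idx, :), img_size(1), img_size(2), 3);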