KMeans Clustering - clustering, kmeans

#lannister/machinelearning

The k-means algorithm takes k as a parameter and divides n objects into k clusters, so that objects within a cluster are highly similar to one another while the similarity between different clusters is low. The procedure is as follows:
1. Randomly select k points as the initial cluster centers;
2. Assign each remaining point to the cluster whose center is nearest to it;
3. For each cluster, calculate the mean of all its points as the new cluster center;
4. Repeat steps 2 and 3 until the cluster centers no longer change.
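
The four steps above can be sketched end to end as follows. This is a minimal illustration, not a reference implementation: the toy data, the `max_iters` safety cap, and the choice to leave an empty cluster's center where it was are all assumptions of this sketch.

```octave
% Toy data (made up): two groups of 2-D points.
X = [1.0 1.0; 1.2 0.8; 0.8 1.1; 5.0 5.0; 5.2 4.9; 4.8 5.1];
K = 2;
max_iters = 100;                 % safety cap, an assumption of this sketch

% Step 1: pick K distinct examples as the initial centers.
randidx = randperm(size(X, 1));
centroids = X(randidx(1:K), :);

for iter = 1:max_iters
  % Step 2: assign every point to its nearest center.
  idx = zeros(size(X, 1), 1);
  for i = 1:size(X, 1)
    d = sum((centroids - repmat(X(i, :), K, 1)).^2, 2);  % squared distances
    [~, idx(i)] = min(d);
  end

  % Step 3: move each center to the mean of the points assigned to it.
  new_centroids = centroids;
  for k = 1:K
    members = (idx == k);
    if any(members)              % an empty cluster keeps its old center
      new_centroids(k, :) = mean(X(members, :), 1);
    end
  end

  % Step 4: stop once the centers no longer change.
  if isequal(new_centroids, centroids)
    break;
  end
  centroids = new_centroids;
end
```

Squared distances are used for the comparison because the square root does not change which center is nearest.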

Clustering finds relations and connections within data that has no labels.
K-means is one of the most widely used clustering algorithms.

K-means for non-separated clusters (T-shirt sizing)

Find closest centroids

 
 
 
 
 

 
 
 
 
 
    K = size(centroids, 1);
    distance = zeros(K, 1);    % distance from the current example to each centroid
    idx = zeros(size(X,1), 1); % index of the closest centroid for each example

    for i = 1:size(X, 1)
        for k = 1:K
            distance(k) = sqrt(sum((X(i,:) - centroids(k,:)).^2));
        end
        [mini, index] = min(distance);
        idx(i) = index;
    end
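
As a small worked example of the assignment step, the sketch below computes all example-to-centroid distances into one matrix and takes the row-wise minimum. The data and the names `D` and `diffs` are made up for illustration. It compares squared distances, which gives the same nearest centroid as the Euclidean distance used above.

```octave
X = [0 0; 1 0; 9 9; 10 8];       % toy examples (made up)
centroids = [0 0; 10 10];        % toy centroids (made up)
K = size(centroids, 1);
m = size(X, 1);

D = zeros(m, K);                 % D(i,k) = squared distance from example i to centroid k
for k = 1:K
  diffs = X - repmat(centroids(k, :), m, 1);
  D(:, k) = sum(diffs.^2, 2);
end

[~, idx] = min(D, [], 2);        % column index of the smallest entry in each row
% idx is [1; 1; 2; 2]
```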

Compute Means

 
 
 
 
 
    [m, n] = size(X);
    centroids = zeros(K, n);

    for k = 1:K
        logic = (idx == k);   % m x 1 mask: 1 where the example is assigned to centroid k
        centroids(k,:) = 1/sum(logic) * sum(X .* logic);
        % sum(logic) is the number of examples assigned to kth centroid
    end
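
One edge case is worth noting: if no example is assigned to centroid k, `sum(logic)` is zero and the update evaluates to NaN. The sketch below shows one possible guard, which is to leave such a centroid at its previous position. That policy, the variable `prev_centroids`, and the toy data are assumptions of this example and are not part of the code above.

```octave
% Toy inputs, made up for illustration.
X = [1 2; 3 4; 10 10];
idx = [1; 1; 3];                 % no example is assigned to centroid 2
K = 3;
prev_centroids = [0 0; 5 5; 9 9];

centroids = prev_centroids;      % start from the previous positions
for k = 1:K
  members = (idx == k);
  if any(members)
    centroids(k, :) = mean(X(members, :), 1);
  end
  % otherwise centroids(k, :) keeps its previous value instead of becoming NaN
end
% centroids is [2 3; 5 5; 10 10]
```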

Randomly initialize cluster centroids

    centroids = zeros(K, size(X, 2));
    randidx = randperm(size(X, 1));
    % Take the first K examples as centroids
    centroids = X(randidx(1:K), :);
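
A quick check on toy data (made up here) illustrates what this initialization provides: `randperm` returns every row index exactly once, so the K starting centroids are K different rows of X. The variable `is_row_of_X` is introduced only for this sketch.

```octave
X = [1 1; 2 2; 3 3; 4 4; 5 5];   % toy data with distinct rows (made up)
K = 3;

randidx = randperm(size(X, 1));  % random ordering of the row indices 1..5
centroids = X(randidx(1:K), :);  % first K shuffled rows become the centroids

% Each starting centroid coincides with an actual example in X.
is_row_of_X = ismember(centroids, X, 'rows');
```

If X itself contains duplicate rows, two centroids can still start at the same location even though their row indices differ.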

K-Means Clustering on Pixels

    % Run K-Means
    for i = 1:max_iters
        % Output progress
        fprintf('K-Means iteration %d/%d...\n', i, max_iters);
        if exist('OCTAVE_VERSION')
            fflush(stdout);
        end

        % For each example in X, assign it to the closest centroid
        idx = findClosestCentroids(X, centroids);

        % Given the memberships, compute new centroids
        centroids = computeCentroids(X, idx, K);
    end
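
The text does not show how X is built from an image, so the following is only a sketch under stated assumptions: the image is an h x w x 3 array of RGB values in [0, 1], each pixel becomes one row of X, and after clustering every pixel is replaced by the color of its centroid. A small synthetic image and a fixed starting pair of centroids are used so the example is self-contained and reproducible. The inlined assignment and update steps stand in for `findClosestCentroids` and `computeCentroids`.

```octave
% Synthetic 4x4 RGB image (made up): left half reddish, right half bluish.
A = zeros(4, 4, 3);
A(:, 1:2, 1) = 0.9;  A(:, 1:2, 3) = 0.1;
A(:, 3:4, 1) = 0.1;  A(:, 3:4, 3) = 0.8;
A(1, 1, 2) = 0.05;               % slight variation in one pixel

[h, w, c] = size(A);
X = reshape(A, h * w, c);        % one row per pixel, columns are R, G, B

K = 2;
max_iters = 10;
centroids = X([1, h * w], :);    % fixed start (first and last pixel), for reproducibility

for it = 1:max_iters
  % Assignment step: nearest centroid for every pixel.
  D = zeros(h * w, K);
  for k = 1:K
    D(:, k) = sum((X - repmat(centroids(k, :), h * w, 1)).^2, 2);
  end
  [~, idx] = min(D, [], 2);

  % Update step: each centroid becomes the mean color of its pixels.
  for k = 1:K
    if any(idx == k)
      centroids(k, :) = mean(X(idx == k, :), 1);
    end
  end
end

% Replace every pixel by its centroid color and restore the image shape.
X_recovered = centroids(idx, :);
A_recovered = reshape(X_recovered, h, w, c);
```

The recovered image contains at most K distinct colors, which is what makes this usable as a simple form of color compression.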
