
Unsupervised Learning: Partitioning (Distance-based) Clustering Using K-Means and K-Medoids Algorithms

Abstract

Ashkan Beheshti
13 min read · Mar 5, 2023

In this tutorial we discuss partitioning clustering, a popular unsupervised learning technique used to group similar data points into clusters based on their similarities and differences. The technique divides the data points into a fixed number of clusters, where each data point belongs to exactly one cluster. The primary objective of partitioning clustering is to minimize the intra-cluster distance and maximize the inter-cluster distance. We describe the two main types of partitioning clustering, K-Means and K-Medoids, explain the difference between a centroid and a medoid, and provide the mathematical formulas for calculating the centroid and medoid of a cluster.
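To make the centroid/medoid distinction concrete, here is a minimal NumPy sketch (the data points are illustrative, not taken from the tutorial): the centroid is the coordinate-wise mean of a cluster and need not be an actual member, while the medoid is the member whose total distance to all other members is smallest.

```python
import numpy as np

# A small 2-D cluster (illustrative data), with one outlier
cluster = np.array([[1.0, 2.0],
                    [2.0, 3.0],
                    [3.0, 3.0],
                    [8.0, 8.0]])

# Centroid: the mean of the points (may not coincide with any data point)
centroid = cluster.mean(axis=0)

# Medoid: the cluster member minimizing the sum of distances to all other members
pairwise = np.linalg.norm(cluster[:, None, :] - cluster[None, :, :], axis=-1)
medoid = cluster[pairwise.sum(axis=1).argmin()]

print("centroid:", centroid)  # [3.5, 4.0] -- pulled toward the outlier
print("medoid:  ", medoid)    # [2.0, 3.0] -- an actual, more robust representative
```

This also hints at why K-Medoids is often preferred in the presence of outliers: the medoid stays inside the data, whereas the centroid is dragged toward extreme points.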

The tutorial also discusses the importance of choosing the optimal number of clusters and introduces the elbow method for determining it. It then explains the KMeans() parameters, such as n_clusters, init, n_init, max_iter, tol, precompute_distances, verbose, random_state, copy_x, and n_jobs.
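As a rough sketch of the elbow method (assuming Scikit-learn's KMeans; the specific argument values here are illustrative, not the tutorial's), we fit K-Means for a range of K and plot the inertia, i.e. the within-cluster sum of squared distances:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data

# Fit K-Means for K = 1..10 and record the inertia for each K
inertias = []
k_values = range(1, 11)
for k in k_values:
    km = KMeans(n_clusters=k, init="k-means++", n_init=10,
                max_iter=300, random_state=42)
    km.fit(X)
    inertias.append(km.inertia_)

# The "elbow" is the K where adding more clusters yields
# diminishing reductions in inertia
plt.plot(k_values, inertias, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("Inertia (within-cluster SSE)")
plt.title("Elbow method")
plt.show()
```

Note that some of the parameters listed above (such as precompute_distances and n_jobs) were removed in recent Scikit-learn releases, so the sketch sticks to the ones that remain.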

We also walk through a practical example of using the K-Means algorithm to cluster the iris dataset with Python's Scikit-learn library. The example outlines the steps involved: importing the necessary libraries, loading the dataset, performing K-Means clustering, visualizing the clusters, interpreting the results, experimenting with different values of K, and…
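Since the preview cuts off here, the following is only a minimal sketch of what such an iris example might look like (K = 3 and the plotted feature pair are assumptions, not necessarily the tutorial's choices):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Load the iris dataset (150 samples, 4 features)
iris = load_iris()
X = iris.data

# Cluster into K = 3 groups, matching the three iris species
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Visualize the clusters on the first two features
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=30)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c="red", marker="x", s=100, label="centroids")
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.legend()
plt.show()
```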


Written by Ashkan Beheshti

Psychologist-Data Scientist, exploring the interplay between human learning & machine learning
