Unsupervised Learning: Partitioning (Distance-based) Clustering Using K-Means and K-Medoids Algorithms
Abstract
In this tutorial we discuss partitioning clustering, a popular unsupervised learning technique that groups similar data points into clusters based on their similarities and differences. The technique divides the data points into a fixed number of clusters, with each data point belonging to exactly one cluster. The primary objective of partitioning clustering is to minimize the intra-cluster distance and maximize the inter-cluster distance. We describe the two main partitioning algorithms, K-Means and K-Medoids, explain the difference between a centroid and a medoid, and provide the mathematical formulas for computing each.
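To make the centroid/medoid distinction concrete, here is a minimal sketch (the toy points are hypothetical, chosen only for illustration): the centroid is the coordinate-wise mean and need not coincide with any data point, while the medoid is the actual cluster member with the smallest total distance to all other members.

```python
import numpy as np

# A toy 2-D cluster (hypothetical points for illustration).
cluster = np.array([[1.0, 2.0], [2.0, 3.0], [4.0, 1.0]])

# Centroid: the coordinate-wise mean; it need not be a real data point.
centroid = cluster.mean(axis=0)

# Medoid: the member whose summed Euclidean distance to all other
# members is smallest; unlike the centroid, it is always a real point.
pairwise = np.linalg.norm(cluster[:, None, :] - cluster[None, :, :], axis=-1)
medoid = cluster[pairwise.sum(axis=1).argmin()]

print(centroid)  # [2.333..., 2.0] — not one of the three points
print(medoid)    # [2.0, 3.0] — an actual cluster member
```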
The tutorial also discusses the importance of choosing the number of clusters and introduces the elbow method for finding an optimal value. It explains the KMeans() parameters, such as n_clusters, init, n_init, max_iter, tol, precompute_distances, verbose, random_state, copy_x, and n_jobs.
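The elbow method can be sketched as follows: fit KMeans for a range of K values and track the within-cluster sum of squares (exposed by scikit-learn as the `inertia_` attribute); the "elbow", where inertia stops dropping sharply, suggests a good K. The synthetic data here is only for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 well-separated blobs (illustrative only).
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

inertias = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10,
                max_iter=300, random_state=42)
    km.fit(X)
    inertias.append(km.inertia_)

# Inertia always decreases as K grows; the "elbow" is where the
# decrease flattens out — here, around K = 4.
for k, inertia in zip(range(1, 9), inertias):
    print(k, round(inertia, 1))
```

Plotting `inertias` against K with matplotlib makes the elbow easier to spot visually, but the printed values already show the sharp drop up to K = 4 followed by diminishing returns.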
We also walk through a practical example of clustering the iris dataset with the K-means algorithm using Python’s Scikit-learn library. The example outlines the steps involved: importing the necessary libraries, loading the dataset, performing K-means clustering, visualizing the clusters, interpreting the results, experimenting with different values of K, and…
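The core of that workflow can be sketched in a few lines (visualization and K-experimentation omitted here; K = 3 is a natural choice since iris has three species, though clustering itself never sees the labels):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Load the iris dataset: 150 samples, 4 features each.
iris = load_iris()
X = iris.data

# Fit K-means with K = 3 and assign each flower to a cluster.
kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels[:10])                    # cluster index of the first 10 samples
print(kmeans.cluster_centers_.shape)  # (3, 4): one centroid per cluster
```

From here, the clusters can be visualized with a scatter plot of two features colored by `labels`, and compared against `iris.target` to interpret how well the clusters recover the species.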