Fri 27 Oct

Having run both the k-means clustering and DBSCAN algorithms on the data provided, I noticed that they operate on different principles and have distinct characteristics.

K-means is a partitioning technique that divides data points into K clusters by assigning each point to the cluster centre with the smallest squared distance. DBSCAN is a density-based method that groups nearby points in high-density regions into clusters while flagging points in low-density areas as outliers.
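As a rough illustration of how differently the two are invoked, here is a minimal sketch using scikit-learn on a synthetic stand-in for the data (the actual dataset isn't reproduced here, and the eps and min_samples values are placeholder choices):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN

# Synthetic stand-in for the data provided: three Gaussian blobs.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

# K-means: the number of clusters must be chosen up front; each point is
# assigned to the centre with the smallest squared distance.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# DBSCAN: no cluster count is given; clusters grow from dense neighbourhoods
# defined by eps (radius) and min_samples (minimum neighbours).
db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

print("K-means cluster labels:", np.unique(km_labels))
print("DBSCAN cluster labels (-1 = noise):", np.unique(db_labels))
```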
K-means assumes roughly spherical, equal-sized clusters and requires the number of clusters to be specified before the algorithm is run. This means it might not perform effectively when clusters have different densities, sizes, or irregular shapes. DBSCAN is more resilient with irregularly shaped clusters: it can recognise clusters of any shape and determines the number of clusters automatically from the density structure of the data, with no set cluster shape that it must adhere to.
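A quick sketch of this difference on two interleaving crescents (again synthetic data rather than the coursework dataset, with illustrative parameter values):

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import adjusted_rand_score

# Two interleaving crescents: clusters that are dense but not spherical.
X, y_true = make_moons(n_samples=400, noise=0.05, random_state=42)

km = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
db = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# K-means splits the crescents with a straight boundary, while DBSCAN
# follows the dense curves; the adjusted Rand index against the true
# labels reflects this.
print("K-means ARI:", round(adjusted_rand_score(y_true, km), 3))
print("DBSCAN  ARI:", round(adjusted_rand_score(y_true, db), 3))
```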

The DBSCAN algorithm recognises outliers and classifies them as noise. It works well with our dataset because it is robust to noise and does not force noise points into a cluster, whereas K-means has no explicit mechanism for handling noise or outliers: every point is assigned to its nearest centre.
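A small sketch of this noise-handling behaviour, with a handful of artificial outliers added to synthetic blobs (not the actual dataset); DBSCAN labels low-density points as -1 rather than attaching them to a cluster:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import DBSCAN

# Synthetic blobs plus a handful of scattered artificial outliers.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)
rng = np.random.default_rng(0)
outliers = rng.uniform(low=-12, high=12, size=(20, 2))
X_noisy = np.vstack([X, outliers])

labels = DBSCAN(eps=0.7, min_samples=5).fit_predict(X_noisy)

# Points in low-density regions are labelled -1 (noise) instead of being
# forced into the nearest cluster, which is what k-means would do.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("Clusters found:", n_clusters)
print("Points flagged as noise:", int(np.sum(labels == -1)), "of", len(X_noisy))
```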
K-means is computationally efficient and appropriate for large datasets of modest dimensionality. It may, however, be influenced by the initial cluster centres selected. DBSCAN adapts to the density structure of the data and does not rely on a random initialisation, although its results depend on the eps and min_samples parameters. It is less effective for high-dimensional data, though, as its cost increases with dataset size and dimensionality.
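To illustrate the sensitivity to initial centres, the following sketch (synthetic data, illustrative parameters) runs k-means with a single random initialisation under different seeds and compares it with the best of ten restarts:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=500, centers=5, cluster_std=1.0, random_state=7)

# With a single random initialisation the final inertia (within-cluster sum
# of squares) can differ from seed to seed and may settle in a poor local
# optimum; k-means is usually restarted several times and the best run kept.
for seed in range(3):
    km = KMeans(n_clusters=5, init="random", n_init=1, random_state=seed).fit(X)
    print(f"seed={seed}: inertia with one initialisation = {km.inertia_:.1f}")

best = KMeans(n_clusters=5, init="random", n_init=10, random_state=0).fit(X)
print(f"best of 10 restarts: inertia = {best.inertia_:.1f}")
```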
