This is the ‘Unsupervised Learning with Clustering’ tutorial, which is part of the Machine Learning course offered by Simplilearn. In this tutorial, we will learn about machine learning clustering algorithms, with the main focus on the K-means clustering algorithm.
Let us look at the objectives covered in this Clustering Tutorial.
Cluster analysis, or clustering, is the most commonly used technique in unsupervised learning. It groups data into clusters such that the points within each cluster are more similar to one another than to the points in other clusters.
The types of Clustering Algorithms are:
Prototype-based clustering assumes that most data is located near prototypes, for example centroids (the average of the points in a cluster) or medoids (the most centrally located actual data point in a cluster). K-means, a prototype-based method, is the most popular clustering method.
K-means clustering is an unsupervised learning algorithm. Unlike in supervised learning, you do not have labeled data. You have a set of data points that you want to group into clusters, so that objects that are similar in nature and characteristics are put together. This is what K-means clustering is all about. K is simply a number: you tell the system how many clusters to form. If K is 2, there will be 2 clusters; if K is 3, there will be 3 clusters; and so on. There are also ways of finding the best, or optimum, value of K.
Let us work through a K-means clustering example.
Suppose the government in California wants to identify high-density population clusters in order to build hospitals (no ground truth or features are provided apart from the population data). How can the clusters be identified?
Step 1: Randomly Pick K Centroids
Start by picking k centroids. Assume k = 3.
Finding the number of clusters: use the Elbow Method (to be reviewed later).
Step 2: Assign Each Point To The Nearest Centroid μ(j), j ∈ {1, …, k}
Each point is assigned to the centroid from which its Euclidean distance is smallest.
Step 3: Move Each Centroid To The Centre Of The Respective Cluster
Step 4: Calculate Distance Of The Centroids From Each Point Again
Calculate the Euclidean distance between each point and its centroid.
Step 5: Move Points Across Clusters And Re-calculate The Distance From The Centroid
Step 6: Keep Moving The Points Across Clusters Until The Euclidean Distance Is Minimized
Repeat the steps until no point changes cluster, that is, until the within-cluster Euclidean distance can no longer be reduced for any cluster (or until a user-defined limit on the number of iterations is reached).
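The six steps above can be sketched in plain Python as follows. This is a minimal illustration, not the course's own code: the function name `k_means`, the `max_iter` and `seed` parameters, and the sample points are all assumptions made here, and the initial centroids are simply drawn at random from the data.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of equal-length numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Steps 2 and 4: assign each point to its nearest centroid
        # (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 6: stop once no point changes cluster.
        if new_labels == labels:
            break
        # Step 5: points move across clusters.
        labels = new_labels
        # Step 3: move each centroid to the mean of its cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Made-up 2-D points standing in for population locations.
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
    (1.0, 9.0), (1.5, 8.5), (2.0, 9.5),
]
centroids, labels = k_means(points, k=3)
print(centroids)
print(labels)
```

Because the starting centroids are random, different seeds may converge to different groupings. This is one reason library implementations typically restart several times and keep the best run.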
The analysis was based on a lot of calculations. Now let’s understand the mathematical aspect.
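The quantity that the steps above try to minimize is usually written as the within-cluster sum of squared Euclidean distances. The notation below is assumed here for illustration: C_j is the set of points assigned to cluster j, μ_j is its centroid, and k is the number of clusters.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2},
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```

Assigning each point to its nearest centroid cannot increase J for fixed centroids, and moving each centroid to the mean of its cluster cannot increase J for fixed assignments, which is why repeating the two steps eventually stops improving.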
Scikit-learn's cluster module provides K-means through the KMeans class. In the code shown,
Here are some examples related to K-means clustering.
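As one small example, the sketch below fits scikit-learn's `KMeans` to a handful of made-up 2-D points. The data, the choice of k = 3 (matching the walkthrough), and the `n_init` and `random_state` values are assumptions chosen here only to keep the run reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D points forming three well-separated groups (made-up data).
X = np.array([
    [1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
    [8.0, 8.0], [8.5, 9.0], [9.0, 8.0],
    [1.0, 9.0], [1.5, 8.5], [2.0, 9.5],
])

# n_init and random_state are set explicitly so that the result does not
# depend on version-specific defaults or on random initialization.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(X)

print(labels)                  # cluster index assigned to each point
print(model.cluster_centers_)  # one centroid per cluster
print(model.inertia_)          # within-cluster sum of squared distances
```

The cluster indices themselves are arbitrary; what matters is which points share an index.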
Let us quickly go through what you have learned so far in this tutorial.
This concludes “Unsupervised Learning with Clustering.” With this, we come to the end of the Machine Learning Tutorial.