A common job of machine learning algorithms is to recognize objects and separate them into categories. This process is called classification, and it helps us sort vast quantities of data into discrete (i.e., distinct) values, such as 0/1, True/False, or a pre-defined output label class.
What is Supervised Learning?
Before we dive into Classification, let’s take a look at what Supervised Learning is. Suppose you are trying to learn a new concept in maths. After solving a problem, you may refer to the solutions to see if you were right or not. Once you are confident in your ability to solve a particular type of problem, you will stop referring to the answers and solve the questions put before you by yourself.
This is also how Supervised Learning works with machine learning models. In Supervised Learning, the model learns by example. Along with our input variable, we also give our model the corresponding correct labels. While training, the model gets to look at which label corresponds to our data and hence can find patterns between our data and those labels.
Some examples of Supervised Learning include:
- Spam detection, where you teach a model which mail is spam and which is not.
- Speech recognition where you teach a machine to recognize your voice.
- Object Recognition by showing a machine what an object looks like and having it pick that object from among other objects.
We can further divide Supervised Learning into the following:
Figure 1: Supervised Learning Subdivisions
What is Classification?
Classification is defined as the process of recognizing, understanding, and grouping objects and ideas into preset categories, a.k.a. “sub-populations.” With the help of pre-categorized training datasets, machine learning programs apply a wide range of classification algorithms to sort future data into the relevant categories.
Classification algorithms use the training data to predict the probability that new data will fall into one of the predetermined categories. One of the most common applications of classification is filtering emails into “spam” or “non-spam”, as used by today’s top email service providers.
In short, classification is a form of “pattern recognition.” Here, classification algorithms applied to the training data look for the same patterns (similar number sequences, words or sentiments, and the like) in future data sets.
We will explore classification algorithms in detail, and discover how text analysis software can perform actions like sentiment analysis, which categorizes unstructured text by opinion polarity (positive, negative, neutral, and the like).
Figure 2: Classification of vegetables and groceries
- Naive Bayes: Naive Bayes is a classification algorithm that assumes that the predictors in a dataset are independent of each other once the class is known. In other words, it treats the features as unrelated to one another. For example, if given a banana, the classifier will see that the fruit is yellow, oblong, and long and tapered. Each of these features contributes independently to the probability of the fruit being a banana, regardless of the others. Naive Bayes is based on Bayes’ theorem, which is given as:
Figure 3: Bayes’ Theorem
P(A | B) = P(B | A) × P(A) / P(B)
where:
P(A | B) = how often A happens given that B happens
P(A) = how likely A will happen
P(B) = how likely B will happen
P(B | A) = how often B happens given that A happens
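To make the independence assumption concrete, here is a minimal sketch of a Naive Bayes classifier over two yes/no features. The fruit data, the feature names (is_yellow, is_long), and the function names are hypothetical and chosen only for illustration. The add-one smoothing is also an assumption of this sketch, not something Bayes’ theorem itself requires.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count classes and, per class, how often each feature value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts

def predict_naive_bayes(model, row):
    """Return P(class | features) for every class, using Bayes' theorem."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # P(feature value | class); each feature is multiplied in
            # independently. Add-one smoothing assumes two possible values.
            score *= (value_counts[(label, i)][value] + 1) / (count + 2)
        scores[label] = score
    evidence = sum(scores.values())  # P(features), the denominator
    return {label: score / evidence for label, score in scores.items()}

# Hypothetical training data: (is_yellow, is_long) for six fruits.
rows = [(True, True), (True, True), (True, False),
        (False, False), (False, False), (True, False)]
labels = ["banana", "banana", "banana", "other", "other", "other"]

model = train_naive_bayes(rows, labels)
posterior = predict_naive_bayes(model, (True, True))
print(max(posterior, key=posterior.get))  # most probable class
```

The per-feature probabilities are simply multiplied together, which is exactly where the "naive" independence assumption enters.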
- Decision Trees: A Decision Tree is an algorithm that represents decision-making visually, as a tree-shaped flowchart. A Decision Tree can be made by asking a yes/no question and splitting on the answer, so that each branch leads to another question or to a final decision. The questions sit at the nodes, and the resulting decisions sit below them at the leaves. The tree depicted below is used to decide if we can play tennis.
Figure 4: Decision Tree
In the above figure, depending on the weather conditions, the humidity, and the wind, we can systematically decide if we should play tennis or not. A common drawing convention places the False branches on the left of the tree and the True branches on the right. Knowing this, we can make a tree which has the features at the nodes and the resulting classes at the leaves.
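A decision tree of this kind can be read as a chain of nested questions. The sketch below is a hand-written version of the play-tennis decision. The exact branch structure (sunny leads to a humidity check, overcast always plays, rain leads to a wind check) and the value names are assumptions borrowed from the classic textbook example, and may differ from the tree in the figure.

```python
def play_tennis(outlook, humidity, wind):
    """Walk a small hand-built decision tree and return True to play.

    Each `if` is an internal node (a question about one feature);
    each `return` is a leaf (the resulting class).
    """
    if outlook == "sunny":
        # On sunny days the decision depends only on humidity.
        return humidity == "normal"
    if outlook == "overcast":
        # Overcast days are a leaf: always play.
        return True
    if outlook == "rain":
        # On rainy days the decision depends only on wind.
        return wind == "weak"
    raise ValueError(f"unknown outlook: {outlook!r}")

print(play_tennis("sunny", "high", "weak"))
print(play_tennis("rain", "high", "weak"))
```

In practice the questions are not written by hand; a training algorithm picks the feature to split on at each node from the labeled data.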
- K-Nearest Neighbors: K-Nearest Neighbors is a classification and prediction algorithm that assigns data points to classes based on the distance between them. It assumes that data points which are close to one another are similar. Hence, a data point to be classified is given the class that is most common among its K closest labeled neighbors.
Figure 5: Data to be classified
Figure 6: Classification using K-Nearest Neighbours
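The idea can be sketched in a few lines: measure the distance from the new point to every labeled point, keep the K closest, and take a majority vote. The points, the labels, and the choice of Euclidean distance with K = 3 below are illustrative assumptions.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k nearest neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-D data: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
          (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_predict(points, labels, (2.0, 2.0)))
print(knn_predict(points, labels, (6.0, 6.5)))
```

An odd K is a common choice for two-class problems because it avoids tied votes.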
To evaluate the accuracy of our classifier model, we need some accuracy measures. The following methods are used to see how well our classifiers are predicting:
- Holdout Method: It is one of the most common methods of evaluating the accuracy of our classifiers. In this method, we divide the data into two sets: a Training set and a Testing set. The training set is shown to our model, and the model learns from the data in it. The data in the testing set is withheld from the model, and after the model is trained, the testing set is used to test its accuracy. The training set will have both the features and the corresponding label, but the testing set will only have the features and the model will have to predict the corresponding label.
The predicted labels are then compared to the actual labels, and the accuracy is calculated as the fraction of labels the model got right.
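A minimal sketch of the holdout method follows. The 25% test fraction, the fixed shuffle seed, the toy data, and the threshold "model" are all assumptions made for illustration; any classifier could take the model's place.

```python
import random

def holdout_split(features, labels, test_fraction=0.25, seed=0):
    """Shuffle the row indices, then withhold a fraction of rows for testing."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [features[i] for i in train_idx], [labels[i] for i in train_idx],
        [features[i] for i in test_idx], [labels[i] for i in test_idx],
    )

def accuracy(actual, predicted):
    """Fraction of predicted labels that match the actual labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical data: one numeric feature, labeled True when it is >= 10.
features = list(range(20))
labels = [x >= 10 for x in features]
X_train, y_train, X_test, y_test = holdout_split(features, labels)

# Stand-in "model": a threshold midway between the two classes seen in training.
largest_false = max(x for x, y in zip(X_train, y_train) if not y)
smallest_true = min(x for x, y in zip(X_train, y_train) if y)
threshold = (largest_false + smallest_true) / 2

# The model only sees the test features; the test labels are used for scoring.
predictions = [x >= threshold for x in X_test]
print(accuracy(y_test, predictions))
```

Shuffling before splitting matters: without it, sorted data like this would put only one class in the test set.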
- Bias and Variance: Bias is the difference between our actual and predicted values. It comes from the simplifying assumptions that our model makes about our data in order to predict on new data, and those assumptions determine which patterns in the data the model can pick up. When the Bias is high, the assumptions made by our model are too basic and the model can’t capture the important features of our data. This is called underfitting.
Figure 7: Bias
We can define variance as the model’s sensitivity to fluctuations in the data. Our model may learn from noise, which will cause it to consider trivial features as important. When the Variance is high, our model will capture all the features of the data given to it, tune itself to that data, and predict on it very well. New data, however, may not have the exact same features, so the model won’t be able to predict on it as well. We call this Overfitting.
Figure 8: Example of Variance
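The contrast can be illustrated with two deliberately extreme models on a tiny, hypothetical dataset. One always predicts the majority class (very high bias), and the other memorizes the training set and copies the label of the nearest training point (very high variance). The data, the noise positions, and both models are assumptions made for this sketch.

```python
def accuracy(actual, predicted):
    """Fraction of predicted labels that match the actual labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical 1-D training data: the true rule is "label 1 when x >= 6",
# but three training labels are flipped to simulate noise.
train_x = list(range(12))
train_y = [int(x >= 6) for x in train_x]
for noisy_x in (1, 3, 9):
    train_y[noisy_x] = 1 - train_y[noisy_x]

# New data that follows the true rule, without noise.
test_x = [x + 0.2 for x in train_x]
test_y = [int(x >= 6) for x in test_x]

def majority_model(xs):
    """High bias: ignores the input and always predicts the majority class."""
    majority = max(set(train_y), key=train_y.count)
    return [majority for _ in xs]

def memorizer_model(xs):
    """High variance: copies the label of the closest training point, noise included."""
    def nearest_label(x):
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        return train_y[nearest]
    return [nearest_label(x) for x in xs]

for name, model in [("majority", majority_model), ("memorizer", memorizer_model)]:
    train_acc = accuracy(train_y, model(train_x))
    test_acc = accuracy(test_y, model(test_x))
    print(f"{name}: train={train_acc:.2f} test={test_acc:.2f}")
```

The majority model scores poorly on both sets, which is the signature of underfitting. The memorizer is perfect on the training set but loses accuracy on new data because it has also learned the flipped labels, which is the signature of overfitting.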
- Precision and Recall: Precision measures how many of the data points that the model assigned to a class actually belong to that class. It is given by dividing the number of correctly classified data points by the total number of data points classified with that class label:
Precision = TP / (TP + FP)
TP = True Positives, when our model correctly classifies the data point to the class it belongs to.
FP = False Positives, when the model assigns the data point to a class it does not belong to.
Recall measures the ability of the model to find the positive values. It answers the question, "How many of the actual positive values does the model predict correctly?" It is calculated as the ratio of true positives to the total number of actual positive values:
Recall = TP / (TP + FN)
FN = False Negatives, when the model fails to assign the data point to the class it belongs to.
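Both measures can be computed by counting true positives (TP), false positives (FP), and false negatives (FN, actual positives that the model missed), then taking precision as TP / (TP + FP) and recall as TP / (TP + FN). The sketch below does this for a hypothetical set of labels. Returning 0.0 when a denominator is zero is a convention chosen here, not a universal rule.

```python
def precision_recall(actual, predicted, positive=1):
    """Return (precision, recall) for the class given by `positive`."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # correct positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # wrongly flagged
    fn = sum(a == positive and p != positive for a, p in pairs)  # missed positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels: 4 actual positives; the model flags 3 points, 2 correctly.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here TP = 2, FP = 1, and FN = 2, so precision is 2/3 while recall is 1/2: the model is fairly trustworthy when it says "positive" but misses half of the actual positives.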
In this article, we have taken a look at what Supervised Learning is and at its sub-branch, Classification. We have also learned about some of the commonly used classification models, and about how to evaluate the accuracy of those models to see if they are trained well. Hopefully, you now know everything you need about Classification!