Lesson 5 of 10
By Simplilearn
Last updated on May 27, 2020
Support vector machine (SVM) is a supervised machine learning algorithm that analyzes and classifies data into one of two categories, which is why it is also known as a binary classifier.
In this tutorial, you will learn what all that means, starting with the basics of machine learning and working up to linear SVM, kernel functions, and a use case.
A computer’s ability to learn from data without explicit programming is called machine learning.
It works like this: The machine learns from existing data and then predicts or makes decisions about future data. For the machine to learn in this way, your data set must contain known outcomes. You take the data, prepare it, and apply the machine learning algorithm. The algorithm learns from the data, creates a model, and then uses that model to make predictions.
There are three main categories of machine learning algorithms: supervised learning, unsupervised learning, and reinforcement learning.
Supervised learning works on a data set with known outcomes. In unsupervised learning, there are no known outcomes, so the machine has no predefined categories or classes to learn from.
There are two major types of machine learning algorithms in the supervised learning category: classification and regression.
With classification, you predict categories, while with regression, you generally predict numeric values.
In supervised learning, classification can involve different numbers of classes. Sometimes you only have two classes (“yes” or “no”, or “true” or “false”), but sometimes you have more than two. For instance, under risk management or risk modeling, you can have “low risk”, “medium risk”, or “high risk.” SVM is a binary classifier (a classifier used for those true/false, yes/no types of classification problems).
Features are important in supervised learning. If there are several features, SVM may be the better classification algorithm choice as opposed to logistic regression. Under supervised learning, you present the computer with example inputs and their desired outputs (those known outcomes). The goal is to learn a general rule that maps inputs to those outputs.
Bug detection, customer churn, stock price prediction (not the value of the stock price, but whether it will rise or fall), and weather prediction (sunny/not sunny; rain/no rain) are all examples of classification problems.
Classification algorithms generally take past data (data for which you have known outcomes), train the model, take new data once the model is trained, ingest it, and create predictions (e.g., is it a truck or is it a car?).
SVM is a type of classification algorithm that classifies data based on its features. An SVM will classify any new element into one of the two classes.
Once you give it some labeled inputs, the algorithm separates the data into classes. When you then feed it new data (for example, an unknown fruit), the trained model assigns it to one of those classes: e.g., “apple” versus “orange”.
The following are some examples to understand SVM in detail:
The goal of this example is to classify cricket players into batsmen or bowlers using the runs-to-wicket ratio. A player with more runs would be considered a batsman and a player with more wickets would be considered a bowler.
If you take a data set of cricket players with runs and wickets in columns next to their names, you could create a two-dimensional plot showing a clear separation between bowlers and batsmen. Here we present a data set with clear segregation between bowlers versus batsmen to help understand SVM.
Before separating anything using high-level mathematics, let’s look at an unknown value, which is new data being introduced into the dataset without a predesignated classification.
The next step is to draw a decision boundary, or a line separating the two classes to help classify the new data points.
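To make the idea of a decision boundary concrete, here is a minimal Python sketch. It assumes the boundary is a straight line written as w·x + b = 0, and that a point is classified by which side of the line it falls on. The function name `classify`, the boundary coefficients, and the player statistics are all invented for illustration; they are not taken from the data set in this lesson.

```python
def classify(point, w, b):
    """Classify a (runs, wickets) point by the side of the line w.x + b = 0 it lies on."""
    score = w[0] * point[0] + w[1] * point[1] + b
    return "batsman" if score > 0 else "bowler"

# Hypothetical boundary: runs - 20 * wickets = 0 (coefficients chosen only for illustration)
w, b = (1.0, -20.0), 0.0

# Invented players, given as (runs, wickets)
print(classify((3000, 40), w, b))   # batsman: 3000 - 20 * 40 = 2200 > 0
print(classify((500, 150), w, b))   # bowler:  500 - 20 * 150 = -2500 < 0
```

Any line can be used this way to classify new points; what SVM adds is a principled way to choose which line to use.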
You can actually draw several boundaries, as shown above. Then, you need to find the line that best separates those two groups. The correct line will help you classify the new data point.
You can find the best line by computing the maximum margin from equidistant support vectors. Support vectors in this context are the points from each class that lie closest to the other class, and the margin is the distance between the dividing line and those points. The best line sits at an equal distance from the support vectors on both sides and makes that distance as large as possible.
Note: You may think that the word vector refers to data points. While this may be the case in two-dimensional or three-dimensional spaces, once you get into higher dimensions with more features in your data set, you need to look at these as vectors. They are called support vectors because they alone “support,” or determine, the position of the dividing line; the points farther away have no effect on it.
There are a couple of points at the top that are pretty close to one another, and similarly at the bottom of the graph. Shown below are the points that you need to consider. The rest of the points are too far away. The bowler points are on the right and the batsman points are on the left.
Mathematically, you can calculate the distances between the points of the two classes and pick the points that are closest to the other class. Once you have picked the support vectors, draw a dividing line and measure the distance from each support vector to the line. The best line will always have the greatest margin, that is, the greatest distance to the support vectors.
For instance, if you consider the yellow line as a decision boundary, the new data point is classified as a bowler. But, as the margin doesn’t appear to be the maximum, you can come up with a better line.
Use other support vectors, draw the decision boundary between those, and then calculate the margin. Notice now that the unknown data point would be considered a batsman.
Continue doing this until you find the correct decision boundary with the greatest margin.
If you look at the green decision boundary, the line appears to have a maximum margin compared to the other two. This is the boundary with the greatest margin, and when you classify your unknown data value, you can see that it clearly belongs to the batsman class. The green line is the best divider because it has the maximum margin between the support vectors. At this point, you can be confident in the classification: the new data point is indeed a batsman.
Technically, this dividing line is called a hyperplane. In two-dimensional spaces, we typically refer to the dividing boundaries as “lines,” and in three-dimensional and higher dimensions as “planes” or “hyperplanes,” but strictly speaking they are all hyperplanes.
The hyperplane with the maximum distance from the support vectors is the one you want. Two distances describe it: D+ is the shortest distance from the hyperplane to the closest positive point, and D- is the shortest distance from the hyperplane to the closest negative point. The parallel boundaries that pass through those closest points are sometimes called the positive hyperplane and the negative hyperplane.
The sum of (D+) and (D-) is called the distance margin. You should always try to maximize the distance margin to avoid misclassification. For instance, you can see the yellow margin is much smaller than the green margin.
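The comparison between a narrow and a wide margin can be sketched in a few lines of Python. The sketch assumes each candidate boundary is a line w·x + b = 0 and uses the standard point-to-line distance |w·x + b| / ||w||. The points, the two candidate lines, and the labels "yellow" and "green" are invented stand-ins, not the values behind the figures in this lesson.

```python
import math

def distance_to_line(point, w, b):
    """Perpendicular distance from a 2-D point to the line w.x + b = 0."""
    return abs(w[0] * point[0] + w[1] * point[1] + b) / math.hypot(w[0], w[1])

def distance_margin(positives, negatives, w, b):
    """Return (D+, D-, D+ + D-) for a line that separates the two groups."""
    d_plus = min(distance_to_line(p, w, b) for p in positives)
    d_minus = min(distance_to_line(p, w, b) for p in negatives)
    return d_plus, d_minus, d_plus + d_minus

# Invented, clearly separated toy points
positives = [(4, 4), (5, 5), (4, 6)]
negatives = [(1, 1), (2, 0), (0, 2)]

# Two hypothetical candidate boundaries that both separate the groups
candidates = {
    "yellow": ((1.0, 0.0), -3.5),   # vertical line x = 3.5
    "green": ((1.0, 1.0), -5.0),    # diagonal line x + y = 5
}

for name, (w, b) in candidates.items():
    d_plus, d_minus, margin = distance_margin(positives, negatives, w, b)
    print(name, round(d_plus, 3), round(d_minus, 3), round(margin, 3))
```

With these numbers the "green" line has the larger distance margin, and its D+ and D- are equal, so it sits midway between the two groups. For a separating line, the sum D+ + D- depends only on the line's orientation; shifting the line sideways changes how the sum is split between D+ and D-, which is why the best boundary is also the equidistant one.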
This problem is two-dimensional because each player is described by two features (runs and wickets). Because the two classes can be separated by a straight line, it is called a linear SVM.
The data set shown below has no clear linear separation between the two classes. In machine learning parlance, you would say that these are not linearly separable. How can you get the support vector machine to work on such data?
Since you can't separate it into two classes using a line, you need to transform it into a higher dimension by applying a kernel function to the data set.
A higher dimension enables you to clearly separate the two groups with a plane. Here, you can draw some planes between the green dots and the red dots — with the end goal of maximizing the margin.
If you write R^n for a space with n dimensions, the kernel function converts a two-dimensional space (R^2) to a three-dimensional space (R^3). Once the data is lifted into three dimensions, you can apply SVM and separate the two groups using a two-dimensional plane.
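The move from two to three dimensions can be illustrated with an explicit mapping in Python. This is a sketch under stated assumptions: the mapping (x, y) → (x, y, x² + y²) is one convenient choice made for illustration, the points are invented, and in practice a kernel function achieves the same effect implicitly, without building the new coordinates.

```python
def lift(point):
    """Map (x, y) in two dimensions to (x, y, x^2 + y^2) in three dimensions."""
    x, y = point
    return (x, y, x * x + y * y)

# Invented toy data: one class clustered near the origin, the other surrounding it.
# No straight line in the plane separates these two groups.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.4)]
outer = [(2.0, 0.0), (0.0, 2.0), (-1.5, -1.5)]

# After lifting, the flat plane z = 2 separates them (threshold chosen by eye).
threshold = 2.0
print(all(lift(p)[2] < threshold for p in inner))   # True
print(all(lift(p)[2] > threshold for p in outer))   # True
```

The third coordinate is the squared distance from the origin, so points near the center end up low and points far from it end up high, which is what lets a flat plane separate them.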
This is similar in the higher dimensions (3+D):
There are many types of kernel functions, such as:
Depending on the dimensions and how you want to transform the data, you can choose from any of these kernel functions.
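As a rough illustration, here are four commonly used kernel functions written in plain Python. These are widely used kernels, not necessarily the ones this lesson originally listed, and the parameter names and default values (`degree`, `coef0`, `gamma`) are assumptions chosen for the sketch.

```python
import math

def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def polynomial_kernel(u, v, degree=2, coef0=1.0):
    return (dot(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (radial basis function) kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(u, v, gamma=0.1, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)

def lift_quadratic(u):
    """Explicit 3-D map whose dot product equals (u.v)^2 for 2-D inputs."""
    x, y = u
    return (x * x, math.sqrt(2) * x * y, y * y)

u, v = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(u, v))       # 11.0
print(polynomial_kernel(u, v))   # (11 + 1)^2 = 144.0
print(rbf_kernel(u, u))          # 1.0 for identical points

# The degree-2 polynomial kernel (coef0 = 0) matches a dot product in the lifted space.
print(math.isclose(polynomial_kernel(u, v, coef0=0.0),
                   dot(lift_quadratic(u), lift_quadratic(v))))   # True
```

The last check shows why kernels are convenient: the kernel value computed in the original two dimensions equals a dot product in three dimensions, so the higher-dimensional coordinates never have to be built explicitly.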
Let’s discuss a use case where we use SVM to classify new data as horses or mules.
Problem statement: Classify horses and mules using height and weight as the two features. Horses and mules typically have different weights and heights, with horses being heavier and taller.
The following are the steps to make the classification:
Here's the R code:
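For readers who want to experiment with the same problem outside R, the following is a minimal Python sketch. It is not the R listing from this lesson. It assumes a linear SVM trained by plain sub-gradient descent on the regularized hinge loss, which is one simple way to fit such a model, and every measurement, function name, and parameter value in it is invented for illustration.

```python
# Invented toy measurements: (height in cm, weight in kg); label +1 = horse, -1 = mule
data = [
    ((160, 500), 1), ((165, 540), 1), ((170, 580), 1), ((158, 480), 1),
    ((140, 380), -1), ((135, 350), -1), ((145, 400), -1), ((138, 360), -1),
]

def fit_scaler(points):
    """Per-feature mean and standard deviation, so both features share one scale."""
    n = len(points)
    means = [sum(p[j] for p in points) / n for j in range(2)]
    stds = [(sum((p[j] - means[j]) ** 2 for p in points) / n) ** 0.5 for j in range(2)]
    return means, stds

def scale(point, means, stds):
    return [(point[j] - means[j]) / stds[j] for j in range(2)]

def train_linear_svm(samples, lam=0.01, lr=0.1, epochs=200):
    """Full-batch sub-gradient descent on the L2-regularized hinge loss."""
    w, b = [0.0, 0.0], 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [lam * w[0], lam * w[1]], 0.0
        for x, y in samples:
            # Only points inside the margin (or misclassified) contribute to the gradient
            if y * (w[0] * x[0] + w[1] * x[1] + b) < 1:
                grad_w[0] -= y * x[0] / n
                grad_w[1] -= y * x[1] / n
                grad_b -= y / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b

means, stds = fit_scaler([p for p, _ in data])
samples = [(scale(p, means, stds), y) for p, y in data]
w, b = train_linear_svm(samples)

def predict(point):
    """Classify a new (height, weight) measurement with the trained boundary."""
    x = scale(point, means, stds)
    return "horse" if w[0] * x[0] + w[1] * x[1] + b > 0 else "mule"

print(predict((162, 520)))   # horse
print(predict((137, 355)))   # mule
```

Scaling the features first matters here because weight is measured in much larger numbers than height; without it, weight would dominate the distance calculations. In practice you would normally rely on a tested library implementation rather than a hand-written training loop.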
SVM is a supervised learning algorithm used to perform classification. It is a powerful method to classify unstructured data, make reliable predictions, and reduce redundant information.
What’s more, SVM has applications in different areas of daily life, such as:
After that deep dive into SVM, it’s worth backing up to look at the big picture – data science is already widespread in our daily lives, and it’s only going to become more so in the future. It’s also one of the hottest and most lucrative careers around now. Surely, if you took the time to read this article, you have some interest. Whether you’re a beginner or are looking to move your existing career to the next level, Simplilearn’s expertly designed programs help you with everything you need to know about this promising field. Explore our Machine Learning Certification Course and Data Science with R Certification program to gain all the skills you need to make a great career move. Our Data Science with R Certification program is co-developed with IBM, and offers you the right blend of highly effective learning approaches – from lifetime access to self-paced learning, to live online classes, to work on real-world industry projects. Sign up now to boost your career!