Analyzing data with the help of charts and graphs helps you understand it better. Exploratory data analysis is a crucial step in any data analytics process. It allows you to find trends that you can’t notice just by looking at the raw values. The Python Seaborn library helps you visualize the data and draw conclusions. In this tutorial, you’ll learn about the Python Seaborn library and how to create different plots using multiple datasets.
Below are the topics that this tutorial on Python Seaborn will cover:
- What is Seaborn?
- Importing libraries in Jupyter Notebook
- Loading dataset
- Python Seaborn Plotting Functions
  - Bar plot
  - Count plot
  - Distribution plot
  - Heatmap
  - Scatter plot
  - Pair plot
  - Linear Regression plot
  - Box plot
What Is Seaborn in Python?
Seaborn is a widely used Python data visualization library that is common in data science and machine learning tasks. It is built on top of the Matplotlib data visualization library and is well suited to exploratory analysis. You can create informative statistical plots to answer questions about your data.
To understand the Seaborn library and the different plotting functions in detail, you’ll need to use a few datasets to create the visualizations.
Importing Libraries in Jupyter Notebook
While working on an exploratory data analysis project using Python, you will need the NumPy, Pandas, Matplotlib, and Seaborn libraries. Now, go ahead and import them.
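The original import cell is not shown here. A conventional version, assuming the usual community aliases (np, pd, plt, sns), might look like this:

```python
# Conventional aliases; the aliases themselves are a convention, not a requirement.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# In a Jupyter Notebook, figures normally render inline below the cell.
# In a plain script, call plt.show() after building a plot.
print("NumPy:", np.__version__)
print("Pandas:", pd.__version__)
print("Seaborn:", sns.__version__)
```

Printing the versions is optional, but it helps when an example behaves differently across Seaborn releases.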
Loading Dataset
You will use the popular mtcars dataset in this tutorial. The data was taken from the 1974 Motor Trend US magazine. It contains fuel consumption and 10 other aspects of automobile design and performance for 32 cars.
- Let’s load this dataset using the Pandas read_csv() function.
- Use the head() function to see what the first rows of the data frame look like.
- Now, use the info() function to print a summary of the data frame. It reports the index dtype, column dtypes, non-null counts, and memory usage.
- Finally, check the shape of the mtcars data frame.
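The steps above can be sketched as follows. The original file path is not shown, so this sketch reads a small in-memory stand-in: the first six rows of mtcars, typed in by hand. The column name model and the file name mentioned in the comment are assumptions.

```python
import io
import pandas as pd

# Stand-in for the CSV file (first six mtcars rows, typed in by hand).
# With a real file you would write something like pd.read_csv("mtcars.csv");
# that file name is hypothetical.
csv_text = """model,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21.0,6,160.0,110,3.90,2.620,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160.0,110,3.90,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108.0,93,3.85,2.320,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360.0,175,3.15,3.440,17.02,0,0,3,2
Valiant,18.1,6,225.0,105,2.76,3.460,20.22,1,0,3,1
"""
mtcars = pd.read_csv(io.StringIO(csv_text))

print(mtcars.head())  # first five rows of the data frame
mtcars.info()         # prints index dtype, column dtypes, non-null counts, memory usage
print(mtcars.shape)   # (rows, columns); (6, 12) for this six-row sample
```

Note that info() prints its summary directly, while shape is an attribute rather than a function call.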
Python Seaborn Plotting Functions
The Seaborn library provides a range of plotting functions that make the visualization and analysis of data easier. This tutorial covers some of the most important ones.
Bar Plot
A bar plot gives an estimate of the central tendency for a numeric variable with the height of each rectangle. It provides some indication of the uncertainty around that estimate using error bars. To build this plot, you usually choose a categorical column on the x-axis and a numerical column on the y-axis.
To draw this plot, you call the barplot() function and pass it the cylinder (cyl) column for the x-axis and the carburetors (carb) column for the y-axis.
Another way to create the same bar plot is to pass column names instead of the columns themselves.
Here you are explicitly defining the x-axis and y-axis columns and also passing the data frame using the data argument.
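Both calling styles can be sketched side by side. The original code is not shown, so this is a minimal reconstruction; the cyl and carb values are the two mtcars columns typed in by hand, which is an assumption you should replace with your own loaded data. Keyword arguments are used because recent Seaborn releases no longer accept x and y positionally.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# cyl and carb columns of mtcars, typed in by hand (assumption; load your own CSV in practice).
mtcars = pd.DataFrame({
    "cyl": [6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8,
            8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4],
    "carb": [4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4,
             4, 1, 2, 1, 1, 2, 2, 4, 2, 1, 2, 2, 4, 6, 8, 2],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Style 1: pass the columns themselves.
sns.barplot(x=mtcars["cyl"], y=mtcars["carb"], ax=ax1)

# Style 2: pass column names plus the data frame.
sns.barplot(x="cyl", y="carb", data=mtcars, ax=ax2)

fig.tight_layout()
```

Each bar shows the mean number of carburetors for that cylinder count, and the error bars come from Seaborn's default bootstrap estimate.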
Python Seaborn lets you assign a color to the bars. Setting the color argument to yellow, for example, draws every bar in yellow.
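A minimal sketch of a single-color bar plot, assuming the same hand-typed cyl and carb columns as a stand-in for the loaded mtcars data frame:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# cyl and carb columns of mtcars, typed in by hand (assumption).
mtcars = pd.DataFrame({
    "cyl": [6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8,
            8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4],
    "carb": [4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4,
             4, 1, 2, 1, 1, 2, 2, 4, 2, 1, 2, 2, 4, 6, 8, 2],
})

fig, ax = plt.subplots()

# color applies one Matplotlib color to every bar.
sns.barplot(x="cyl", y="carb", data=mtcars, color="yellow", ax=ax)
```

Seaborn slightly desaturates fill colors by default, so the bars may look a little less vivid than pure yellow.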
The Seaborn library also has the palette parameter, which you can use to give different colors to the bars.
The example below is a bar plot that uses palette = ‘rocket’.
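A sketch of the palette version follows. The original call is not shown; this variant also passes hue, because recent Seaborn releases warn when palette is used without it. The dodge=False setting and the legend cleanup are choices made here to keep the plot looking the same across versions, and the data is again a hand-typed stand-in.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# cyl and carb columns of mtcars, typed in by hand (assumption).
mtcars = pd.DataFrame({
    "cyl": [6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8,
            8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4],
    "carb": [4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4,
             4, 1, 2, 1, 1, 2, 2, 4, 2, 1, 2, 2, 4, 6, 8, 2],
})

fig, ax = plt.subplots()

# hue="cyl" maps each cylinder category to one color from the "rocket" palette;
# dodge=False keeps one full-width bar per category.
sns.barplot(x="cyl", y="carb", hue="cyl", data=mtcars,
            palette="rocket", dodge=False, ax=ax)

# The legend would only repeat the x-axis labels, so drop it if one was drawn.
legend = ax.get_legend()
if legend is not None:
    legend.remove()

# The palette itself can be inspected as a list of RGB tuples.
rocket_colors = sns.color_palette("rocket", 3)
```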
Count Plot
The countplot() function in the Python Seaborn library shows the number of observations in each category using bars.
The count plot below shows the number of vehicles for each category of cylinders.
The next count plot shows the number of cars for each number of carburetors.
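The two count plots described above can be sketched as follows, assuming hand-typed cyl and carb columns in place of the loaded mtcars data frame:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# cyl and carb columns of mtcars, typed in by hand (assumption).
mtcars = pd.DataFrame({
    "cyl": [6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8,
            8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4],
    "carb": [4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4,
             4, 1, 2, 1, 1, 2, 2, 4, 2, 1, 2, 2, 4, 6, 8, 2],
})

fig, (ax_cyl, ax_carb) = plt.subplots(1, 2, figsize=(10, 4))

# One bar per cylinder category; bar height is the number of cars.
sns.countplot(x="cyl", data=mtcars, ax=ax_cyl)

# Same idea for the number of carburetors.
sns.countplot(x="carb", data=mtcars, ax=ax_carb)

# The same numbers as a table, for cross-checking the plots.
cyl_counts = mtcars["cyl"].value_counts().sort_index()
print(cyl_counts)
```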
Python Seaborn allows you to create horizontal count plots where the feature column is on the y-axis and the count is on the x-axis.
The visualization for the gear column shows the count of cars for each category of gear.
This plot shows that there are 15 vehicles with 3 gears, 12 vehicles with 4 gears, and 5 vehicles with 5 gears.
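A horizontal count plot is obtained by passing the column as y instead of x. This sketch assumes a hand-typed copy of the mtcars gear column; the counts it prints match the figures quoted above.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# gear column of mtcars, typed in by hand (assumption).
mtcars = pd.DataFrame({
    "gear": [4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3,
             3, 4, 4, 4, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5, 5, 4],
})

fig, ax = plt.subplots()

# Passing the column as y makes the bars horizontal.
sns.countplot(y="gear", data=mtcars, ax=ax)

gear_counts = mtcars["gear"].value_counts().sort_index()
print(gear_counts)  # 3 -> 15, 4 -> 12, 5 -> 5
```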
You can also create a grouped count plot using the hue parameter, which accepts the name of the column to use for color encoding.
In the grouped count plot, you have the count of cars for each category of gears, grouped by the number of cylinders.
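A sketch of the grouped count plot, again assuming hand-typed gear and cyl columns rather than the loaded file:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# gear and cyl columns of mtcars, typed in by hand (assumption).
mtcars = pd.DataFrame({
    "gear": [4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3,
             3, 4, 4, 4, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5, 5, 4],
    "cyl": [6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8,
            8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4],
})

fig, ax = plt.subplots()

# Within each gear category, draw one bar per cylinder count.
sns.countplot(x="gear", hue="cyl", data=mtcars, ax=ax)

# The same breakdown as a table: rows are gears, columns are cylinders.
gear_by_cyl = pd.crosstab(mtcars["gear"], mtcars["cyl"])
print(gear_by_cyl)
```

The crosstab is a convenient way to read exact numbers that are hard to judge from bar heights.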
Distribution Plot
The Seaborn library provides the distplot() function, which plots the distribution of continuous data. Note that distplot() has been deprecated since Seaborn 0.11 in favor of histplot() and displot().
In this example, you plot the distribution of miles per gallon (mpg) for the different vehicles. The mpg value measures the distance a car can travel per gallon of fuel.
Heatmap
Heatmaps in the Seaborn library let you visualize matrix-like data. The values of the variables are contained in a matrix and are represented as colors.
A common example is a heatmap that shows the correlation between each pair of variables in the mtcars dataset.
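A correlation heatmap can be sketched in two steps: compute the correlation matrix with Pandas, then hand it to heatmap(). To keep the example short, only four hand-typed mtcars columns are used here, which is an assumption; with the full data frame you would select the numeric columns first. The annot and cmap settings are illustrative choices.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Four mtcars columns, typed in by hand (assumption).
mtcars = pd.DataFrame({
    "mpg": [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4,
            22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4,
            14.7, 32.4, 30.4, 33.9, 21.5, 15.5, 15.2, 13.3,
            19.2, 27.3, 26.0, 30.4, 15.8, 19.7, 15.0, 21.4],
    "cyl": [6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8,
            8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4],
    "gear": [4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3,
             3, 4, 4, 4, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5, 5, 4],
    "carb": [4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4,
             4, 1, 2, 1, 1, 2, 2, 4, 2, 1, 2, 2, 4, 6, 8, 2],
})

# Pairwise Pearson correlations between the numeric columns.
corr = mtcars.corr()

fig, ax = plt.subplots()

# annot=True writes each correlation value inside its cell.
sns.heatmap(corr, annot=True, cmap="coolwarm", ax=ax)
```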
Scatter Plot
The Seaborn scatterplot() function helps you create plots that show the relationship between two continuous variables.
Moving ahead, to understand scatter plots and the remaining plotting functions, you will use the iris flower dataset.
So, go ahead and load the iris dataset.
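The original loading code is not shown. Seaborn can fetch this dataset with sns.load_dataset("iris"), but that call needs internet access, so this sketch reads a small in-memory stand-in instead: twelve iris rows, four per species, typed in by hand. The column names follow the ones Seaborn's copy uses and are an assumption about the original file.

```python
import io
import pandas as pd

# Twelve rows of the iris dataset, typed in by hand (assumption).
# With internet access, sns.load_dataset("iris") returns the full 150 rows.
iris_csv = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
4.6,3.4,1.4,0.3,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.9,3.1,4.9,1.5,versicolor
5.5,2.3,4.0,1.3,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
7.1,3.0,5.9,2.1,virginica
6.3,2.9,5.6,1.8,virginica
"""
iris = pd.read_csv(io.StringIO(iris_csv))

print(iris.head())
print(iris["species"].value_counts())
```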
The scatter plot below shows the relationship between sepal length and petal length for different species of iris flowers.
Now, you can distinguish the different species of flowers by setting the hue parameter to “species” in the function.
In the resulting plot, you can easily differentiate the three types of iris flowers based on their sepal length and petal length.
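A sketch of the scatter plot with species coloring, assuming the same twelve-row hand-typed iris sample as a stand-in for the full dataset:

```python
import io
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Twelve iris rows, typed in by hand (assumption; the full dataset has 150 rows).
iris_csv = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
4.6,3.4,1.4,0.3,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.9,3.1,4.9,1.5,versicolor
5.5,2.3,4.0,1.3,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
7.1,3.0,5.9,2.1,virginica
6.3,2.9,5.6,1.8,virginica
"""
iris = pd.read_csv(io.StringIO(iris_csv))

fig, ax = plt.subplots()

# hue="species" gives each species its own color and adds a legend.
sns.scatterplot(x="sepal_length", y="petal_length", hue="species",
                data=iris, ax=ax)
```

Dropping the hue argument gives the plain version, with every point in the same color.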
Pair Plot
The Python Seaborn library lets you visualize data using pair plots, which produce a matrix of the pairwise relationships between the variables in the dataset.
By default, the plots on the diagonal are histograms that represent the distribution of each feature, and the remaining plots are scatter plots of each pair of features.
When you pass the hue parameter, the diagonal plots become KDE plots and the scatter plots are colored by category. This makes it easier to tell the types of flower apart in the pair plot.
Linear Regression Plot
The lmplot() function in the Seaborn library draws a scatter plot of two continuous variables together with a fitted linear regression line.
The plot below shows the relationship between petal length and petal width of the different species of iris flowers.
The hue parameter differentiates between the species of flower, and the markers parameter sets a different marker for each species.
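A sketch of the regression plot with one line and one marker per species follows. It assumes the twelve-row hand-typed iris sample; the specific marker symbols are illustrative, and ci=None is a choice made here to skip the confidence band, which is not meaningful with four points per species.

```python
import io
import pandas as pd
import seaborn as sns

# Twelve iris rows, typed in by hand (assumption; the full dataset has 150 rows).
iris_csv = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
4.6,3.4,1.4,0.3,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.9,3.1,4.9,1.5,versicolor
5.5,2.3,4.0,1.3,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
7.1,3.0,5.9,2.1,virginica
6.3,2.9,5.6,1.8,virginica
"""
iris = pd.read_csv(io.StringIO(iris_csv))

# One regression line per species; markers needs one symbol per hue level.
grid = sns.lmplot(x="petal_length", y="petal_width", hue="species",
                  markers=["o", "s", "D"], ci=None, data=iris)
```

Leaving out hue and markers fits a single regression line through all the points.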
Box Plot
A box plot, also known as a box and whisker plot, depicts the distribution of quantitative data. The box represents the quartiles of the dataset. The whiskers show the rest of the distribution, except for the outlier points.
The box plot for the iris data shows the distribution of sepal width for each of the three species of iris flowers.
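A minimal sketch of the box plot, assuming the twelve-row hand-typed iris sample in place of the full dataset:

```python
import io
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Twelve iris rows, typed in by hand (assumption; the full dataset has 150 rows).
iris_csv = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
4.6,3.4,1.4,0.3,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.9,3.1,4.9,1.5,versicolor
5.5,2.3,4.0,1.3,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
7.1,3.0,5.9,2.1,virginica
6.3,2.9,5.6,1.8,virginica
"""
iris = pd.read_csv(io.StringIO(iris_csv))

fig, ax = plt.subplots()

# One box per species, summarizing the sepal_width values.
sns.boxplot(x="species", y="sepal_width", data=iris, ax=ax)

# The quartiles behind each box, for reference.
quartiles = iris.groupby("species")["sepal_width"].quantile([0.25, 0.5, 0.75])
print(quartiles)
```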
Data visualization plays an important role in exploratory data analysis, and the Seaborn library makes that task easy and interesting by providing built-in plotting functions. In this tutorial, you explored several of them using two datasets: mtcars and iris.