An Interesting Guide to Visualizing Data Using Python Seaborn

Visualizing data with charts and graphs helps you understand it better. Exploratory data analysis is a crucial step in any data analytics process because it reveals trends that you can’t notice just by looking at the raw values. The Python Seaborn library helps you visualize data and draw conclusions from it. In this tutorial, you’ll learn what the Seaborn library is and how to create different plots using two datasets.

Below are the topics that this tutorial on Python Seaborn will cover:

  • What is Seaborn?
  • Importing libraries in Jupyter Notebook
  • Loading dataset
  • Python Seaborn Plotting Functions
  • Bar plot
  • Count plot
  • Distribution plot
  • Heatmap
  • Scatter plot
  • Pair plot
  • Linear Regression plot
  • Box plot

What Is Seaborn in Python?

The Python Seaborn library is a popular data visualization library that is commonly used for data science and machine learning tasks. It is built on top of the Matplotlib data visualization library and is well suited to exploratory analysis. You can create informative statistical plots to answer questions about your data.

To understand the Seaborn library and the different plotting functions in detail, you’ll need to use a few datasets to create the visualizations.

Importing Libraries in Jupyter Notebook

While working on an exploratory data analysis project using Python, you will need the NumPy, Pandas, Matplotlib, and Seaborn libraries. Now, go ahead and import them.

Python_Seaborn_1
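A minimal sketch of these imports is shown here. The aliases np, pd, plt, and sns are common conventions rather than requirements, and the set_theme() call is an optional extra that the tutorial itself does not mention.

```python
# Conventional aliases for the four libraries used in this tutorial
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Optional: apply Seaborn's default styling to every plot drawn afterwards
sns.set_theme()
```

In a Jupyter Notebook, figures are normally rendered right below the cell that creates them. In a plain Python script, you would call plt.show() after drawing a plot.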

Loading Dataset

This tutorial uses the popular mtcars dataset. The data is taken from the 1974 Motor Trend US magazine. It has information about fuel consumption and 10 different aspects of automobile design and performance for 32 cars.

  • Let’s load this dataset using the Pandas read_csv() function.

Python_Seaborn_2

  • Below is what the head of the data frame looks like.

Python_Seaborn_3.

  • Now, use the info() function to print the summary of the data frame. It returns information regarding the index dtype, column dtypes, non-null values, and memory usage.

Python_Seaborn_4

  • Now, check the shape of the mtcars dataframe.

Python_Seaborn_5
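The loading steps above can be sketched as follows. To keep the example self-contained, it reads a hand-typed six-row sample from a string instead of a file, so the sample is illustrative and not the full 32-row dataset. The file name mtcars.csv and the column name model are assumptions.

```python
import io
import pandas as pd

# Illustrative stand-in for the CSV file: a hand-typed six-row sample.
# With a real file you would call pd.read_csv("mtcars.csv") instead
# (the file name is hypothetical).
csv_text = """model,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21.0,6,160.0,110,3.90,2.620,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160.0,110,3.90,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108.0,93,3.85,2.320,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360.0,175,3.15,3.440,17.02,0,0,3,2
Valiant,18.1,6,225.0,105,2.76,3.460,20.22,1,0,3,1
"""

mtcars = pd.read_csv(io.StringIO(csv_text))

print(mtcars.head())  # first five rows of the data frame
mtcars.info()         # index dtype, column dtypes, non-null counts, memory usage
print(mtcars.shape)   # (rows, columns) of this sample
```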

Python Seaborn Plotting Functions

The Seaborn library provides a range of plotting functions that make the visualization and analysis of data easier. You’ll cover some of the crucial plots in this tutorial.

Barplot

A bar plot gives an estimate of the central tendency for a numeric variable with the height of each rectangle. It provides some indication of the uncertainty around that estimate using error bars. To build this plot, you usually choose a categorical column on the x-axis and a numerical column on the y-axis.

Python_Seaborn_6

In the above plot, you used the barplot() function, passing the cylinder (cyl) column for the x-axis and the carburetors (carb) column for the y-axis. By default, each bar shows the mean of the y-axis column for that category.

The code depicted below is another way to create the same bar plot.

Here you are explicitly defining the x and y-axis columns and also passing the name of the data frame using the data argument.

Python_Seaborn_7.
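A minimal sketch of the keyword form, assuming a small hand-typed sample of the cyl and carb columns rather than the full dataset. Keyword arguments are used because recent Seaborn releases no longer accept x and y as positional arguments.

```python
import pandas as pd
import seaborn as sns

# Hand-typed sample of two mtcars columns (illustrative, not the full dataset)
mtcars = pd.DataFrame({
    "cyl":  [6, 6, 4, 6, 8, 6, 8, 4, 4, 6],
    "carb": [4, 4, 1, 1, 2, 1, 4, 2, 2, 4],
})

# Bar height is the mean of carb within each cyl group;
# the error bars indicate the uncertainty around that mean
ax = sns.barplot(x="cyl", y="carb", data=mtcars)

# Passing the columns themselves also works:
# sns.barplot(x=mtcars["cyl"], y=mtcars["carb"])
```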

Python Seaborn allows you to assign colors to the bars. The bar chart below uses the color argument to make all the bars yellow.

Python_Seaborn_8.

The Seaborn library also has the palette parameter, which you can use to give different colors to the bars.

In the example below, there is a bar plot that uses palette = ‘rocket’.

Python_Seaborn_9
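Both coloring options can be sketched side by side, again on a hand-typed sample. Older tutorials pass palette on its own, but recent Seaborn releases warn about that, so this sketch also assigns hue to the same column. That is a choice made here, not necessarily what the original code does.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hand-typed sample of two mtcars columns (illustrative, not the full dataset)
mtcars = pd.DataFrame({
    "cyl":  [6, 6, 4, 6, 8, 6, 8, 4, 4, 6],
    "carb": [4, 4, 1, 1, 2, 1, 4, 2, 2, 4],
})

fig, (ax_color, ax_palette) = plt.subplots(1, 2, figsize=(10, 4))

# color: one fixed color for every bar
sns.barplot(x="cyl", y="carb", data=mtcars, color="yellow", ax=ax_color)

# palette: a named color scheme, so each cylinder group gets its own color.
# hue repeats the x column and dodge=False keeps the bars at full width.
sns.barplot(x="cyl", y="carb", hue="cyl", palette="rocket",
            dodge=False, data=mtcars, ax=ax_palette)
```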

Countplot

The countplot() function in the Python Seaborn library shows the number of observations in each category using bars.

The count plot below shows the number of vehicles for each category of cylinders.

Python_Seaborn_10
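A minimal sketch of a count plot on the cylinder column, using a hand-typed ten-row sample, so the counts differ from those of the full dataset. The value_counts() call is added here only to show the same numbers as a table.

```python
import pandas as pd
import seaborn as sns

# Hand-typed sample of the cyl column (illustrative, not the full dataset)
mtcars = pd.DataFrame({"cyl": [6, 6, 4, 6, 8, 6, 8, 4, 4, 6]})

# One bar per distinct cyl value; bar height is the number of rows
ax = sns.countplot(x="cyl", data=mtcars)

# The same counts as a table, sorted by cylinder number
cyl_counts = mtcars["cyl"].value_counts().sort_index()
print(cyl_counts)
```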

The next count plot shows the number of cars for each carburetor.

Python_Seaborn_11.

Python Seaborn allows you to create horizontal count plots where the feature column is on the y-axis and the count is on the x-axis.

The visualization below shows the count of cars for each category of gear.

Python_Seaborn_12.

From the above plot, you can see that there are 15 vehicles with 3 gears, 12 vehicles with 4 gears, and 5 vehicles with 5 gears.
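The horizontal count plot can be sketched as follows. The gear column is rebuilt from the three counts quoted in the text, so the row order is arbitrary and does not match the real file.

```python
import pandas as pd
import seaborn as sns

# Gear column rebuilt from the counts quoted in the text (15, 12, and 5 cars);
# the row order is arbitrary
mtcars = pd.DataFrame({"gear": [3] * 15 + [4] * 12 + [5] * 5})

# Passing the column as y instead of x turns the bars horizontal
ax = sns.countplot(y="gear", data=mtcars)

# The same counts as a table, sorted by number of gears
gear_counts = mtcars["gear"].value_counts().sort_index()
print(gear_counts)
```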

Now, you can also create a grouped count plot using the hue parameter. The hue parameter accepts the column name for color encoding.

The count plot below shows the number of cars for each gear category, grouped by the number of cylinders.

Python_Seaborn_13
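A minimal sketch of the grouped count plot on a hand-typed ten-row sample. The crosstab() call is an addition that prints the same grouped counts as a table.

```python
import pandas as pd
import seaborn as sns

# Hand-typed sample of two mtcars columns (illustrative, not the full dataset)
mtcars = pd.DataFrame({
    "gear": [4, 4, 4, 3, 3, 3, 3, 4, 4, 4],
    "cyl":  [6, 6, 4, 6, 8, 6, 8, 4, 4, 6],
})

# hue splits each gear category into one colored bar per cylinder count
ax = sns.countplot(x="gear", hue="cyl", data=mtcars)

# The same grouped counts as a table (rows: gear, columns: cyl)
table = pd.crosstab(mtcars["gear"], mtcars["cyl"])
print(table)
```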

Distribution Plot

The Seaborn library provides the distplot() function, which plots the distribution of a continuous variable. Note that distplot() has been deprecated since Seaborn 0.11 in favor of histplot() and displot().

In the example below, you plot the distribution of miles per gallon of the different vehicles. The mpg metric measures the distance a car can travel per gallon of fuel.

Python_Seaborn_14

Heatmap 

Heatmaps in the Seaborn library let you visualize matrix-like data. The values of the variables are contained in a matrix and are represented as colors.

Below is an example of a heatmap that shows the correlation between each pair of variables in the mtcars dataset.

Python_Seaborn_15
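A correlation heatmap can be sketched as follows, using a hand-typed ten-row sample with four numeric columns. The model column name, the coolwarm colormap, and the annot=True setting are assumptions chosen for illustration.

```python
import pandas as pd
import seaborn as sns

# Hand-typed sample of mtcars (illustrative, not the full dataset)
mtcars = pd.DataFrame({
    "model": ["Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive",
              "Hornet Sportabout", "Valiant", "Duster 360", "Merc 240D",
              "Merc 230", "Merc 280"],
    "mpg": [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2],
    "cyl": [6, 6, 4, 6, 8, 6, 8, 4, 4, 6],
    "hp":  [110, 110, 93, 110, 175, 105, 245, 62, 95, 123],
    "wt":  [2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.440],
})

# Correlation is only defined for numeric columns, so drop the text column first
numeric = mtcars.select_dtypes(include="number")
corr = numeric.corr()

# annot=True writes each correlation value inside its cell
ax = sns.heatmap(corr, annot=True, cmap="coolwarm")
```

Selecting the numeric columns first avoids an error that newer pandas versions raise when corr() meets a text column.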

Scatterplot

The Seaborn scatterplot() function helps you create plots that show the relationship between two continuous variables.

To understand scatter plots and the remaining plotting functions, the rest of this tutorial uses the Iris flower dataset.

So, go ahead and load the iris dataset.

Python_Seaborn_16

The scatter plot below shows the relationship between sepal length and petal length for different species of iris flowers.

Python_Seaborn_17

Now, you can color the points by species by setting the hue parameter to “species” in the function.

In the plot below, you can easily differentiate the three types of iris flowers based on their sepal length and petal length.

Python_Seaborn_18
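Both scatter plots can be sketched on a hand-typed twelve-row sample of the iris data. The column names follow the copy that sns.load_dataset("iris") downloads, which is an assumption about the file used in this tutorial.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hand-typed sample of the iris data (illustrative, not the full dataset)
iris = pd.DataFrame(
    [
        (5.1, 3.5, 1.4, 0.2, "setosa"), (4.9, 3.0, 1.4, 0.2, "setosa"),
        (5.4, 3.9, 1.7, 0.4, "setosa"), (4.6, 3.4, 1.4, 0.3, "setosa"),
        (7.0, 3.2, 4.7, 1.4, "versicolor"), (6.4, 3.2, 4.5, 1.5, "versicolor"),
        (6.9, 3.1, 4.9, 1.5, "versicolor"), (5.5, 2.3, 4.0, 1.3, "versicolor"),
        (6.3, 3.3, 6.0, 2.5, "virginica"), (5.8, 2.7, 5.1, 1.9, "virginica"),
        (7.1, 3.0, 5.9, 2.1, "virginica"), (6.3, 2.9, 5.6, 1.8, "virginica"),
    ],
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width", "species"],
)

fig, (ax_plain, ax_hue) = plt.subplots(1, 2, figsize=(10, 4))

# Plain scatter plot: every point has the same color
sns.scatterplot(x="sepal_length", y="petal_length", data=iris, ax=ax_plain)

# hue colors each point by its species and adds a legend
sns.scatterplot(x="sepal_length", y="petal_length", hue="species",
                data=iris, ax=ax_hue)
```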

Pairplot

The Python Seaborn library lets you visualize data using pair plots that produce a matrix of relationships between each variable in the dataset.

In the plot below, the plots on the diagonal are histograms that represent the distribution of each feature, and the remaining plots are scatter plots of each pair of features.

Python_Seaborn_19.

When you pass the hue parameter, the diagonal plots switch to KDE plots and the points in the scatter plots are colored by category. This makes it easier to tell each type of flower apart in the pair plot.

Python_Seaborn_20.

Linear Regression Plot

The lmplot() function in the Seaborn library draws a scatter plot of two continuous variables together with a fitted linear regression line.

The plot below shows the relationship between petal length and petal width of the different species of iris flowers.

Python_Seaborn_21

The hue parameter can differentiate between each species of flower and you can set markers for different species.

Python_Seaborn_22
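Both regression plots can be sketched on a hand-typed twelve-row sample of the iris data. The marker shapes are arbitrary choices for illustration, and the column names are assumed to match the copy that sns.load_dataset("iris") downloads.

```python
import pandas as pd
import seaborn as sns

# Hand-typed sample of the iris data (illustrative, not the full dataset)
iris = pd.DataFrame(
    [
        (5.1, 3.5, 1.4, 0.2, "setosa"), (4.9, 3.0, 1.4, 0.2, "setosa"),
        (5.4, 3.9, 1.7, 0.4, "setosa"), (4.6, 3.4, 1.4, 0.3, "setosa"),
        (7.0, 3.2, 4.7, 1.4, "versicolor"), (6.4, 3.2, 4.5, 1.5, "versicolor"),
        (6.9, 3.1, 4.9, 1.5, "versicolor"), (5.5, 2.3, 4.0, 1.3, "versicolor"),
        (6.3, 3.3, 6.0, 2.5, "virginica"), (5.8, 2.7, 5.1, 1.9, "virginica"),
        (7.1, 3.0, 5.9, 2.1, "virginica"), (6.3, 2.9, 5.6, 1.8, "virginica"),
    ],
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width", "species"],
)

# One regression line fitted through all points
grid = sns.lmplot(x="petal_length", y="petal_width", data=iris)

# One regression line per species; markers needs one entry per hue level
grid_hue = sns.lmplot(x="petal_length", y="petal_width", hue="species",
                      markers=["o", "s", "D"], data=iris)
```

Unlike most functions in this tutorial, lmplot() creates its own figure and returns a FacetGrid rather than a Matplotlib Axes.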

Boxplot

A boxplot, also known as a box and whisker plot, depicts the distribution of quantitative data. The box represents the quartiles of the dataset, with a line at the median. The whiskers show the rest of the distribution, except for the outlier points, which are drawn individually.

The boxplot below shows the distribution of sepal width for each of the three species of iris flowers.

Python_Seaborn_23
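A minimal sketch of this boxplot on a hand-typed twelve-row sample of two iris columns. The column names are assumed to match the copy that sns.load_dataset("iris") downloads.

```python
import pandas as pd
import seaborn as sns

# Hand-typed sample of two iris columns (illustrative, not the full dataset)
iris = pd.DataFrame({
    "species": ["setosa"] * 4 + ["versicolor"] * 4 + ["virginica"] * 4,
    "sepal_width": [3.5, 3.0, 3.9, 3.4, 3.2, 3.2, 3.1, 2.3, 3.3, 2.7, 3.0, 2.9],
})

# One box per species: the box spans the quartiles of sepal_width
# and the whiskers cover the remaining non-outlier values
ax = sns.boxplot(x="species", y="sepal_width", data=iris)
```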

Conclusion

Data visualization plays an important role in exploratory data analysis, and the Seaborn library makes that task easy and interesting by providing built-in plotting functions. In this tutorial, you explored a few of them using two datasets: mtcars and iris.

About the Author

Ravikiran A S

Ravikiran A S works with Simplilearn as a Research Analyst. He is an enthusiastic geek always on the hunt to learn the latest technologies. He is proficient with the Java programming language, Big Data, and powerful Big Data frameworks like Apache Hadoop and Apache Spark.