Data visualization is an important part of machine learning. It lets you explore your data and present it in a form that other people can easily understand. Several libraries are available for data visualization, and the Python Bokeh library is among the easiest and most flexible to use.
Not only can you customize the graphs easily, but you can also create web layouts with them. In this Python Bokeh tutorial, you will look at the various ways to plot different graphs with Bokeh, see how they can be customized, and create a web layout too.
What Is Bokeh?
Figure 1: Python Bokeh
A scatter plot draws every data point individually. You use it to plot two numeric variables against one another: each point's position on the x-axis gives one variable, and its position on the y-axis gives the other. As a result, the plot looks like a bunch of scattered points.
Now, see how scatter plots are drawn in Python Bokeh. Start by importing the necessary modules.
Figure 2: Importing Bokeh
To create a scatter plot, draw a small circle at each pair of x and y coordinates.
Figure 3: Scatter plot
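As a rough sketch of the step above, the snippet below draws one circle marker per point with Bokeh's scatter method. The sample values, colors, and axis labels are made up for illustration and are not the ones from the original figure.

```python
from bokeh.plotting import figure

# Illustrative sample values (assumption: not taken from the tutorial).
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# Create an empty figure, then add one circle marker per (x, y) pair.
p = figure(title="Scatter plot", x_axis_label="x", y_axis_label="y")
renderer = p.scatter(x, y, size=10, marker="circle", color="navy", alpha=0.6)

# To open the plot in a browser: from bokeh.io import show; show(p)
```

The plot is only built here, not displayed; calling show(p) renders it.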
A line chart represents data as a series of points connected by a line. It is used to see trends in your data and track the way data changes over a period of time.
A line plot can be drawn with the line method of a figure created from Bokeh's plotting module. The plotting module provides the figure object and the glyph methods used to draw graphs in Python Bokeh.
Figure 4: Line plot
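A line plot can be sketched the same way. The values below are placeholders chosen for illustration, so treat this as one possible version rather than the code behind the figure.

```python
from bokeh.plotting import figure

# Illustrative values (assumption): one measurement per day.
days = [1, 2, 3, 4, 5]
values = [3, 5, 4, 8, 7]

p = figure(title="Line plot", x_axis_label="day", y_axis_label="value")
# line() connects consecutive points in the order they are given.
line_renderer = p.line(days, values, line_width=2, color="green")

# To open the plot in a browser: from bokeh.io import show; show(p)
```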
To better understand how Python Bokeh works, use the Among Us dataset, which contains information about 2227 games played by 29 users. Among Us is a mobile game for 4 to 10 players. The game takes place on a spaceship, where 1 to 2 players are the imposters and the others are crewmates. The imposters have to kill the crewmates, and the crewmates have to figure out who the imposters are. The game ends when all the imposters have been outed, or when the same number of crewmates and imposters remain.
This dataset is available on Kaggle. Start by importing the data. Because each user's data is stored in a separate file, read the contents of each file into a pandas DataFrame.
Figure 5: Importing our dataset
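One way to read several per-user files into a single DataFrame is sketched below. The load_games helper, the file names, and the two columns are hypothetical; the demo writes two tiny stand-in files so the snippet runs without the Kaggle download.

```python
import glob
import os
import tempfile

import pandas as pd


def load_games(pattern):
    """Read every CSV file matching `pattern` into one DataFrame."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index gives the combined frame one continuous row index.
    return pd.concat(frames, ignore_index=True)


# Demo with two stand-in files (hypothetical names and columns).
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "User1.csv"), "w") as f:
        f.write("Team,Outcome\nCrewmate,Win\nImposter,Loss\n")
    with open(os.path.join(folder, "User2.csv"), "w") as f:
        f.write("Team,Outcome\nCrewmate,Loss\n")
    games = load_games(os.path.join(folder, "User*.csv"))

print(games.shape)  # (3, 2)
```

With the real data, you would point the pattern at the folder holding the downloaded files.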
The data looks as shown below:
Figure 6: Among us dataset
Now, use the describe function to see statistical information about your dataset.
Figure 7: Dataset Statistics
The ‘Game Length’ column tells you the duration of each game. The time contained in the column is in the form of minutes and seconds. Next, split the column and extract only the minutes from it.
Figure 8: Creating a new column
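The extraction could look like the sketch below. It assumes the values are formatted like "07m 04s" and that the new column is called Minutes; adjust the pattern if your copy of the data uses a different format.

```python
import pandas as pd

# Assumed format: "<minutes>m <seconds>s" (sample values are illustrative).
df = pd.DataFrame({"Game Length": ["07m 04s", "16m 21s", "11m 33s"]})

# Capture the digits in front of "m" and store them as integers.
df["Minutes"] = df["Game Length"].str.extract(r"(\d+)m", expand=False).astype(int)

print(df["Minutes"].tolist())  # [7, 16, 11]
```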
The Murdered column records yes/no answers. Relabel them as Murdered and Not Murdered, and mark the remaining entries as Missing. Your final dataset is shown below.
Figure 9: Changing a column
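A minimal sketch of this relabeling with pandas follows. The raw values "Yes", "No", and "-" are assumptions about how the column is encoded.

```python
import pandas as pd

# Assumed raw encoding: "Yes", "No", and anything else for unknown.
df = pd.DataFrame({"Murdered": ["Yes", "No", "-", "No", None]})

labels = {"Yes": "Murdered", "No": "Not Murdered"}
# map() yields NaN for values that are not in `labels`;
# fillna() then turns those into "Missing".
df["Murdered"] = df["Murdered"].map(labels).fillna("Missing")

print(df["Murdered"].tolist())
# ['Murdered', 'Not Murdered', 'Missing', 'Not Murdered', 'Missing']
```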
A pie chart is a circular chart divided into slices to represent how much data belongs to a specific category. It is a quick and easy way to see the classes in your data and the percentage of the dataset they represent. Donut charts are like pie charts with a hole in the center.
Now, use the Among Us data to plot a pie chart. You can plot one with plot_bokeh, the DataFrame method provided by the pandas-bokeh package. The kind argument specifies the type of graph to plot; in this case, set it to ‘pie’.
Figure 10: Pie Chart
From the above pie chart, you can infer that around 75% of the data falls in the crewmate category and 25% in the imposter category.
Another type of circular chart is a Donut Chart. A donut chart is a kind of pie chart with a circular space in the middle. The extra space can be used to represent data, or another graph can be added.
Now, sum the counts of different categories present in the Murdered column. You will then convert the counts of each type into angles. And finally, you will allocate a different color to each category. All of this information will be stored in a new dataset df_mur.
Figure 11: Creating a new dataset for donut plot
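The three steps (count, convert to angles, assign colors) might be written as below. The column names category, count, angle, and color, the sample rows, and the hex colors are all assumptions made for illustration.

```python
from math import pi

import pandas as pd

# Illustrative rows; the real column comes from the Among Us data.
df = pd.DataFrame({"Murdered": ["Murdered", "Not Murdered", "Missing", "Murdered"]})

# Step 1: count the rows in each category.
df_mur = (
    df["Murdered"].value_counts().rename_axis("category").reset_index(name="count")
)
# Step 2: convert each count into its share of a full circle (2*pi radians).
df_mur["angle"] = df_mur["count"] / df_mur["count"].sum() * 2 * pi
# Step 3: give every category its own color.
palette = {"Murdered": "#e6550d", "Not Murdered": "#3182bd", "Missing": "#bdbdbd"}
df_mur["color"] = df_mur["category"].map(palette)
```

The angles always add up to a full circle, which is what lets the wedges close the ring.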
You can plot the donut chart by using the annular_wedge function.
Figure 12: Creating a donut plot
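A donut built with annular_wedge could look like this sketch. The counts, colors, and radii are placeholders, and df_mur is rebuilt here so that the snippet runs on its own.

```python
from math import pi

import pandas as pd
from bokeh.plotting import figure
from bokeh.transform import cumsum

# Stand-in for df_mur (assumption: counts are illustrative, not real results).
df_mur = pd.DataFrame({
    "category": ["Murdered", "Not Murdered", "Missing"],
    "count": [40, 35, 25],
    "color": ["#e6550d", "#3182bd", "#bdbdbd"],
})
df_mur["angle"] = df_mur["count"] / df_mur["count"].sum() * 2 * pi

p = figure(title="Murdered crewmates", toolbar_location=None,
           x_range=(-1, 1), y_range=(-1, 1), match_aspect=True)
# Each wedge starts where the previous one ended; cumsum() accumulates the angles.
donut = p.annular_wedge(
    x=0, y=0, inner_radius=0.4, outer_radius=0.8,
    start_angle=cumsum("angle", include_zero=True),
    end_angle=cumsum("angle"),
    line_color="white", fill_color="color",
    legend_field="category", source=df_mur,
)
# A donut needs no axes or grid lines.
p.axis.visible = False
p.grid.grid_line_color = None

# To open the plot in a browser: from bokeh.io import show; show(p)
```

The inner_radius value controls the size of the hole; setting it to 0 would give an ordinary pie chart.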
The graph below shows the donut plot obtained after running the above code. The values of Murdered and Not Murdered are close to each other, and a significant number of values are missing. Because of this, you cannot determine whether the majority of the people were murdered or not.
Figure 13: Donut plot
Histograms are used to plot numerical data according to the range they fall into. The data is plotted in bins or rectangles. The y-axis corresponds to the amount of data present in a specific range or at a certain point.
Use the plot_bokeh function to plot a histogram of the minutes column. This column tells you the duration of each game. Here, you need to set the kind argument to ‘hist’.
Figure 14: Histogram
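To show what a histogram does under the hood, the sketch below bins the values with NumPy and draws the bins with Bokeh's quad glyph. Note that this is a swapped-in technique and not the plot_bokeh call described above; the game lengths and the bin count are made up.

```python
import numpy as np
from bokeh.plotting import figure

# Illustrative game lengths in minutes (assumption: not the real data).
minutes = np.array([5, 7, 8, 8, 9, 10, 11, 12, 12, 13, 14, 18, 22])

# Count how many values fall into each of 6 equal-width bins.
counts, edges = np.histogram(minutes, bins=6)

p = figure(title="Game length", x_axis_label="minutes", y_axis_label="games")
# Draw one rectangle per bin, from its left edge to its right edge.
bars = p.quad(top=counts, bottom=0, left=edges[:-1], right=edges[1:],
              fill_color="skyblue", line_color="white")

# To open the plot in a browser: from bokeh.io import show; show(p)
```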
From the above graph, you can say that most games last for 6 to 14 minutes.
To compare categories more easily, you can assign a different color to each one. The histogram below shows the number of imposters and crewmates left at the end of each game, plotting the game length against the team column.
Figure 15: Stacked Histogram
The above graph tells you that the longer a game goes on, the higher the chances of imposters getting caught. The games which go on beyond 4.5 minutes barely have any imposters left.
While histograms plot the distribution of numerical data, bar plots represent the distribution of categorical data. They also use bars to show the amount of data. When the bars are stacked on top of each other, the result is called a stacked bar chart.
To plot a bar graph, use the ‘bar’ kind of plot_bokeh. The bar graph below shows the number of people who have completed the tasks given to them.
Figure 16: Bar graph
Now, go ahead and plot two bar graphs on top of each other. This type of bar chart is called a stacked bar graph. Here, plot the teams and outcomes against each other. A stacked bar graph can be plotted by setting the stacked argument to True.
Figure 17: Stacked Bar graph
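The idea of stacking can also be sketched with Bokeh's own vbar_stack method. This is an alternative to the plot_bokeh call described above, not the code behind the figure, and the counts and colors are invented.

```python
from bokeh.plotting import figure

teams = ["Crewmate", "Imposter"]
outcomes = ["Win", "Loss"]
# Illustrative counts per team (assumption).
data = {"team": teams, "Win": [30, 12], "Loss": [25, 8]}

# Passing a list of strings as x_range makes the x-axis categorical.
p = figure(x_range=teams, title="Outcome by team")
# One renderer is created per entry in `outcomes`, each stacked on the previous one.
renderers = p.vbar_stack(outcomes, x="team", width=0.6,
                         color=["#31a354", "#de2d26"],
                         source=data, legend_label=outcomes)

# To open the plot in a browser: from bokeh.io import show; show(p)
```

Bokeh's hbar_stack method works the same way for horizontal bars.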
You can also plot bars horizontally using the ‘barh’ method of plot_bokeh. Here, you need to plot the outcomes and tasks together.
Figure 18: Horizontally Stacked Bar graph
Another bar graph is the stacked vertical bar graph. Here, one half of the graph is negative, and the other half is positive. To plot this, multiply the loss column by -1.
Figure 19: Creating negative columns
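Negating a column takes one line in pandas. The column names and totals below are hypothetical stand-ins for the per-user win and loss counts.

```python
import pandas as pd

# Hypothetical per-user totals (column names are assumptions).
results = pd.DataFrame({
    "User": ["U1", "U2", "U3"],
    "Win": [12, 7, 9],
    "Loss": [5, 10, 4],
})

# Negating the losses makes their bars extend below the zero line.
results["Loss"] = results["Loss"] * -1

print(results["Loss"].tolist())  # [-5, -10, -4]
```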
The bar graph below shows the wins and losses for each user ID.
Figure 20: Stacked vertical bar graph
An area chart combines the line chart and the bar chart: it is a line chart in which the area below the line is shaded. Using an area chart, you can see how the value of different groups changes over a period of time. The area chart has different baselines to show the vertical range of data.
You can plot an area chart by using the varea_stack method. Here, you plot the number of sabotages fixed against the time at which they were fixed.
Figure 21: Area Plot
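A minimal varea_stack sketch is given below. The column names and the numbers are invented for illustration and only mimic the downward trend described in this section.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Illustrative counts per time bucket (assumption: not the real data).
source = ColumnDataSource(data={
    "minute": [0, 5, 10, 15, 20],
    "fixed": [9, 7, 5, 3, 1],
    "not_fixed": [1, 2, 2, 3, 3],
})

p = figure(title="Sabotages over time",
           x_axis_label="minute", y_axis_label="sabotages")
# Each listed column becomes one shaded band, stacked on the one before it.
areas = p.varea_stack(["fixed", "not_fixed"], x="minute",
                      color=["#3182bd", "#9ecae1"], source=source)

# To open the plot in a browser: from bokeh.io import show; show(p)
```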
From the above figure, you can see that fewer sabotages are fixed as time increases.
The layout function in Python Bokeh is used to arrange your plots and widgets. This makes it possible to see multiple graphs at the same time. Used primarily for designing dashboards, it lets you build grids of plots.
You can create a layout by using the grid function from bokeh.layouts. Start by creating multiple graphs. Here, you first plot a lollipop graph of the top 10 users with the most wins. You then make a donut graph of the ratio of murdered crewmates. Finally, you make a donut plot of the number of crewmates and imposters.
Figure 22: Creating multiple plots
After plotting your graphs, use the grid function to arrange them in a layout. The charts that are going to appear in the same row are placed in the same list. The lists are comma-separated, and their order determines which graphs appear at the top and bottom.
Figure 23: Bokeh Layout
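The row-per-list arrangement can be sketched as follows. The three placeholder plots and the small_plot helper are hypothetical; they stand in for the lollipop and donut graphs so that the snippet stays short.

```python
from bokeh.layouts import grid
from bokeh.plotting import figure


def small_plot(title):
    """Build a placeholder plot (hypothetical helper for this sketch)."""
    p = figure(title=title, width=300, height=250)
    p.line([1, 2, 3], [1, 4, 9])
    return p


top = small_plot("top")
bottom_left = small_plot("bottom left")
bottom_right = small_plot("bottom right")

# Each inner list is one row: a single plot on top, two side by side below.
dashboard = grid([[top], [bottom_left, bottom_right]])

# To open the layout in a browser: from bokeh.io import show; show(dashboard)
```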
Interactivity With Bokeh
Now, use Python bokeh to design a dashboard to represent the horsepower in cars. You will also make your graph interactive and see how maximum information can be conveyed with a single graph.
Start by importing all necessary modules into your program.
Figure 24: Importing necessary modules
Then, read your data into a DataFrame and wrap it in a ColumnDataSource, which converts the data into a format accepted by Python Bokeh. A ColumnDataSource is the object that provides data to glyphs in Bokeh.
Figure 25: Reading in data
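Wrapping a DataFrame in a ColumnDataSource might look like the sketch below. The column names, the rows, and the example.com image links are placeholders; in the tutorial the data is read from a file instead.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Stand-in rows (assumption: names, values, and links are placeholders).
cars = pd.DataFrame({
    "Car": ["Car A", "Car B", "Car C"],
    "Horsepower": [500, 650, 720],
    "Price": [90000, 150000, 210000],
    "Image": [
        "https://example.com/a.png",
        "https://example.com/b.png",
        "https://example.com/c.png",
    ],
})

# Every DataFrame column becomes a named column that glyphs can refer to.
source = ColumnDataSource(cars)

print(source.column_names)
```

Glyph methods can then refer to columns by name, for example right="Horsepower".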
The figure below shows the cars DataFrame. The data consists of the car name, horsepower, and price of each vehicle. There is also a column that holds the link to each car's image.
Figure 26: Cars Dataset
Now, make a horizontal bar graph of the above data. Start by creating a plot of width 800 and height 600. Give the plot the title ‘Cars with Top Horsepower’ and label the x-axis ‘Horsepower’. You also need to specify the tools that will be used.
Then add a horizontal bar graph to the graph and add a color palette.
Figure 27: Creating a horizontal bar graph
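The figure setup and the colored bars could be sketched as below. The width, height, title, and axis label follow the text; the car names, the tool list, and the choice of the Blues palette are assumptions.

```python
from bokeh.models import ColumnDataSource
from bokeh.palettes import Blues
from bokeh.plotting import figure
from bokeh.transform import factor_cmap

# Placeholder car names, listed in ascending order of horsepower.
names = ["Car A", "Car B", "Car C"]
source = ColumnDataSource(data={"Car": names, "Horsepower": [500, 650, 720]})

p = figure(y_range=names, width=800, height=600,
           title="Cars with Top Horsepower", x_axis_label="Horsepower",
           tools="pan,box_select,zoom_in,zoom_out,save,reset")

# Blues[3] runs from dark to light, so reverse it to give
# the car with the most horsepower the darkest bar.
palette = list(reversed(Blues[3]))
bars = p.hbar(y="Car", right="Horsepower", left=0, height=0.6,
              fill_color=factor_cmap("Car", palette=palette, factors=names),
              legend_field="Car", source=source)

# To open the plot in a browser: from bokeh.io import show; show(p)
```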
Finally, add a hover box and customize it with HTML so that it displays each car's price, horsepower, and image, which is loaded from the link column. Then display the graph.
Figure 28: Creating a hover tool
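A hover box with custom HTML can be sketched with Bokeh's HoverTool. The column names, the sample rows, and the HTML layout are assumptions; only the idea of showing price, horsepower, and image comes from the text.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Placeholder rows (assumption: names, values, and links are made up).
source = ColumnDataSource(data={
    "Car": ["Car A", "Car B"],
    "Horsepower": [500, 650],
    "Price": [90000, 150000],
    "Image": ["https://example.com/a.png", "https://example.com/b.png"],
})

p = figure(y_range=source.data["Car"], title="Cars with Top Horsepower", tools="")
p.hbar(y="Car", right="Horsepower", left=0, height=0.6, source=source)

# "@Column" placeholders are filled in from the data source
# for whichever bar is under the cursor.
hover = HoverTool(tooltips="""
<div>
  <img src="@Image" alt="" width="200">
  <div><b>@Car</b></div>
  <div>Horsepower: @Horsepower</div>
  <div>Price: @Price{0,0}</div>
</div>
""")
p.add_tools(hover)

# To open the plot in a browser: from bokeh.io import show; show(p)
```

The {0,0} suffix asks Bokeh to format the price with thousands separators.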
The result is the graph shown below. The cars are arranged in ascending order of their horsepower. The higher a car's horsepower, the darker its bar. The legend for the graph is given on the top right and tells you the car each color is associated with.
Figure 29: Cars Graph
In this Python Bokeh tutorial, you first looked into Bokeh and its different uses. You then took a look at how different types of graphs can be plotted in Bokeh and how a layout can be created. Finally, you used a cars dataset to create an interactive graph in Python Bokeh.
We hope this helped you understand how to make interactive plots with Python Bokeh.