Convolutional Neural Network Tutorial

Have you ever wondered how facial recognition works on social media, how object detection helps in building self-driving cars, or how diseases are detected from medical images in healthcare? All of these are made possible by convolutional neural networks (CNNs).


Here’s an example of convolutional neural networks that illustrates how they work:

Imagine you have an image and want to determine whether it shows a bird or some other object. First, you feed the image’s pixels, in the form of arrays, to the input layer of the neural network (a multi-layer network used for classification). The hidden layers then carry out feature extraction by performing different calculations and manipulations on those arrays. Several kinds of hidden layers take part in this, including the convolution layer, the ReLU layer, and the pooling layer. Finally, a fully connected layer identifies the object in the image.


Fig: Convolutional Neural Network to identify the image of a bird

With this, let’s dive deeper into convolutional neural networks. We’ll be covering the following topics:

  • Introduction to CNN
  • What is a convolutional neural network?
  • How do CNNs recognize images?
  • Layers in CNN
  • Use case implementation using CNN

Introduction to CNN

Yann LeCun, who later became the founding director of Facebook’s AI Research group, is one of the pioneers of convolutional neural networks. In the late 1980s, he built an early convolutional neural network that came to be known as LeNet. LeNet was used for character recognition tasks like reading zip codes and digits.

What is a Convolutional Neural Network?

A convolutional neural network is a feed-forward neural network that is generally used to analyze visual images by processing data with grid-like topology. It’s also known as a ConvNet. A convolutional neural network is used to detect and classify objects in an image.

Below is a neural network that identifies two types of flowers: Orchid and Rose.


In CNN, every image is represented in the form of an array of pixel values.
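To make the idea of an image as an array concrete, here is a minimal sketch in plain Python. The 4x4 grayscale image and its values are made up for illustration; a color image would typically hold three such grids, one each for the red, green, and blue channels.

```python
# A hypothetical 4x4 grayscale image (values invented for illustration).
# Each inner list is one row of pixels; 0 is black and 255 is white.
image = [
    [0,   0,   255, 255],
    [0,   0,   255, 255],
    [255, 255, 0,   0],
    [255, 255, 0,   0],
]

height = len(image)      # number of rows
width = len(image[0])    # number of pixels per row

print(height, width)     # 4 4
print(image[0][2])       # 255 (row 0, column 2)
```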


The convolution operation forms the basis of any convolutional neural network. Let’s understand the convolution operation using two one-dimensional arrays, a and b.

a = [5,3,7,5,9,7]

b = [1,2,3]

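In the convolution operation, array b slides along array a one position at a time. At each position, the overlapping elements are multiplied element-wise and the products are summed. Each sum becomes one element of a new array, which represents a*b.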


The first three elements of array a are multiplied by the elements of array b, and the products are summed to get the first result: 5×1 + 3×2 + 7×3 = 32.

The window then moves one position to the right. The next three elements of array a (3, 7, 5) are multiplied by the elements of array b, and the products are summed: 3×1 + 7×2 + 5×3 = 32.


This process continues until the convolution operation is complete.
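The sliding multiply-and-sum described above can be sketched in a few lines of plain Python. The function name `slide_and_sum` is a hypothetical helper chosen for this example, and the sketch assumes the window moves one element at a time. Strictly speaking, mathematical convolution flips b before sliding it; deep learning libraries usually skip the flip, which is the form shown here.

```python
def slide_and_sum(a, b):
    """Slide b across a; at each position multiply element-wise and sum."""
    out = []
    # The window can start at any index where all of b still fits inside a.
    for start in range(len(a) - len(b) + 1):
        window = a[start:start + len(b)]
        out.append(sum(x * w for x, w in zip(window, b)))
    return out


a = [5, 3, 7, 5, 9, 7]
b = [1, 2, 3]

result = slide_and_sum(a, b)
print(result)  # [32, 32, 44, 44]
```

The output has four elements because a three-element window fits into a six-element array at four different positions.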

How Do CNNs Recognize Images?

Consider the following two images:


Colored boxes represent a pixel value of 1, and uncolored boxes represent a pixel value of 0.

When you press backslash (\), the image below is processed:


When you press forward-slash (/), the below image is processed:


Here is another example to depict how CNN recognizes an image:


As you can see from the above diagram, only the pixels with a value of 1 are lit.

Layers in a Convolutional Neural Network

A convolutional neural network has multiple hidden layers that help in extracting information from an image. The four important layers in a CNN are:

  1. Convolution layer
  2. ReLU layer
  3. Pooling layer
  4. Fully connected layer

Convolution Layer

This is the first step in the process of extracting valuable features from an image. A convolution layer has a number of filters that perform the convolution operation. Every image is considered as a matrix of pixel values.

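Consider the following 5x5 image whose pixel values are either 0 or 1. There’s also a filter matrix with a dimension of 3x3. Slide the filter matrix over the image one position at a time and, at each position, compute the dot product of the filter with the 3x3 patch of the image beneath it. The results form the convolved feature matrix, which here is 3x3.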
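Here is a minimal sketch of that sliding dot product in plain Python. The figure's pixel values are not given in the text, so the 5x5 image and 3x3 filter below are illustrative assumptions, and `convolve2d` is a hypothetical helper name. The sketch assumes no padding and a stride of 1.

```python
def convolve2d(image, kernel):
    """Slide kernel over image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Dot product of the kernel with the patch whose top-left is (r, c).
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Assumed example values (not taken from the original figure).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

convolved = convolve2d(image, kernel)
for row in convolved:
    print(row)
# [4, 3, 4]
# [2, 4, 3]
# [2, 3, 4]
```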


ReLU Layer

ReLU stands for rectified linear unit. Once the feature maps are extracted, the next step is to move them to a ReLU layer.

ReLU performs an element-wise operation: it sets every negative value in the feature map to 0 and leaves the other values unchanged. It introduces non-linearity to the network, and the generated output is a rectified feature map. Below is the graph of a ReLU function:


The original image is scanned with multiple convolution and ReLU layers for locating the features.
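As a quick illustration of the element-wise rule, here is a minimal ReLU sketch in plain Python. The feature map values are invented for the example, and `relu` is simply a name chosen for this helper.

```python
def relu(feature_map):
    """Replace every negative value with 0; keep everything else unchanged."""
    return [[max(0, value) for value in row] for row in feature_map]


# Hypothetical feature map containing negative values.
feature_map = [
    [3, -1, 2],
    [-4, 0, 5],
    [1, -2, -6],
]

rectified = relu(feature_map)
for row in rectified:
    print(row)
# [3, 0, 2]
# [0, 0, 5]
# [1, 0, 0]
```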

Fig: Feature map and rectified feature map

Pooling Layer

Pooling is a down-sampling operation that reduces the dimensionality of the feature map. The rectified feature map now goes through a pooling layer to generate a pooled feature map.
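The text does not name a pooling variant, so the sketch below assumes max pooling with a 2x2 window and a stride of 2, which is a common choice. The input values and the helper name `max_pool` are made up for illustration.

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep the largest value from each size x size window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, stride):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, stride):
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 rectified feature map.
rectified = [
    [1, 4, 2, 7],
    [2, 6, 8, 5],
    [3, 4, 0, 7],
    [1, 2, 3, 1],
]

pooled = max_pool(rectified)
print(pooled)  # [[6, 8], [4, 7]]
```

The 4x4 map shrinks to 2x2, so the later layers have a quarter as many values to process.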


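By this stage, the filters in the preceding convolution layers have responded to different parts of the image, such as edges, corners, body, feathers, eyes, and beak. The pooling layer has no learned filters of its own. It summarizes each small region of a feature map, which shrinks the map while keeping the strongest evidence for those parts.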


Here’s how the structure of the convolutional neural network looks so far:


The next step in the process is called flattening. Flattening converts all the two-dimensional arrays from the pooled feature maps into a single long one-dimensional vector.
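Flattening is simple enough to sketch in one line of plain Python. The two pooled feature maps below are invented for illustration, and `flatten` is a hypothetical helper name.

```python
def flatten(pooled_maps):
    """Concatenate every value of every 2-D map into one flat list."""
    return [value for fmap in pooled_maps for row in fmap for value in row]


# Two hypothetical 2x2 pooled feature maps.
pooled_maps = [
    [[6, 8], [4, 7]],
    [[1, 0], [3, 2]],
]

vector = flatten(pooled_maps)
print(vector)  # [6, 8, 4, 7, 1, 0, 3, 2]
```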


The flattened vector is fed as input to the fully connected layer to classify the image.
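To give a feel for what the fully connected layer does, here is a rough sketch in plain Python. Everything in it is an assumption made for illustration: the input vector, the weights, the biases, the two class labels, and the use of softmax to turn scores into probabilities. In a real network the weights are learned during training rather than written by hand.

```python
import math


def dense(inputs, weights, biases):
    """One score per class: weighted sum of all inputs plus a bias."""
    return [
        sum(x * w for x, w in zip(inputs, row)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical flattened vector and hand-picked weights (illustration only).
flattened = [6, 8, 4, 7]
weights = [
    [0.2, 0.1, -0.3, 0.4],    # weights for the "bird" score
    [-0.1, 0.3, 0.2, -0.2],   # weights for the "not bird" score
]
biases = [0.0, 0.1]
labels = ["bird", "not bird"]

scores = dense(flattened, weights, biases)
probabilities = softmax(scores)
predicted = labels[probabilities.index(max(probabilities))]
print(predicted)  # bird
```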



Let’s summarize the entire process of how a CNN recognizes a bird:

  • The pixels from the image are fed to the convolutional layer, which performs the convolution operation
  • The result is a convolved feature map
  • A ReLU function is applied to the convolved map to generate a rectified feature map
  • The image is processed with multiple convolution and ReLU layers to locate the features
  • Pooling layers down-sample the rectified feature maps, keeping the strongest responses to specific parts of the image
  • The pooled feature map is flattened and fed to a fully connected layer to get the final output


Use Case Implementation Using CNN

We’ll be using the CIFAR-10 dataset from the Canadian Institute For Advanced Research to classify images across ten categories using a CNN.


1. Download the data set:


2. Import the CIFAR data set:
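A minimal sketch of this step, assuming you downloaded the Python version of CIFAR-10, in which each batch is a pickled dictionary. The function name `load_cifar_batch` is a hypothetical helper, and the file path you pass in depends on where you extracted the archive.

```python
import pickle


def load_cifar_batch(path):
    """Load one pickled CIFAR-10 batch file and return it as a dict.

    With encoding="bytes", the dictionary keys come back as bytes,
    for example b"data" and b"labels".
    """
    with open(path, "rb") as f:
        return pickle.load(f, encoding="bytes")
```

For example, calling `load_cifar_batch` on one of the extracted batch files should give you a dictionary whose `b"data"` entry holds the pixel values (3,072 numbers per 32x32 color image) and whose `b"labels"` entry holds the class index of each image.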


3. Read the label names:


4. Display the images using matplotlib:



5. Use the helper function to handle data:
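The text does not say what the helper does, so treat the following as an assumption. One common piece of data handling for a ten-class problem is one-hot encoding, which turns each class index into a vector with a single 1. The name `one_hot_encode` is chosen for this sketch.

```python
def one_hot_encode(labels, num_classes=10):
    """Turn class indices into vectors with a 1 at the class position."""
    encoded = []
    for label in labels:
        row = [0] * num_classes
        row[label] = 1
        encoded.append(row)
    return encoded


# Small demonstration with 4 classes to keep the output short.
print(one_hot_encode([3, 0], num_classes=4))
# [[0, 0, 0, 1], [1, 0, 0, 0]]
```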


6. Create the model:


7. Apply the helper functions:


8. Create the layers for convolution and pooling:


9. Create the flattened layer by reshaping the pooling layer:


10. Create a fully connected layer:


11. Set the output to the y_pred variable:


12. Apply the loss function:
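The text does not state which loss is used. A common choice for a ten-class classifier is softmax cross-entropy, so the plain-Python sketch below assumes that; it is meant to show what such a loss computes, not to reproduce the tutorial's TensorFlow code. The function name and the example scores are hypothetical.

```python
import math


def softmax_cross_entropy(logits, one_hot_label):
    """Cross-entropy between softmax(logits) and a one-hot label."""
    # Log-sum-exp with the max subtracted for numerical stability.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(t * lp for t, lp in zip(one_hot_label, log_probs))


# The true class is index 0 in both cases.
confident_and_right = softmax_cross_entropy([5.0, 0.0, 0.0], [1, 0, 0])
confident_and_wrong = softmax_cross_entropy([0.0, 5.0, 0.0], [1, 0, 0])

print(confident_and_right < confident_and_wrong)  # True
```

The loss is small when the network gives the correct class a high score and large when it favors a wrong class, which is the signal the optimizer uses to adjust the weights.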


13. Create the optimizer:


14. Create a variable to initialize all the global TensorFlow variables:


15. Run the model by creating a graph session:


Learn More About CNN and Deep Learning

By this point, you’ve seen how to build a CNN with multiple hidden layers and how to identify a bird using its pixel values. You’ve also completed a demo to classify images across ten categories using the CIFAR-10 dataset.


About the Author

Avijeet Biswal

Avijeet is a Senior Research Analyst at Simplilearn. Passionate about Data Analytics, Machine Learning, and Deep Learning, Avijeet is also interested in politics, cricket, and football.
