A multilayer artificial neural network (ANN) is an integral part of deep learning. This lesson gives you an overview of multilayer ANNs along with overfitting and underfitting. By the end of the lesson, you will also be able to:

  • Analyze how to regularize and minimize the cost function in a neural network
  • Carry out backpropagation to adjust weights in a neural network
  • Inspect convergence in a multilayer ANN
  • Explore multilayer ANN
  • Implement forward propagation in multilayer perceptron (MLP)
  • Understand how the capacity of a model affects underfitting and overfitting

Understanding Single-layer ANN

The perceptron rule and the Adaline rule were used to train a single-layer neural network.


In the perceptron rule, the weights are updated based on a unit step function. In the Adaline rule, they are updated based on a linear activation function.
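The following is a minimal NumPy sketch of the two update rules that makes the contrast concrete. It assumes class labels of +1 and -1, a bias handled as a constant input of 1 with weight `w[0]`, and a learning rate `eta`. The function names are illustrative and do not come from any library.

```python
import numpy as np

def unit_step(z):
    # Perceptron: threshold the net input into a class label of +1 or -1
    return np.where(z >= 0.0, 1, -1)

def perceptron_update(w, x, y, eta=0.1):
    # x includes a leading 1 for the bias; w[0] is the bias weight.
    # The error is computed on the thresholded (unit step) output.
    y_hat = unit_step(np.dot(x, w))
    return w + eta * (y - y_hat) * x

def adaline_update(w, X, y, eta=0.01):
    # Adaline: the error is computed on the linear (identity) activation,
    # and one batch gradient-descent step is taken on the sum of squared errors
    output = X.dot(w)
    errors = y - output
    return w + eta * X.T.dot(errors)

w_p = perceptron_update(np.zeros(3), np.array([1.0, 2.0, -1.0]), -1)
w_a = adaline_update(np.zeros(2), np.array([[1.0, 1.0], [1.0, 2.0]]), np.array([1.0, 2.0]))
print(w_p, w_a)
```

The perceptron changes the weights only when the predicted label is wrong. Adaline changes them in proportion to the continuous error, even when the thresholded label would already be correct.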

History of Multi-layer ANN

Deep learning deals with training multilayer artificial neural networks, also called deep neural networks. After Frank Rosenblatt developed the perceptron in the 1950s, interest in neural networks waned until 1986. In that year, Geoffrey Hinton and his colleagues popularized the backpropagation algorithm for training multilayer neural networks. Today, deep learning is a hot topic, and leading firms such as Google, Facebook, and Microsoft invest heavily in applications that use deep neural networks.


Multi-layer ANN

A fully connected multi-layer neural network is called a Multilayer Perceptron (MLP).


The MLP shown in the figure has three layers: an input layer, one hidden layer, and an output layer. A network with more than one hidden layer is called a deep ANN. An MLP is a typical example of a feedforward artificial neural network. In this figure, the ith activation unit in the lth layer is denoted as $a_i^{(l)}$.

The number of layers and the number of neurons per layer are hyperparameters of a neural network, and they need to be tuned. Cross-validation techniques can be used to find good values for them.
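The lesson does not prescribe a specific procedure for this tuning. The sketch below assumes k-fold cross-validation. `score_fn` is a hypothetical callable that trains an MLP with a given number of hidden units on the training indices and returns a validation score, where higher is better. The helper names are illustrative only.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    # Shuffle the sample indices once, then cut them into k roughly equal folds
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    return np.array_split(idx, k)

def select_hidden_units(candidates, n_samples, score_fn, k=5):
    """Return the candidate hidden-layer size with the best mean validation score.

    score_fn(n_hidden, train_idx, val_idx) is a hypothetical callable: it should
    train an MLP with n_hidden hidden units on train_idx and score it on val_idx.
    """
    folds = k_fold_indices(n_samples, k)
    mean_scores = {}
    for n_hidden in candidates:
        scores = []
        for i in range(k):
            # Fold i is held out for validation; the rest are used for training
            val_idx = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
            scores.append(score_fn(n_hidden, train_idx, val_idx))
        mean_scores[n_hidden] = float(np.mean(scores))
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores

# Usage with a dummy scorer standing in for real training
best, scores = select_hidden_units([5, 20, 50], 100, lambda h, tr, va: -abs(h - 20))
print(best, scores)
```

Each candidate is evaluated on every fold. The choice therefore depends on the average over k validation splits rather than on a single lucky split.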

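During training, the weights are adjusted via backpropagation. Adding layers lets a network learn more complex representations of the data. However, deeper networks can suffer from the vanishing gradient problem: the gradients propagated back to the early layers become very small, so those layers learn very slowly. Special algorithms are required to address this issue.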
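The sketch below gives a simplified numeric illustration of why gradients can vanish with sigmoid activations. It ignores the weights and assumes the best case, where every unit sits at z = 0. Backpropagation multiplies in one sigmoid-derivative factor per layer, and that factor is never larger than 0.25.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    # phi'(z) = phi(z) * (1 - phi(z)), which peaks at z = 0
    s = sigmoid(z)
    return s * (1.0 - s)

peak = sigmoid_derivative(0.0)

# Even in this best case, the gradient factor reaching the first layer
# shrinks geometrically as depth grows (weights are ignored here)
for depth in (1, 5, 10, 20):
    print(depth, peak ** depth)
```

In a real network, the weight magnitudes also enter each factor. This is why careful weight initialization and alternative activation functions are commonly used to mitigate the problem.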


In the representation below:


  • $a_i^{(in)}$ refers to the ith value in the input layer
  • $a_i^{(h)}$ refers to the ith unit in the hidden layer
  • $a_i^{(out)}$ refers to the ith unit in the output layer
  • $a_0^{(in)}$ is simply the bias unit and is equal to 1; it has the corresponding weight $w_0$
  • The weight coefficient connecting the kth unit in layer l to the jth unit in layer l+1 is represented by $w_{k,j}^{(l)}$

A simplified view of the multilayer network is presented here. The image shows a fully connected three-layer neural network with 3 input neurons and 3 output neurons. A bias unit is added to the input vector.
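In code, the notation above usually becomes one weight matrix per layer, with the bias unit prepended as a constant column of ones. The following is a minimal NumPy sketch for the 3-input, 3-output network. The hidden-layer size of 4 is an assumption, since the text does not give it. Row k of each matrix holds the weights leaving unit k of the previous layer, and row 0 holds the bias weights.

```python
import numpy as np

n_in, n_hidden, n_out = 3, 4, 3   # n_hidden = 4 is an assumed value

def add_bias_unit(A):
    # Prepend a column of ones: a_0 = 1 is the bias unit for each example (row)
    return np.hstack([np.ones((A.shape[0], 1)), A])

rng = np.random.default_rng(1)
# W_h[k, j]: weight from input unit k (row 0 = bias) to hidden unit j
W_h = rng.normal(scale=0.1, size=(n_in + 1, n_hidden))
# W_out[k, j]: weight from hidden unit k (row 0 = bias) to output unit j
W_out = rng.normal(scale=0.1, size=(n_hidden + 1, n_out))

X = np.array([[0.5, -1.0, 2.0]])   # one example with 3 input features
A_in = add_bias_unit(X)
print(A_in)
print(W_h.shape, W_out.shape)
```

With this layout, each layer needs an extra row for the bias weights. That is why the matrices have 4 and 5 rows rather than 3 and 4.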


Forward Propagation

In the following topics, let us look at forward propagation in detail.

MLP Learning Procedure

The MLP learning procedure is as follows:

  • Starting with the input layer, propagate data forward to the output layer. This step is the forward propagation.
  • Based on the output, calculate the error (the difference between the predicted and known outcome). The error needs to be minimized.
  • Backpropagate the error. Find its derivative with respect to each weight in the network, and update the model.

Repeat the three steps given above over multiple epochs to learn ideal weights.
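The three steps can be sketched as a small NumPy training loop for an MLP with one hidden layer. This is a sketch under several assumptions, not a definitive implementation. It uses sigmoid activations, full-batch gradient descent, and the logistic (cross-entropy) cost, which the text does not specify. With that cost, the output error term reduces to `A_out - Y`, the difference between the predicted and known outcome from step 2. The XOR toy data, learning rate, and seed are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias_unit(A):
    return np.hstack([np.ones((A.shape[0], 1)), A])

def forward(X, W_h, W_out):
    A_in = add_bias_unit(X)
    A_h = sigmoid(A_in @ W_h)
    A_h_b = add_bias_unit(A_h)
    A_out = sigmoid(A_h_b @ W_out)
    return A_in, A_h, A_h_b, A_out

def cross_entropy(Y, A_out, eps=1e-12):
    # Logistic cost: summed over output units, averaged over examples
    A = np.clip(A_out, eps, 1 - eps)
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / Y.shape[0]

def train_mlp(X, Y, n_hidden=8, eta=0.5, epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    W_h = rng.normal(scale=0.5, size=(X.shape[1] + 1, n_hidden))
    W_out = rng.normal(scale=0.5, size=(n_hidden + 1, Y.shape[1]))
    n = X.shape[0]
    costs = []
    for _ in range(epochs):
        # Step 1: forward propagation
        A_in, A_h, A_h_b, A_out = forward(X, W_h, W_out)
        # Step 2: cost of the current predictions
        costs.append(cross_entropy(Y, A_out))
        # Step 3: backpropagate the error
        delta_out = A_out - Y
        # Skip the bias row of W_out: the bias unit receives no input
        delta_h = (delta_out @ W_out[1:].T) * A_h * (1.0 - A_h)
        grad_W_out = A_h_b.T @ delta_out / n
        grad_W_h = A_in.T @ delta_h / n
        # Gradient-descent update of every weight
        W_out -= eta * grad_W_out
        W_h -= eta * grad_W_h
    return W_h, W_out, costs

X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y_xor = np.array([[0], [1], [1], [0]], dtype=float)
W_h, W_out, costs = train_mlp(X_xor, Y_xor)
print(costs[0], costs[-1])
```

Tracking the cost per epoch, as `costs` does here, is a simple way to check that training is converging.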

Finally, the output is taken via a threshold function to obtain the predicted class labels.
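The following is a minimal sketch of that final step, assuming the output activations are sigmoid values between 0 and 1. With a single output unit, the activation is thresholded at an assumed cutoff of 0.5. With one output unit per class, taking the most activated unit (argmax) is a common alternative. `predict_labels` is a hypothetical helper name.

```python
import numpy as np

def predict_labels(A_out, threshold=0.5):
    # A_out: (n_examples, n_outputs) sigmoid activations of the output layer
    if A_out.shape[1] == 1:
        # Single output unit: threshold the activation into class 0 or 1
        return (A_out[:, 0] >= threshold).astype(int)
    # Several output units (one per class): pick the most activated unit
    return np.argmax(A_out, axis=1)

A_single = np.array([[0.2], [0.7], [0.5]])
A_multi = np.array([[0.1, 0.8, 0.3], [0.6, 0.2, 0.1]])
print(predict_labels(A_single))  # [0 1 1]
print(predict_labels(A_multi))   # [1 0]
```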

Forward Propagation in MLP

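In the first step, calculate the activation $a_1^{(h)}$ of the first unit in the hidden layer. With m input features and the bias unit $a_0^{(in)} = 1$:

$z_1^{(h)} = a_0^{(in)} w_{0,1}^{(h)} + a_1^{(in)} w_{1,1}^{(h)} + \dots + a_m^{(in)} w_{m,1}^{(h)}$

$a_1^{(h)} = \phi\left(z_1^{(h)}\right)$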


The activation unit is the result of applying an activation function φ to the net input z. The activation function must be differentiable so that the weights can be learned with gradient descent. A common choice for φ is the sigmoid (logistic) function.
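The sketch below computes a single hidden activation for one example and checks differentiability numerically. The input values and weights are made up for illustration, and a sigmoid φ is assumed. It compares a central finite difference with the closed-form derivative φ'(z) = φ(z)(1 - φ(z)), which is the quantity gradient descent relies on.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation: phi(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Closed-form derivative: phi'(z) = phi(z) * (1 - phi(z))
    s = sigmoid(z)
    return s * (1.0 - s)

# One example: a_0 = 1 is the bias unit, followed by m = 3 input values
a_in = np.array([1.0, 0.5, -1.0, 2.0])
# Illustrative weights w_{k,1} from each input unit k (k = 0 is the bias) to hidden unit 1
w_1 = np.array([0.1, 0.4, -0.3, 0.2])

z_1 = np.sum(a_in * w_1)   # net input z_1^(h)
a_1 = sigmoid(z_1)         # activation a_1^(h)

# Differentiability check: central finite difference vs. closed-form derivative
h = 1e-6
numeric = (sigmoid(z_1 + h) - sigmoid(z_1 - h)) / (2 * h)
print(z_1, a_1, numeric, sigmoid_grad(z_1))
```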


The nonlinear activation function is what allows the network to solve complex problems such as image processing.
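The following quick NumPy check shows why the nonlinearity matters. Without an activation function, two stacked layers compute exactly the same mapping as one layer, so depth adds nothing. The random weights are arbitrary, and bias terms are omitted for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))    # 5 examples, 3 features
W1 = rng.normal(size=(3, 4))   # "hidden" layer weights
W2 = rng.normal(size=(4, 2))   # output layer weights

# Two stacked layers with no activation function...
two_linear_layers = (X @ W1) @ W2
# ...equal a single linear layer whose weights are W1 @ W2
one_linear_layer = X @ (W1 @ W2)
print(np.allclose(two_linear_layers, one_linear_layer))  # True

# Putting the sigmoid between the layers breaks this equivalence
with_activation = sigmoid(X @ W1) @ W2
print(np.allclose(with_activation, one_linear_layer))    # False
```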


Sigmoid Curve

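The sigmoid curve is an S-shaped curve. The sigmoid function, $\phi(z) = \frac{1}{1 + e^{-z}}$, maps any real input z to a value between 0 and 1, and $\phi(0) = 0.5$.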


Activation of Hidden Layer

The activation of the hidden layer is represented as:

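$z^{(h)} = a^{(in)} W^{(h)}$

$a^{(h)} = \phi\left(z^{(h)}\right)$

For the output layer, written for all training examples at once (uppercase letters denote matrices with one row per example):

$Z^{(out)} = A^{(h)} W^{(out)}$

$A^{(out)} = \phi\left(Z^{(out)}\right)$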
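The hidden-layer and output-layer computations can be sketched as a vectorized NumPy forward pass over a batch of examples. The sketch assumes sigmoid for φ, a bias unit prepended to the inputs and to the hidden activations, and 5 hidden units. These sizes are illustrative, not taken from the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias_unit(A):
    # Prepend the bias unit a_0 = 1 to every example (row)
    return np.hstack([np.ones((A.shape[0], 1)), A])

def forward_propagation(X, W_h, W_out):
    """Vectorized forward pass for an MLP with one hidden layer.

    X: (n, m) inputs; W_h: (m + 1, d) hidden weights; W_out: (d + 1, t) output
    weights. Row 0 of each weight matrix holds the bias weights.
    """
    A_in = add_bias_unit(X)
    Z_h = A_in @ W_h                     # Z^(h) = A^(in) W^(h)
    A_h = sigmoid(Z_h)                   # A^(h) = phi(Z^(h))
    Z_out = add_bias_unit(A_h) @ W_out   # Z^(out) = A^(h) W^(out), bias unit added
    A_out = sigmoid(Z_out)               # A^(out) = phi(Z^(out))
    return A_h, A_out

rng = np.random.default_rng(42)
X = rng.normal(size=(4, 3))                     # n = 4 examples, m = 3 features
W_h = rng.normal(scale=0.1, size=(3 + 1, 5))    # d = 5 hidden units (assumed)
W_out = rng.normal(scale=0.1, size=(5 + 1, 3))  # t = 3 output units
A_h, A_out = forward_propagation(X, W_h, W_out)
print(A_h.shape, A_out.shape)  # (4, 5) (4, 3)
```

Processing all n examples in one matrix product yields the same rows as running each example separately, but it avoids a Python loop.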


About the Author

Mayank Banoula

Mayank is a Research Analyst at Simplilearn. He is proficient in machine learning and artificial intelligence with Python.
