Perceptrons and Multi-Layer Perceptrons: The Artificial Neuron at the Core of Deep Learning
The perceptron is the basic unit powering what is today known as deep learning. It is the artificial neuron that, when combined with many others like it, can solve complex, loosely defined problems much like humans do.
Understanding the mechanics of the perceptron (working on its own) and multilayer perceptrons (working together) will give you an important foundation for understanding and working with modern neural networks.
In this article you will learn:
- What is a Perceptron?
- What is a Multilayer Perceptron?
- Structure of a Perceptron
- The Perceptron Learning Process
- From the Classic Perceptron to a Full-Fledged Deep Neural Network
- Neural Networks in the Real World
What Is a Perceptron?
A perceptron is a simple binary classification algorithm, proposed by Cornell scientist Frank Rosenblatt. It divides a set of input signals into two classes, “yes” and “no”. Unlike many other classification algorithms, the perceptron was modeled after the essential unit of the human brain, the neuron. On its own it can only separate classes with a linear boundary, but it learns its weights from examples, and that is the property larger networks build on.
A perceptron is a very simple learning machine. It takes in a few inputs, each with a weight that signifies how important it is, and generates an output decision of “0” or “1”. When combined with many other perceptrons, it forms an artificial neural network. In theory, a large enough neural network can approximate a very wide range of functions, given enough training data and computing power.
What Is a Multilayer Perceptron?
A multilayer perceptron (MLP) is a network of perceptrons, stacked in several layers, that work together to solve complex problems.
The diagram below shows an MLP with three layers. Each perceptron in the first layer on the left (the input layer) sends outputs to all the perceptrons in the second layer (the hidden layer), and all perceptrons in the second layer send outputs to the final layer on the right (the output layer).
Each perceptron sends multiple signals, one signal going to each perceptron in the next layer. For each signal, the perceptron uses different weights. In the diagram above, every line going from a perceptron in one layer to the next layer represents a different output.
Each layer can have a large number of perceptrons, and there can be multiple layers, so the multilayer perceptron can quickly become a very complex system.
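The layer-to-layer flow described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the layer sizes, the weight values, and the choice of a sigmoid activation are all assumptions made for the example, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    # Smooth activation with outputs strictly between 0 and 1 (an assumed choice).
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    # weights[j][i] is the weight on the connection from input i to neuron j,
    # so every neuron in this layer receives every output of the previous layer.
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def mlp_forward(inputs, layers):
    # The outputs of one layer become the inputs of the next.
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Illustrative network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
hidden_layer = ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1])
output_layer = ([[0.7, -0.2, 0.4]], [0.05])

prediction = mlp_forward([1.0, 0.0], [hidden_layer, output_layer])
print(prediction)
```

Each inner list in a weight matrix corresponds to one neuron, and each number in it to one of the lines in the diagram, which is why the number of weights grows quickly as layers get wider.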
The multilayer perceptron has another, more common name—a neural network. A three-layer MLP, like the diagram above, is called a Non-Deep or Shallow Neural Network. An MLP with four or more layers is called a Deep Neural Network.
One difference between an MLP and a neural network is that in the classic perceptron, the decision function is a step function and the output is binary. In neural networks that evolved from MLPs, other activation functions can be used which result in outputs of real values, usually between 0 and 1 or between -1 and 1. This allows for probability-based predictions or classification of items into multiple labels.
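To make the contrast concrete, here is a small sketch of the step function next to two common real-valued activations. The article does not name specific functions; sigmoid (outputs between 0 and 1) and tanh (outputs between -1 and 1) are assumed here because they match the ranges mentioned above, and firing only when the input is strictly greater than zero is likewise an assumption.

```python
import math

def step(z):
    # Classic perceptron decision function: the output is binary.
    # Treating z == 0 as "no" is an assumed convention.
    return 1 if z > 0 else 0

def sigmoid(z):
    # Real-valued output strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # Real-valued output strictly between -1 and 1.
    return math.tanh(z)

for z in (-2.0, 0.0, 2.0):
    print(z, step(z), sigmoid(z), tanh(z))
```

A sigmoid output can be read as a degree of confidence rather than a hard yes or no, which is what makes probability-style predictions possible.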
Structure of a Perceptron
The perceptron, or neuron in a neural network, has a simple but ingenious structure. It consists of four parts, illustrated below: the input values, the weights and bias, the weighted sum, and the activation function.
The Perceptron Learning Process
A perceptron follows these steps:
1. Takes the inputs, multiplies them by their weights, and computes their sum
Why It’s Important
The weights allow the perceptron to evaluate the relative importance of each of the inputs. Neural network algorithms learn by discovering better and better weights that result in a more accurate prediction. Several algorithms are used to fine-tune the weights; the most common is called backpropagation.
2. Adds a bias factor, the number 1 multiplied by a weight
Why It’s Important
This is a technical step that shifts the input to the activation function, which moves the activation curve left or right along the input axis. It lets the perceptron switch its output at a threshold other than zero, and so fine-tunes its numeric output. For more details see our guide on neural network bias.
3. Feeds the sum through the activation function
Why It’s Important
The activation function maps the input values to the required output values. For example, input values could be between 1 and 100, and outputs can be 0 or 1. The activation function also helps the perceptron to learn, when it is part of a multilayer perceptron (MLP). Certain properties of the activation function, especially its non-linear nature, make it possible to train complex neural networks.
For more details see our guide on activation functions.
4. The result is the perceptron output
The perceptron output is a classification decision. In a multilayer perceptron, the output of one layer’s perceptrons is the input of the next layer. The output of the final perceptrons, in the “output layer”, is the final prediction of the perceptron learning model.
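The four steps above can be sketched as a single function, followed by a small training loop. Two things here are assumptions rather than part of the article: the step function fires only when the sum is strictly greater than zero, and the weights are adjusted with the classic perceptron learning rule (nudge each weight by the prediction error times its input) rather than backpropagation, which applies to multilayer networks. The AND data set and the function names are illustrative only.

```python
def predict(inputs, weights, bias_weight):
    # Step 1: multiply each input by its weight and sum the results.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Step 2: add the bias, i.e. the constant 1 multiplied by its own weight.
    weighted_sum += 1 * bias_weight
    # Step 3: feed the sum through the activation function (a step function here).
    # Step 4: the result is the perceptron output.
    return 1 if weighted_sum > 0 else 0

def train(samples, epochs=10, learning_rate=1.0):
    # Start from zero weights (an assumed initialization).
    weights = [0.0] * len(samples[0][0])
    bias_weight = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # error is +1, 0, or -1 depending on how the prediction missed.
            error = target - predict(inputs, weights, bias_weight)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias_weight += learning_rate * error * 1
    return weights, bias_weight

# Logical AND: the output should be 1 only when both inputs are 1.
and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias_weight = train(and_samples)
predictions = [predict(x, weights, bias_weight) for x, _ in and_samples]
print(predictions)
```

AND is linearly separable, so a single perceptron can learn it. A problem such as XOR is not, which is one reason perceptrons are stacked into layers.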
From the Classic Perceptron to a Full-Fledged Deep Neural Network
Although multilayer perceptrons (MLP) and neural networks are essentially the same thing, you need to add a few ingredients before an MLP becomes a full neural network. These are:
- Backpropagation: the backpropagation algorithm allows you to perform a “backward pass”, which helps tune the weights of the inputs. Backpropagation performs iterative backward passes which attempt to minimize the “loss”, or the difference between the known correct answer and the actual model prediction. With each backward pass, the weights move towards values that reduce the loss function and make the prediction more accurate. For more details see our guide on backpropagation.
- Hyperparameters: in a modern neural network, aspects of the multilayer structure such as the number of layers, initial weights, the type of activation function, and details of the learning process are treated as hyperparameters and tuned to improve the performance of the neural network. Tuning hyperparameters is an art, and can have a huge impact on the performance of a neural network. For more details see our guide on hyperparameters.
- Advanced structures: many neural networks use a complex structure that builds on the multilayer perceptron. For example, a Recurrent Neural Network (RNN) feeds its internal state from one step of a sequence into the next, so earlier inputs influence later outputs. A bidirectional variant uses two such networks, one reading the data from beginning to end and the other from the end to the beginning, which helps with language processing. A Convolutional Neural Network (CNN) slides small sets of shared weights, called filters, across the input, so the same feature detector is applied at every position. For color images, each filter spans all three layers of “depth”: red, green and blue.
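The backward pass described in the list above can be sketched for a tiny network with one hidden layer. This is a minimal illustration under stated assumptions: sigmoid activations, a squared-error loss, a single training example, and hand-picked starting weights. None of these come from the article, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, params):
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    y = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, y

def loss(y, target):
    # Squared error between the prediction and the known correct answer.
    return 0.5 * (y - target) ** 2

def backward_pass(x, target, params, learning_rate):
    w_hidden, b_hidden, w_out, b_out = params
    hidden, y = forward(x, params)
    # Chain rule at the output neuron: dLoss/dy times the sigmoid derivative.
    delta_out = (y - target) * y * (1.0 - y)
    # Propagate the error back to each hidden neuron through its output weight.
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(w_out, hidden)]
    # Move every weight a small step against its gradient.
    new_w_out = [w - learning_rate * delta_out * h
                 for w, h in zip(w_out, hidden)]
    new_b_out = b_out - learning_rate * delta_out
    new_w_hidden = [[w - learning_rate * d * xi for w, xi in zip(ws, x)]
                    for ws, d in zip(w_hidden, delta_hidden)]
    new_b_hidden = [b - learning_rate * d
                    for b, d in zip(b_hidden, delta_hidden)]
    return new_w_hidden, new_b_hidden, new_w_out, new_b_out

# Illustrative 2-2-1 network and a single training example.
params = ([[0.5, -0.4], [0.3, 0.8]], [0.0, 0.1], [0.7, -0.2], 0.05)
x, target = [1.0, 0.0], 1.0

losses = []
for _ in range(200):
    _, y = forward(x, params)
    losses.append(loss(y, target))
    params = backward_pass(x, target, params, learning_rate=0.5)

print(losses[0], losses[-1])
```

The learning rate of 0.5 and the 200 passes are themselves hyperparameters in the sense of the second bullet: changing them changes how quickly, and whether, the loss shrinks.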
Neural Networks in the Real World
We hope this article has given you a basic understanding of the fundamental unit of a neural network. In the real world, perceptrons work under the hood. You will run neural networks using deep learning frameworks such as TensorFlow, Keras, and PyTorch. These frameworks ask you for hyperparameters such as the number of layers, activation function, and type of neural network, and construct the network of perceptrons automatically.
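To give a feel for what “constructing the network automatically” involves, here is a toy stand-in written in plain Python. It is not the API of TensorFlow, Keras, or PyTorch: `build_mlp` and `count_parameters` are hypothetical helpers, and the random initialization range is an assumption. The point is only that a short list of layer sizes is enough to determine every weight the network needs.

```python
import random

def build_mlp(layer_sizes, seed=0):
    # layer_sizes is the hyperparameter: e.g. [4, 8, 3] means
    # 4 inputs, one hidden layer of 8 neurons, and 3 outputs.
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        # One row of weights per neuron, one weight per incoming connection.
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def count_parameters(layers):
    # Total number of weights and biases the training process has to tune.
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

network = build_mlp([4, 8, 3])
print(len(network), count_parameters(network))
```

Real frameworks do the same bookkeeping at a much larger scale and also supply the training loop, which is why you rarely handle individual perceptrons directly.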
When you work on real, production-scale deep learning projects, you will find that the operations side of things can become a bit daunting:
- Running experiments at scale and tracking results, source code, metrics, and hyperparameters. To succeed at deep learning you need to run large numbers of experiments and manage them correctly to see what worked.
- Running experiments across multiple machines: in most cases neural networks are computationally intensive. To work efficiently, you’ll need to run experiments on multiple machines. This requires provisioning these machines and distributing the work.
- Managing training data: more training data generally helps the model learn and perform better, but it also means more files to manage and copy to the training machines. If your model’s input is multimedia, those files can weigh anywhere from gigabytes to petabytes.