Regression models have been around for many years and have proven very useful for modeling real world problems and making predictions, in science as well as in industry and business. In parallel, neural networks and deep learning are growing in adoption. They can model complex problems and make predictions using a learning process loosely inspired by the human brain. What’s the connection between neural networks and regression problems? Can you use a neural network to run a regression? Is there any benefit to doing so?
The short answer is yes, because most regression models will not perfectly fit the data at hand. If you need a more complex model, a neural network can provide considerably more prediction power than a traditional regression.
In this article we’ll explain the pros and cons of using neural networks for regression, and show how to easily scale and manage deep learning experiments using the MissingLink deep learning platform.
Regression analysis can help you model the relationship between a dependent variable (which you are trying to predict) and one or more independent variables (the input of the model). It can show whether there is a significant relationship between the independent variables and the dependent variable, and how strong the impact is: when the independent variables move, by how much you can expect the dependent variable to move. The simplest linear regression equation, in its common textbook form, is y = a + b·x + e, where y is the dependent variable, x is the independent variable, a is the intercept, b is the slope, and e is the error term.
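As a rough illustration, the one-variable form y = a + b·x + e can be fitted with ordinary least squares in a few lines of plain Python. This is only a sketch: the function name `fit_simple_linear_regression` and the sample data are made up for the example and do not come from any particular library.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical sample data, invented for this sketch.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_simple_linear_regression(xs, ys)
# For this data: slope is about 1.97 and intercept is about 1.09.
predicted = intercept + slope * 6.0  # prediction for a new input x = 6
```

The slope answers the question posed above: for each unit the independent variable moves, the dependent variable is expected to move by roughly that amount.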
Linear regression is suitable for dependent variables that are continuous and can be fitted with a linear function (a straight line).
Polynomial regression is suitable for dependent variables that are best fitted by a curve or a series of curves. Polynomial models are prone to overfitting, so it is important to remove outliers that can distort the prediction curve.
Logistic regression is suitable for dependent variables that are binary. Binary variables are not normally distributed. They follow a binomial distribution and cannot be fitted with a linear regression function.
Stepwise regression is an automated regression technique that can deal with high dimensionality, meaning a large number of independent variables. It observes statistical values to detect which variables are significant, and drops or adds covariates one by one to see which combination of variables maximizes prediction power.
Ridge regression is a technique that can help with multicollinearity: independent variables that are highly correlated, which makes variances large and causes a large deviation in the predicted value. Ridge regression adds a bias to the regression estimate, reducing or “penalizing” the coefficients using a shrinkage parameter. The penalty is applied to the sum of the squared coefficients, so the coefficients shrink toward zero but do not reach exactly zero. Ridge regression is a form of regularization: it uses L2 regularization (learn about bias in neural networks in our guide).
Least Absolute Shrinkage and Selection Operator (LASSO) regression, similar to ridge regression, shrinks the regression coefficients to solve the multicollinearity problem. However, Lasso regression penalizes the sum of the absolute values of the coefficients rather than the sum of their squares, which means some of the coefficients can become exactly zero. This leads to “feature selection”: if a group of independent variables is highly correlated, Lasso tends to pick one and shrink the others to zero. Lasso regression is also a type of regularization: it uses L1 regularization.
ElasticNet combines Ridge and Lasso regression. It is trained with both L1 and L2 regularization in a single penalty, trading off between the two techniques. The advantage is that ElasticNet gains the stability of Ridge regression while allowing feature selection like Lasso. Whereas Lasso tends to pick only one variable from a group of correlated variables, ElasticNet encourages a group effect and may pick more than one correlated variable.
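To make the difference between the three penalties concrete, here is a minimal sketch of how each one shrinks a single least-squares coefficient. It assumes the simplified textbook case of one standardized predictor (or an orthonormal design), where the penalized problem reduces to minimizing 0.5·(b − b_ols)² plus the penalty. The function names and the penalty strengths `l1` and `l2` are chosen for illustration only.

```python
def soft_threshold(value, threshold):
    """Move value toward zero by threshold, stopping at exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ridge_shrink(b_ols, l2):
    # Minimizes 0.5*(b - b_ols)**2 + 0.5*l2*b**2: proportional shrinkage,
    # so a nonzero coefficient never becomes exactly zero.
    return b_ols / (1.0 + l2)


def lasso_shrink(b_ols, l1):
    # Minimizes 0.5*(b - b_ols)**2 + l1*abs(b): small coefficients
    # are set to exactly zero (feature selection).
    return soft_threshold(b_ols, l1)


def elastic_net_shrink(b_ols, l1, l2):
    # Minimizes 0.5*(b - b_ols)**2 + l1*abs(b) + 0.5*l2*b**2:
    # soft-thresholding followed by proportional shrinkage.
    return soft_threshold(b_ols, l1) / (1.0 + l2)


small, large = 0.3, 2.0  # two hypothetical least-squares coefficients
ridge_results = [ridge_shrink(small, 1.0), ridge_shrink(large, 1.0)]
lasso_results = [lasso_shrink(small, 0.5), lasso_shrink(large, 0.5)]
enet_results = [elastic_net_shrink(small, 0.5, 1.0),
                elastic_net_shrink(large, 0.5, 1.0)]
```

In this simplified setting Ridge halves both coefficients, Lasso removes the small one entirely, and ElasticNet does a bit of both. Real implementations solve the full multivariate problem, where the closed forms above no longer apply directly.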
Artificial Neural Networks (ANN) are composed of simple elements, called neurons, each of which can make simple mathematical decisions. Together, the neurons can analyze complex problems, approximate almost any function including very complex ones, and provide accurate answers. A shallow neural network has three layers of neurons: an input layer, a hidden layer, and an output layer. A Deep Neural Network (DNN) has more than one hidden layer, which increases the complexity of the model and can significantly improve prediction power.
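The three-layer structure described above can be sketched as a forward pass in plain Python. The weights, biases, and layer sizes below are arbitrary values picked for the example (a trained network would learn them), and the helper names `dense` and `predict` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One layer: each neuron applies an activation to a weighted sum."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Arbitrary example parameters: 2 inputs -> 3 hidden neurons -> 1 output.
hidden_weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[1.0, -1.0, 0.5]]
output_biases = [0.2]


def predict(inputs):
    hidden = dense(inputs, hidden_weights, hidden_biases, sigmoid)
    # A linear (identity) output is a common choice for regression targets.
    return dense(hidden, output_weights, output_biases, identity)[0]


example_output = predict([1.0, 2.0])
```

Adding more hidden layers to `predict` would turn this shallow network into a deep one.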
Regression models are reducible to neural networks: a neural network can “pretend” to be any type of regression model. For example, a very simple neural network in which the inputs feed a single sigmoid neuron is equivalent to a logistic regression. It takes several independent variables (the input parameters), multiplies them by their coefficients (the weights), and runs the weighted sum through a sigmoid activation function, which is the logistic regression function. A unit step function can then turn the resulting probability into a binary class. When this neural network is trained, it performs gradient descent (to learn more see our in-depth guide on backpropagation) to find coefficients that fit the data better, until it arrives at the optimal logistic regression coefficients (or, in neural network terms, the optimal weights for the model).
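The equivalence described above can be sketched with a single sigmoid neuron trained by batch gradient descent on the log loss. This is a minimal illustration in plain Python, not a framework implementation: the toy data, learning rate, and epoch count are assumptions made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_neuron(xs, ys, learning_rate=0.1, epochs=20000):
    """Fit weight w and bias b of one sigmoid neuron by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log loss with respect to the weighted sum.
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


def predict_class(x, w, b):
    # Unit step on the predicted probability.
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Hypothetical toy data: one input variable and a binary outcome.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_neuron(xs, ys)
predictions = [predict_class(x, w, b) for x in xs]
```

Here `w` and `b` play the role of the logistic regression coefficient and intercept. Because this toy data is perfectly separable, the weights keep growing with more epochs instead of settling at finite values, which is one reason regularization is used in practice.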
The logistic regression we modeled above is suitable for binary classification. What if we need to model multi-class classification? We can increase the complexity of the model by using multiple neurons in the hidden layer, to achieve one-vs-all classification. Each classification option can be encoded using three binary digits. For the output of the neural network, we can use the Softmax activation function (see our complete guide on neural network activation functions). The Softmax calculation includes a normalization term, ensuring the probabilities predicted by the model are “meaningful” (sum up to 1).
This illustrates how a neural network can not only simulate a regression function, but can also model more complex scenarios by increasing the number of neurons and layers and by modifying other hyperparameters (see our complete guide on neural network hyperparameters).
To summarize, if a regression model perfectly fits your problem, don’t bother with neural networks. But if you are modeling a complex data set and feel you need more prediction power, give deep learning a try. A neural network can often construct a prediction function automatically that outperforms your traditional regression model.
Stay tuned for part 2 of this article, which will show how to run regression models in TensorFlow and Keras, leveraging the power of the neural network to improve prediction power.
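For reference, the Softmax normalization mentioned in the multi-class discussion can be sketched in a few lines of plain Python. The three raw scores are invented for the example, and subtracting the maximum score is a common numerical-stability step rather than something the article prescribes.

```python
import math


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)  # the normalization term
    return [e / total for e in exps]


# Hypothetical raw scores from three output neurons (three classes).
probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))

# One-hot encoding of the predicted class: three binary digits.
one_hot = [1 if i == predicted_class else 0 for i in range(len(probs))]
```

The one-hot vector at the end is one way to read the "three binary digits" encoding for a three-class problem.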
Traditional regression functions are typically run in R or in other math or statistics libraries. To run a neural network model equivalent to a regression function, you will need a deep learning framework such as TensorFlow, Keras or Caffe, all of which have a steeper learning curve. As we hinted in the article, while neural networks carry overhead and are a bit more difficult to understand, they can provide prediction power that even sophisticated regression models often cannot match. It’s extremely rare to see a regression equation that perfectly fits all expected data sets, and the more complex your scenario, the more value you’ll derive from “crossing the Rubicon” to the land of deep learning. When you get your start in deep learning, you’ll find that with only a basic understanding of neural network concepts, the frameworks will do most of the work for you. Specify the parameters and they’ll build your neural network, run your experiments and deliver results. However, as you scale up your deep learning work, you’ll discover additional challenges:
Tracking progress across multiple experiments and storing source code, metrics and hyperparameters. Neural networks require constant trial and error to get the model right and it’s easy to get lost among hundreds or thousands of experiments.
Running experiments across multiple machines: unlike regression models, neural networks are computationally intensive. You’ll quickly find yourself having to provision additional machines, as you won’t be able to run large scale experiments on your development laptop. Managing those machines can be a pain.
Managing training data: depending on the project, training data can get big. If you’re processing images, video or large quantities of unstructured data, managing this data and copying it to the machines that run the experiments can become difficult.
MissingLink is a deep learning platform that does all of this for you, and lets you concentrate on becoming a deep learning expert. Learn more to see how easy it is.
The most comprehensive platform to manage experiments, data and resources more frequently, at scale and with greater confidence.