## Neural Networks for Regression (Part 1)—Overkill or Opportunity?

Regression models have been around for many years and have proven very useful in modeling real-world problems and providing useful predictions, in scientific, industry, and business environments alike. In parallel, neural networks and deep learning are growing in adoption, able to model complex problems and provide predictions in a way that loosely resembles the learning process of the human brain. What’s the connection between neural networks and regression problems? Can you use a neural network to run a regression? Is there any benefit to doing so? The short answer is yes, because most regression models will not perfectly fit the data at hand. If you need a more complex model, applying a neural network to the problem can provide much more prediction power than a traditional regression. **In Part 1 of this article you will learn:**

- What is regression analysis and common types of regressions
- What is a neural network
- How a neural network can be used to mimic and run any regression model
- When should you use neural networks to run regression models
- Running regression with neural networks in real life

**In Part 2 of this article [coming soon]:**

- How to run neural networks mimicking regression models in TensorFlow
- How to run neural network regressions in Keras

## What is Regression Analysis?

Regression analysis can help you model the relationship between a dependent variable (which you are trying to predict) and one or more independent variables (the inputs of the model). Regression analysis can show whether there is a significant relationship between the independent variables and the dependent variable, and the strength of the impact: when the independent variables move, by how much you can expect the dependent variable to move. **The simplest form, the linear regression equation, looks like this:**

y = β₁ + β₂X₂ + β₃X₃ + … + βₖXₖ + ε

- y → dependent variable—the value the regression model is aiming to predict
- X₂, X₃, …, Xₖ → independent variables—one or more values that the model takes as input, using them to predict the dependent variable
- β₁, β₂, …, βₖ → coefficients—weights that define how important each of the variables is for predicting the dependent variable (β₁ serves as the intercept)
- ε → error—the distance between the value predicted by the model and the actual dependent variable y. Statistical methods can be used to estimate and reduce the size of the error term, to improve the predictive power of the model.
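To make the equation concrete, here is a minimal sketch (not part of the original article) of fitting a one-variable linear regression by ordinary least squares, assuming NumPy and synthetic data with a known slope and intercept:

```python
import numpy as np

# Synthetic data: true intercept 1.0, true coefficient 2.5, plus noise (the error term)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=100)
y = 1.0 + 2.5 * X + rng.normal(0, 0.5, size=100)

# Design matrix with a column of ones so the intercept is estimated too
A = np.column_stack([np.ones_like(X), X])
coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
intercept, beta = coef  # least-squares estimates, close to 1.0 and 2.5
```

Least squares picks the coefficients that minimize the sum of squared errors, which is the standard way of estimating the β values in the equation above.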

## Types of Regression Analysis

Linear Regression

Suitable for dependent variables which are continuous and can be fitted with a linear function (straight line).

Polynomial Regression

Suitable for dependent variables which are best fitted by a curve or a series of curves. Polynomial models are prone to overfitting, so it is important to remove outliers which can distort the prediction curve.
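As an illustration (with made-up data, assuming NumPy), a polynomial regression can be fitted with `np.polyfit`; raising the degree much beyond what the data supports is where overfitting creeps in:

```python
import numpy as np

# Synthetic data following a cubic curve plus noise
rng = np.random.default_rng(1)
X = np.linspace(-3, 3, 80)
y = X**3 - 2 * X + rng.normal(0, 1.0, size=80)

# Fit a degree-3 polynomial; a much higher degree would start fitting the noise
coeffs = np.polyfit(X, y, deg=3)       # highest-degree coefficient first
y_hat = np.polyval(coeffs, X)
mse = np.mean((y - y_hat) ** 2)        # roughly the noise variance if the fit is good
```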

Logistic Regression

Suitable for dependent variables which are binary. Binary variables are not normally distributed—they follow a binomial distribution, and cannot be fitted with a linear regression function.
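The key ingredient is the sigmoid (logistic) function, which maps the linear predictor onto a probability between 0 and 1. A tiny sketch with hypothetical, hand-picked coefficients:

```python
import math

def sigmoid(z):
    # Maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted coefficients: intercept -1.0, slope 2.0
beta0, beta1 = -1.0, 2.0
p = sigmoid(beta0 + beta1 * 1.5)  # P(y = 1 | x = 1.5), about 0.88
```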

Stepwise Regression

An automated regression technique that can deal with high dimensionality—a large number of independent variables. Stepwise regression examines statistical measures of significance to detect which variables matter, and drops or adds covariates one by one to find the combination of variables that maximizes prediction power.

Ridge Regression

A regression technique that can help with multicollinearity—independent variables that are highly correlated, which inflates variances and causes large deviations in the predicted value. Ridge regression adds a bias to the regression estimate, reducing or “penalizing” the coefficients using a shrinkage parameter. The penalty is applied to the squared magnitude of the coefficients, so they shrink toward zero but can never reach exactly zero. Ridge regression is a form of regularization—it uses L2 regularization (learn about bias in neural networks in our guide).
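Ridge estimates have a convenient closed form: add the shrinkage parameter λ to the diagonal of XᵀX before solving. A minimal sketch (synthetic, nearly collinear data, assuming NumPy):

```python
import numpy as np

# Two nearly collinear independent variables (classic multicollinearity)
rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.1, size=n)

# Closed-form ridge estimate: (X'X + lam*I)^-1 X'y
lam = 1.0  # shrinkage parameter
ridge_coef = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
# Both coefficients are shrunk but remain nonzero (no feature selection)
```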

Lasso Regression

Least Absolute Shrinkage and Selection Operator (LASSO) regression, like ridge regression, shrinks the regression coefficients to solve the multicollinearity problem. However, Lasso regression penalizes the absolute values of the coefficients rather than their squares, meaning some of the coefficients can become exactly zero. This leads to “feature selection”—if a group of independent variables is highly correlated, Lasso picks one and shrinks the others to zero. Lasso regression is also a type of regularization—it uses L1 regularization.
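To show the selection effect, here is a sketch of Lasso via coordinate descent with soft-thresholding (a standard solver for the L1 penalty; synthetic data, assuming NumPy). With two highly correlated features, one coefficient stays large and the other is driven to (essentially) zero:

```python
import numpy as np

def soft_threshold(rho, lam):
    # The soft-thresholding operator: this is what lets L1 zero out coefficients
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

# Two highly correlated features; only x1 actually drives y
rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = 2.0 * x1 + rng.normal(scale=0.1, size=n)

lam = 10.0
w = np.zeros(2)
for _ in range(200):          # coordinate descent sweeps
    for j in range(2):
        residual = y - X @ w + X[:, j] * w[j]   # residual excluding feature j
        rho = X[:, j] @ residual
        w[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j])
# w[0] ends up near 2; w[1] is shrunk to (essentially) zero: feature selection
```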

ElasticNet Regression

ElasticNet combines Ridge and Lasso regression: it is trained with both L1 and L2 regularization, trading off between the two techniques. The advantage is that ElasticNet gains the stability of Ridge regression while allowing feature selection like Lasso. Whereas Lasso tends to pick only one variable out of a group of correlated variables, ElasticNet encourages a group effect and may pick more than one correlated variable.
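The group effect can be seen by adding an L2 term to the same coordinate-descent sketch used for Lasso (synthetic data, assuming NumPy): with both penalties active, the two correlated features keep similar, nonzero coefficients instead of one being zeroed out:

```python
import numpy as np

def soft_threshold(rho, lam):
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

# Two highly correlated features, both contributing to y
rng = np.random.default_rng(5)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.1, size=n)

l1, l2 = 5.0, 50.0   # L1 penalty (sparsity) and L2 penalty (stability)
w = np.zeros(2)
for _ in range(500):
    for j in range(2):
        residual = y - X @ w + X[:, j] * w[j]
        rho = X[:, j] @ residual
        # L2 term enters the denominator; L1 term is the soft threshold
        w[j] = soft_threshold(rho, l1) / (X[:, j] @ X[:, j] + l2)
# Both coefficients stay nonzero and close to each other: the "group effect"
```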

## What Is a Neural Network?

Artificial Neural Networks (ANN) are comprised of simple elements, called neurons, each of which can make simple mathematical decisions. Together, the neurons can analyze complex problems, emulate almost any function including very complex ones, and provide accurate answers. A shallow neural network has three layers of neurons: an input layer, a hidden layer, and an output layer. A Deep Neural Network (DNN) has more than one hidden layer, which increases the complexity of the model and can significantly improve prediction power.
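As a rough sketch of the shallow architecture described above (random, untrained weights and hypothetical layer sizes; assuming NumPy), a forward pass is just two matrix multiplications with a nonlinearity in between:

```python
import numpy as np

rng = np.random.default_rng(6)

# Shallow network: 3 inputs -> 4 hidden neurons -> 1 output (sizes are arbitrary)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

def forward(x):
    h = np.tanh(W1 @ x + b1)   # hidden layer with a nonlinear activation
    return W2 @ h + b2         # linear output layer

out = forward(np.array([0.5, -1.0, 2.0]))
```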

## Regression in Neural Networks

Neural networks are reducible to regression models—a neural network can “pretend” to be any type of regression model. **For example, a very simple neural network, with only one input neuron, one hidden neuron, and one output neuron, is equivalent to a logistic regression.** It takes several independent variables (input parameters), multiplies them by their coefficients (weights), and runs them through a sigmoid activation function (optionally followed by a unit step function to produce a hard class label), which closely resembles the logistic regression function. When this neural network is trained, it performs gradient descent (to learn more see our in-depth guide on **backpropagation**) to find coefficients that better fit the data, until it arrives at the optimal logistic regression coefficients (or, in neural network terms, the optimal weights for the model).
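A single sigmoid neuron trained by gradient descent really does recover logistic regression coefficients. Here is a minimal sketch (synthetic data with a known true weight of 3.0, assuming NumPy):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic binary data: P(y=1) rises with x according to a true weight of 3.0
rng = np.random.default_rng(2)
x = rng.normal(0, 1, size=200)
y = (rng.uniform(size=200) < sigmoid(3.0 * x)).astype(float)

# One neuron = one weight and one bias, trained by gradient descent on log-loss
w, b = 0.0, 0.0
lr = 0.1
for _ in range(2000):
    p = sigmoid(w * x + b)
    w -= lr * np.mean((p - y) * x)   # gradient of the log-loss w.r.t. w
    b -= lr * np.mean(p - y)         # gradient of the log-loss w.r.t. b
# w converges toward the logistic regression estimate of the true weight
```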

### A More Complex Model

The logistic regression we modeled above is suitable for binary classification. What if we need to model multi-class classification? We can increase the complexity of the model by using multiple neurons in the hidden layer, to achieve one-vs-all classification, with each classification option encoded in binary digits.

For the output of the neural network, we can use the Softmax activation function (see our complete guide on **neural network activation functions**). The Softmax calculation includes a normalization term, ensuring the probabilities predicted by the model are “meaningful” (they sum up to 1).

This illustrates how a neural network can not only simulate a regression function, but can also model more complex scenarios by increasing the number of neurons and layers, and by modifying other hyperparameters (see our complete guide on **neural network hyperparameters**).

To summarize: if a regression model perfectly fits your problem, don’t bother with neural networks. But if you are modeling a complex data set and feel you need more prediction power, give deep learning a try. Chances are that a neural network can automatically construct a prediction function that will eclipse the prediction power of your traditional regression model.

*Stay tuned for Part 2 of this article, which will show how to run regression models in TensorFlow and Keras, leveraging the power of the neural network to improve prediction power.*
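For reference, the Softmax normalization mentioned in this section can be sketched in a few lines (hypothetical logits, assuming NumPy):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)        # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()       # normalization term: probabilities sum to 1

logits = np.array([2.0, 1.0, 0.1])  # raw network outputs for three classes
probs = softmax(logits)             # valid probability distribution over classes
```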

## Regression with Neural Networks in Real Life

Running traditional regression functions is typically done in R or other math or statistics libraries. To run a neural network model equivalent to a regression function, you will need to use a deep learning framework such as TensorFlow, Keras, or Caffe, which have a steeper learning curve. As we hinted in this article, while neural networks have their overhead and are a bit more difficult to understand, they provide prediction power unmatched by even the most sophisticated regression models. It’s extremely rare to see a regression equation that perfectly fits all expected data sets, and the more complex your scenario, the more value you’ll derive from “crossing the Rubicon” to the land of deep learning.

When you get started in deep learning, you’ll find that with only a basic understanding of neural network concepts, the frameworks will do most of the work for you. Specify the parameters and they’ll build your neural network, run your experiments, and deliver results. However, as you scale up your deep learning work, you’ll discover additional challenges:

**Tracking progress across multiple experiments** and storing source code, metrics and hyperparameters. Neural networks require constant trial and error to get the model right and it’s easy to get lost among hundreds or thousands of experiments.

**Running experiments across multiple machines**—unlike regression models, neural networks are computationally intensive. You’ll quickly find yourself having to provision additional machines, as you won’t be able to run large scale experiments on your development laptop. Managing those machines can be a pain.

**Managing training data**—depending on the project, training data can grow large. If you’re processing images, video, or large quantities of unstructured data, managing this data and copying it to the machines that run the experiments can become difficult.