### WIKI

Gradient descent is an iterative optimization algorithm used to find a local minimum of a differentiable function, usually with the goal of minimizing a model's prediction error. It is often used when optimal values can't be calculated directly, but must be discovered through trial and error.
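The idea can be sketched in a few lines: start from a point and repeatedly step opposite the derivative until the steps stop mattering. The function f(x) = (x - 3)^2, its hand-coded derivative, and the starting point and learning rate below are illustrative choices, not part of the original text.

```python
# Minimal gradient descent sketch on f(x) = (x - 3)^2,
# whose derivative is f'(x) = 2 * (x - 3).

def gradient_descent(start, learning_rate=0.1, iterations=100):
    x = start
    for _ in range(iterations):
        gradient = 2 * (x - 3)         # derivative of (x - 3)^2 at x
        x -= learning_rate * gradient  # step against the gradient
    return x

minimum = gradient_descent(start=10.0)  # converges toward x = 3
```

Each step moves x a fraction (the learning rate) of the way down the slope, so progress slows naturally as the derivative approaches 0.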

### Important terms related to gradient descent

Coefficient - A function’s parameter values; through iterations, they are re-evaluated until the cost value is as close to 0 as possible (or good enough).

Cost - The function being evaluated; gradient descent is used to find its minimum.

Delta - The derivative of the cost function.

Random initialization - The process of choosing initial (typically random) parameter values from which gradient descent starts.

Local minimum - A point where the derivative of the function is as close to 0 as is acceptable.

True local minimum - A point where the derivative of the function is exactly 0.

Learning rate - A hyperparameter that controls the size of each update step, and therefore how quickly a model adapts to a given problem.

Iterations (batch) - The number of times a gradient descent algorithm’s parameters are updated.
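The terms above can be tied together in a toy sketch: one coefficient is fitted to data by repeatedly stepping against the delta (the derivative of the cost). The data, learning rate, and iteration count below are illustrative assumptions.

```python
# Toy data: (x, y) pairs where the true coefficient is 2 (y = 2x).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def cost(w):
    # Cost: mean squared error of predictions w * x against targets y.
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def delta(w):
    # Delta: derivative of the cost with respect to the coefficient w.
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

w = 0.0                 # initialization (fixed here instead of random, for clarity)
learning_rate = 0.05    # learning rate hyperparameter
for _ in range(200):    # iterations
    w -= learning_rate * delta(w)
# w is now close to the true coefficient 2.0, and cost(w) is close to 0
```

Each named piece of the loop corresponds to one glossary entry: the coefficient `w`, the `cost` and its `delta`, the learning rate, and the iteration count.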

## Why is gradient descent important?

Gradient descent estimates the error gradient within machine learning models. This helps minimize their cost function and keeps computation time low, so models and predictions can be delivered quickly. While there are other optimization algorithms with better convergence guarantees, few are as computationally efficient as gradient descent.

This efficiency enables gradient descent algorithms to train neural networks on large datasets with reasonable turnaround. Gradient descent is a simple, effective tool that proves useful for straightforward, quantitative neural network training.

## Gradient Descent vs. Other Technologies & Methodologies

Compared to (batch) gradient descent, stochastic gradient descent is much faster and better suited to large datasets. The gradient is not calculated for the entire dataset, but only for one randomly chosen point in each iteration, so the variance of the updates is higher.
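A sketch of that difference, assuming the same kind of one-coefficient fit: each update uses the gradient at a single randomly chosen data point rather than a sum over the whole dataset. The data and hyperparameters are illustrative.

```python
import random

random.seed(0)
data = [(float(x), 2.0 * x) for x in range(1, 11)]  # y = 2x

w = 0.0
learning_rate = 0.005
for _ in range(2000):
    x, y = random.choice(data)       # one random point per iteration
    gradient = 2 * (w * x - y) * x   # gradient of (w*x - y)^2 at that one point
    w -= learning_rate * gradient
# w ends up near the true coefficient 2.0, but each individual update
# is noisier than a full-batch step
```

Because each step sees only one point, per-iteration cost is constant regardless of dataset size, which is what makes the method attractive at scale.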

### Gradient descent vs. Newton's method

Like gradient descent, Newton's method is well suited to finding local minima. Where Newton's method differs from gradient descent, however, is in its approach: it finds the root of a function rather than its minimum or maximum directly. To minimize a function, Newton's method is therefore applied to the function's derivative, since a local minimum is a root of the derivative.

### Gradient descent vs. backpropagation

Gradient descent is an optimization algorithm for minimizing the error of a predictive model on a training dataset. Backpropagation is an automatic differentiation algorithm for calculating the gradients of the weights in a neural network's graph structure. These two algorithms are used in tandem to effectively train neural network models.
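The tandem can be sketched with a single-neuron network: backpropagation computes the gradient of the loss with respect to the weight via the chain rule, and gradient descent uses that gradient to update the weight. The network, data point, and hyperparameters are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, target = 1.5, 1.0   # one training example
w = 0.0                # initial weight
learning_rate = 1.0

for _ in range(100):
    # Forward pass
    z = w * x
    y = sigmoid(z)
    loss = (y - target) ** 2
    # Backward pass (backpropagation, i.e. the chain rule):
    # dloss/dw = dloss/dy * dy/dz * dz/dw
    grad = 2 * (y - target) * y * (1 - y) * x
    # Gradient descent update using that gradient
    w -= learning_rate * grad
# loss shrinks toward 0 as the weight grows and the output approaches the target
```

Backpropagation supplies the gradient; gradient descent decides how to move the weights given that gradient. Neither is useful for training without the other.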