I’ll implement the gradient descent algorithm for linear regression. First, we need to set up an objective equation.

Objective Equation
Since the algorithm seeks to minimize error, we need an error equation. I’ll use the sum of squared errors (SSE) as the objective equation.
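With \(N\) observations and a fitted line \(y = mx + \beta\), the objective can be written as follows (the \(1/N\) scaling is implied by the \(2/N\) factor in the derivatives further down):

$$SSE=\frac{1}{N}\sum_{i=1}^{N}\left(y_{i}-(mx_{i}+\beta)\right)^{2}$$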


Let’s suppose we have this data set:
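The original data set isn’t reproduced here, so a small hypothetical stand-in like the following works for illustration (the values are made up, roughly following \(y \approx 2x + 1\)):

```python
# Hypothetical data set (the original values are not shown):
# x is the predictor, y is the response.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 9.1, 10.8]
```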

Next, we supply the starting points for \(m\) and \(\beta\).

We can then write the first for loop to calculate the SSE with \(m = 0\) and \(\beta = 0\).

The loop iterates through every observation in the data, assigns each squared error to \(sse\), and accumulates the total in \(sum\_sse\). When it is done, it writes \(\beta\), \(m\), and \(sum\_sse\) to a storage dataset.
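A minimal sketch of that first pass, using the hypothetical data from above (repeated here so the snippet runs on its own; the \(1/N\) scaling matches the \(2/N\) factor in the derivatives below):

```python
# Hypothetical data set (the original values are not shown):
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 9.1, 10.8]

m, beta = 0.0, 0.0   # starting points
history = []         # storage for (beta, m, sum_sse) after each pass

sum_sse = 0.0
for xi, yi in zip(x, y):                # loop through all observations
    sse = (yi - (m * xi + beta)) ** 2   # squared error for this observation
    sum_sse += sse                      # accumulate into sum_sse
sum_sse /= len(x)                       # scale by 1/N, matching the derivatives

history.append((beta, m, sum_sse))      # store beta, m, and sum_sse
```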

Up until this point, it is the usual \(SSE\) calculation. Now comes the bread and butter of gradient descent: partial derivatives. A partial derivative is the derivative of a function (our equation, in this case) with respect to one of several variables, which is exactly what we need here.

$$\frac{\partial SSE}{\partial m}=\frac{2}{N}\sum_{i=1}^{N}-x_{i}\left(y_{i}-(mx_{i}+\beta)\right)$$

$$\frac{\partial SSE}{\partial \beta}=\frac{2}{N}\sum_{i=1}^{N}-\left(y_{i}-(mx_{i}+\beta)\right)$$

Paul’s Online Math Notes website gives a very detailed explanation of partial derivatives (link).

First, we calculate the derivatives and assign the values to the gradients. After that, we reassign new values to \(m\) and \(\beta\). Another important variable in the calculation is the learning rate \(\alpha\), which controls how far we move down the slope on each step.
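One update step might look like the following sketch, again using the hypothetical data and an assumed learning rate of 0.01:

```python
# Hypothetical data set (the original values are not shown):
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 9.1, 10.8]

m, beta = 0.0, 0.0   # starting points
alpha = 0.01         # learning rate: assumed value, controls the step size
N = len(x)

# Partial derivatives from the equations above
grad_m = (2.0 / N) * sum(-xi * (yi - (m * xi + beta)) for xi, yi in zip(x, y))
grad_beta = (2.0 / N) * sum(-(yi - (m * xi + beta)) for xi, yi in zip(x, y))

# Move down the slope: subtract alpha times each gradient
m = m - alpha * grad_m
beta = beta - alpha * grad_beta
```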

After we get the new values of \(m\) and \(\beta\), we recalculate the \(SSE\) and repeat the process as many times as we like.

So, let’s put all of that in a function.

Let’s give it a try.
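Putting the loop, the derivatives, and the update together, the whole routine and a quick run might be sketched like this (the function name, defaults, and data are assumptions, not the original code):

```python
def gradient_descent(x, y, alpha=0.01, iterations=1000):
    """Fit y = m*x + beta by gradient descent; returns (m, beta, history)."""
    m, beta = 0.0, 0.0            # starting points
    n = len(x)
    history = []                  # stores (beta, m, sse) each iteration
    for _ in range(iterations):
        # SSE for the current parameters (1/N scaling matches the derivatives)
        sse = sum((yi - (m * xi + beta)) ** 2 for xi, yi in zip(x, y)) / n
        history.append((beta, m, sse))
        # Partial derivatives of the objective
        grad_m = (2.0 / n) * sum(-xi * (yi - (m * xi + beta))
                                 for xi, yi in zip(x, y))
        grad_beta = (2.0 / n) * sum(-(yi - (m * xi + beta))
                                    for xi, yi in zip(x, y))
        # Move down the slope, scaled by the learning rate
        m -= alpha * grad_m
        beta -= alpha * grad_beta
    return m, beta, history

# Hypothetical data (the original data set is not shown):
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 9.1, 10.8]

m, beta, history = gradient_descent(x, y, alpha=0.01, iterations=1000)
print(f"m = {m:.3f}, beta = {beta:.3f}")
print(f"SSE went from {history[0][2]:.2f} to {history[-1][2]:.4f}")
```

With enough iterations the estimates should settle near the closed-form least-squares fit, and the stored SSE values should shrink with each pass.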