In this blog post, you and I are going to establish a useful framework for thinking about machine learning techniques.

Basically, this is going to be our basis for thinking about the gradient descent algorithm.

So, at the most basic level, what are we doing?

1. We feed a whole bunch of data into a computer.
2. It gives us back some solution, some answer.

In other words, our machine learns: it is actually working out the relationship in the data.

Now the question is, how can we feed a whole bunch of data into our Python program and have it spit out a function that describes the relationship in this data?

What are the steps involved in how our machine learns this mathematical function?

In a very simple linear regression example, the machine follows a three-step process to arrive at its solution.

Step one is to make a prediction. Predict what, exactly? The coefficients in our function, for example theta zero and theta one. Our machine is learning a function, so it has to start by predicting the coefficients of that function. The very first time this happens, the prediction is pretty much a completely random guess.
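Here is a minimal sketch of step one. The names `predict`, `theta0`, and `theta1`, the seed, and the range of the random guess are all assumptions made for illustration; the only thing taken from the text is that we start with a random guess for the two coefficients of a straight line.

```python
import random

def predict(theta0, theta1, x):
    """Straight-line model: prediction = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Fixed seed so the "random" first guess is repeatable (an arbitrary choice).
rng = random.Random(0)
theta0 = rng.uniform(-1.0, 1.0)  # intercept, guessed at random
theta1 = rng.uniform(-1.0, 1.0)  # slope, guessed at random

# A few made-up inputs, just to see the first predictions.
xs = [1.0, 2.0, 3.0]
predictions = [predict(theta0, theta1, x) for x in xs]
print(predictions)
```

At this point the predictions are almost certainly wrong, and that is fine. The guess only has to give us something to measure.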

Step two is calculating the error. In other words, after making the prediction we need to measure how good it was. We calculate how far off we were from the data, and that gives us the size of our error.
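One common way to put a number on "how far off we were" is the mean squared error. The text does not say which error measure to use, so treat this as one reasonable assumption; the function name `mean_squared_error` and the tiny data set are made up for the example.

```python
def mean_squared_error(theta0, theta1, xs, ys):
    """Average of the squared gaps between predictions and the data."""
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = theta0 + theta1 * x
        total += (prediction - y) ** 2
    return total / len(xs)

# Made-up data that follows y = 1 + 2x exactly.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

error_bad = mean_squared_error(0.0, 0.0, xs, ys)   # a poor guess
error_good = mean_squared_error(1.0, 2.0, xs, ys)  # the true coefficients
print(error_bad, error_good)
```

The poor guess gets a large error and the true coefficients get an error of zero, which is exactly the signal the machine needs in order to improve.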

Step three is where we adjust our initial prediction, and this is the crucial part.

In simple words: first we made a prediction, second we compared that prediction to the data, and now it’s time to learn from our mistakes.

Having figured out how far off we were in the previous step, we can now make a change to the coefficients.
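How exactly should the coefficients change? The text leaves that open, so the sketch below assumes one particular rule: nudge each coefficient a small step in the direction that reduces the mean squared error (a gradient step). The function name `adjust` and the `learning_rate` value are hypothetical choices for this example.

```python
def adjust(theta0, theta1, xs, ys, learning_rate=0.05):
    """Return coefficients nudged once toward a smaller mean squared error."""
    n = len(xs)
    # How far each prediction is from its data point.
    residuals = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
    # Slope of the mean squared error with respect to each coefficient.
    grad0 = (2.0 / n) * sum(residuals)
    grad1 = (2.0 / n) * sum(r * x for r, x in zip(residuals, xs))
    # Step against the slope, scaled by the learning rate.
    return theta0 - learning_rate * grad0, theta1 - learning_rate * grad1

# Made-up data that follows y = 1 + 2x exactly.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

new_theta0, new_theta1 = adjust(0.0, 0.0, xs, ys)
print(new_theta0, new_theta1)
```

Starting from zero for both coefficients, a single adjustment already moves them toward the true values of 1 and 2. The learning rate controls how big each nudge is.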

This was only the first run-through. At this point, we’re going to go back to step one and make a new prediction.

This new prediction is going to have our modified coefficients.

Using this new prediction, we once again calculate the error to see how badly we did.

Hopefully, this time round the error is smaller than the first time around.

So, having measured the error and how badly we did, we adjust our prediction once again and then rinse and repeat.
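Putting the three steps into a loop might look like the sketch below. It rests on several assumptions that are not in the text: the error is the mean squared error, the adjustment is a small gradient step, and the names `fit`, `learning_rate`, `steps`, and `seed` (along with their values) are hypothetical.

```python
import random

def fit(xs, ys, learning_rate=0.05, steps=2000, seed=0):
    """Repeat predict -> measure error -> adjust, and return the result."""
    rng = random.Random(seed)
    # The very first prediction: a random guess for both coefficients.
    theta0 = rng.uniform(-1.0, 1.0)
    theta1 = rng.uniform(-1.0, 1.0)
    n = len(xs)
    errors = []
    for _ in range(steps):
        # Step 1: predict with the current coefficients.
        predictions = [theta0 + theta1 * x for x in xs]
        # Step 2: measure the error (mean squared error, by assumption).
        residuals = [p - y for p, y in zip(predictions, ys)]
        errors.append(sum(r * r for r in residuals) / n)
        # Step 3: adjust the coefficients to shrink the error.
        grad0 = (2.0 / n) * sum(residuals)
        grad1 = (2.0 / n) * sum(r * x for r, x in zip(residuals, xs))
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1, errors

# Made-up data that follows y = 1 + 2x exactly.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

theta0, theta1, errors = fit(xs, ys)
print(theta0, theta1, errors[0], errors[-1])
```

On this tiny data set the recorded error shrinks from the first pass to the last, and the coefficients settle close to 1 and 2. With a learning rate that is too large the error can grow instead, so that value needs some care.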

Summary

So, in summary, there are three steps. Number one is to predict or infer the theta values of the function.

Number two is to calculate the error and measure how far off we were in our prediction from the data.

And number three is to make an adjustment so that the error is smaller the next time around, slowly learning the best coefficients. This is the learning process.

Now there is actually a name for this kind of step-by-step approach that we just described. This is called an algorithm. An algorithm is a set of instructions for solving a problem.