We've discussed a lot of the foundation of how to actually set up a model, whether we want to predict or do some inference. Either way, we need to start with some base-level model. Now we want to know: we've chosen a model, so how accurate is that model? Is it a good model?

Say I have data plotted here, with some X_i on the horizontal axis. The specifics don't matter; this is a general case. These are my data points: for each X_i, I have the response variable Y_i. We can see, generally and intuitively, that as X_i increases, Y increases, but at a slower rate; the slope won't be quite one. This is just a hypothetical toy example I made up.

But we want to know the quality of fit of my model. Here is my model, my F hat, which is my estimate of F. We want to quantify how close my predicted response is to the true response, which is the actual data point, the Y. So we take Y_i, the response at each data point, and subtract my estimated value at the point X_i. Take the first point: its response is Y_1, and my estimate evaluated at X_1 is where X_1 hits my estimated function. That value is F hat of X_1. How close is my F hat, my estimated function, to resembling the actual data?

We need a way to quantify that quality of fit. One way to do that is the mean squared error:

    MSE = (1/n) * sum over i of (Y_i - F hat of X_i)^2

The difference between the true value and my estimated value, Y_i - F hat of X_i, we'll call the error. That's the error part right there. For the Mean Squared Error, we square that error.
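The idea of comparing each observed Y_i to the model's prediction F hat of X_i can be sketched in Python. The data and the fitted line below are made up for illustration (a slope under one, matching the toy example), not taken from any real dataset:

```python
# Toy data: y grows with x, but at a slope less than one.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.6, 2.4, 2.9, 3.4]

# A hypothetical fitted model f_hat: pretend we estimated slope 0.6, intercept 0.5.
def f_hat(x_i):
    return 0.6 * x_i + 0.5

# The error at each point: the true response minus the model's prediction.
errors = [y_i - f_hat(x_i) for x_i, y_i in zip(x, y)]
print(errors)
```

Each entry of `errors` is one of the deviations between a data point and the estimated function; the next step is to square and average them.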
We don't want negatives and positives cancelling here, because a big negative error would cancel out a big positive error, and you'd be under the impression that your model is amazing, when in reality your model is very bad; you just have a lot of negatives cancelling out a lot of positives. We don't want that. It's very bad to have a large negative error, and we want that to show. It's very bad to have a large positive error, and we want that to show. We square the error to deal with this.

The squaring does two things. First, it keeps the negative and positive errors from cancelling out. That's nice. The second thing it does, which is also very nice, is that it amplifies errors that are very large. If your error is two, squaring gives us four: not too crazy of an increase. If your error is 10, squaring gives us 100: a massive increase from my original 10. Going from 2 to 4 is a small bump, because I started with a small error; going from 10 to 100 is a very large jump, because I started with a large error. It amplifies large errors, which we want, because we don't want to be far away from our true value at any given point.

So we have our squared errors. Then, if you sum up a bunch of things and divide by the number of things, you get the average, or the mean. That's a good way to decompose what a mean squared error actually represents: we take the mean of the squared errors. That's what's happening here. In general, it's capturing the deviations that occur between the actual data and my estimated function. These are the errors, and when we square them and take their average, we of course want the smallest mean squared error possible. If the mean squared error is small, that means the deviations between the real values and our estimated values were very small. That's what we want.
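Putting the pieces together, square each error and then average, a minimal MSE function might look like the sketch below. The name `mean_squared_error` is my own choice here, not a particular library's API:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared deviations between actual and predicted values."""
    n = len(y_true)
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n

# Squaring keeps a big negative error from cancelling a big positive one:
# errors of +2 and -2 both contribute 4, giving (4 + 4) / 2 = 4.0, not 0.
print(mean_squared_error([0, 0], [2, -2]))   # 4.0

# And it amplifies large errors: an error of 10 contributes 100, not 10.
print(mean_squared_error([0], [10]))         # 100.0
```

A perfect fit, where every prediction equals the true value, gives an MSE of exactly zero, which is why smaller is better.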
We want an F hat, an estimated function, that's very close to the data.