In this video, we'll draw on our knowledge of basic statistical inference to describe the importance of statistical inference in the context of linear regression. To do that, let's consider again the marketing data from the previous module. If you remember, this data had four variables and 200 units on which those variables were measured. The variables were a YouTube marketing budget, a Facebook marketing budget, and a newspaper marketing budget, all measured in thousands of dollars. Then there was a sales variable, measured in thousands of units sold. In the last module, we fit a few different linear regressions to this data. In this lesson, let's take a look at the multiple linear regression that we fit. That was the regression where sales was the response and the Facebook, YouTube, and newspaper marketing budgets were the predictors. For this video, let's focus our attention on the estimated coefficient for newspaper. Now, you'll notice that the estimate implies a small negative impact of newspaper on sales. The formal interpretation of this estimate would be that, assuming the model is correct, increasing the newspaper advertising budget by $1,000 is associated with roughly one fewer unit sold on average, adjusting for the Facebook and YouTube advertising budgets. Remember, this value that comes out of the regression, the negative 0.001, is an estimate of a true regression parameter. Estimates never exactly pinpoint the true value, and so we should think about a few different questions related to the estimate and the true parameter. The first question we might ask is: does the negative association between sales and newspaper, reflected in that negative estimate, correspond to a real trend in the population? Is the newspaper advertising budget really negatively impacting sales at the population level? That's one important question. 
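The marketing dataset itself isn't reproduced here, so as a rough stand-in, the sketch below simulates data of the same shape (200 rows; youtube, facebook, and newspaper budgets plus sales) and fits the multiple regression by ordinary least squares. The variable names and the "true" coefficients are assumptions chosen for illustration, not values from the actual dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated predictor budgets (thousands of dollars) -- stand-ins for the real data
youtube = rng.uniform(0, 300, n)
facebook = rng.uniform(0, 60, n)
newspaper = rng.uniform(0, 100, n)

# Simulated sales with a small negative newspaper effect, mimicking the estimate
# discussed in the lecture; these "true" coefficients are made up for illustration
sales = 8.0 + 0.045 * youtube + 0.19 * facebook - 0.001 * newspaper + rng.normal(0, 2, n)

# Design matrix with an intercept column, then ordinary least squares
X = np.column_stack([np.ones(n), youtube, facebook, newspaper])
beta_hat, *_ = np.linalg.lstsq(X, sales, rcond=None)

print(beta_hat)  # [intercept, youtube, facebook, newspaper] estimates
```

Because the data are simulated, the fitted coefficients land close to the values used to generate them; with real data, we only ever see the estimates, never the truth.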
Another way to ask that question is: is there really a signal in the data? Or is the estimation procedure just picking up on noise and giving you a non-zero value based on measurement error or some other random fluctuations in the data? That first set of questions, you could say, is related to something called statistical significance. Statistical significance is about whether there's evidence to suggest that the parameter, associated with newspaper in this case, is statistically different from zero. Can we find that signal in the noise? Do we think the signal is really there, or is everything just noise? There's a second set of questions that we could ask here. They might sound something like: even if the negative association between, say, sales and newspaper, adjusting for the other predictors, reflects a real trend in the population, is that trend actually important? What we have in mind here is the fact that while increasing the newspaper budget by $1,000 seems to impact sales in a negative way, it does so only by about one unit. We'd have to ask ourselves, as marketing folks or researchers in this area of business, is that an important difference? Now, from a business perspective, it seems like a bad idea to spend any money on advertising where you would get no return, or a very slight negative return. So there are good reasons to believe that, based on this model, we should not invest any money in newspaper advertising. We said the first set of questions is related to statistical significance. This second set of questions, we might say, is related to practical significance, or relevance, or importance. Really, the goal of this unit is to focus in on statistical significance. Statistical significance can be important. 
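To make the "signal versus noise" question concrete, statistical significance is usually quantified by dividing each estimate by its standard error to get a t statistic. As a sketch, assuming the usual formula for the standard errors (the square root of the diagonal of sigma-hat squared times the inverse of X-transpose X), the simulation below builds data in which the newspaper coefficient is exactly zero, so its t statistic is pure noise while the others are clearly non-zero. All names and numbers here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Simulated stand-in data: sales depends on the youtube and facebook budgets,
# but the newspaper coefficient is exactly zero -- no real signal there
youtube = rng.uniform(0, 300, n)
facebook = rng.uniform(0, 60, n)
newspaper = rng.uniform(0, 100, n)
sales = 8.0 + 0.045 * youtube + 0.19 * facebook + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), youtube, facebook, newspaper])
beta_hat, *_ = np.linalg.lstsq(X, sales, rcond=None)

# Estimate sigma^2 from the residuals (divide by n - p for unbiasedness)
resid = sales - X @ beta_hat
p = X.shape[1]
sigma2_hat = resid @ resid / (n - p)

# Standard errors from the diagonal of sigma^2_hat * (X'X)^{-1}
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))

# t statistics: estimate divided by its standard error; a |t| well above ~2
# is the usual informal signal that a coefficient differs from zero
t_stats = beta_hat / se
print(t_stats)
```

Running this, the youtube and facebook t statistics are large in magnitude, while the newspaper t statistic hovers near zero, even though its raw estimate is typically non-zero: that non-zero value is exactly the kind of noise-driven estimate the lecture warns about.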
If we don't have evidence that a regression parameter is different from zero, we might be warranted in treating that parameter as zero, which is really equivalent to taking the predictor out of the model; the predictor won't have any impact on the response in the model. But it's also important to think about practical significance. Issues related to practical significance can be a bit tricky, and we'll try to pick up that aspect of statistical modeling later on in this course or specialization. Now, in order to answer questions related to statistical significance, we'll need to understand the estimation procedure, namely least squares, at a deeper level than we've looked at so far. To do that, let's recall the fact that the least squares estimates are computed from sample data. As such, the values that you get from the least squares procedure are sensitive to the exact sample that you have, the data in front of you. What that means is that if you had a different sample, a different set of values, then you would get a different least squares estimate, even if you were taking another random sample from the same population. Let's be a little bit more precise about this idea. Suppose that you were actually able to fix each predictor at the values that are in the sample. You would draw a sample of size n, you would fix the predictors, and then you would draw another sample of size n at those same predictor values, and another sample of size n at the same predictor values, and so on. For example, in the marketing data, if our population consists of all companies that sell product P, then to repeatedly sample from the population we could draw a sample of size n and fix the set of predictor values. In this dataset, company one had its Facebook measurement fixed at something, its YouTube measurement fixed at something, et cetera. Keep those values fixed, but then suppose that we could remeasure the sales for that company. Same thing for company 2, company 3, et cetera. 
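This thought experiment, fixing the predictors and repeatedly redrawing the responses, is easy to act out in a simulation. The sketch below (with made-up predictor names and coefficients standing in for the real data) fixes one design matrix, redraws the noise a thousand times, and refits least squares each time:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Fix the predictors once -- these stay the same across every "resample"
youtube = rng.uniform(0, 300, n)
newspaper = rng.uniform(0, 100, n)
X = np.column_stack([np.ones(n), youtube, newspaper])
true_beta = np.array([8.0, 0.045, -0.001])  # made-up truth for illustration

def redraw_and_fit():
    # New noise => new responses at the SAME predictor values => new estimate
    sales = X @ true_beta + rng.normal(0, 2, n)
    beta_hat, *_ = np.linalg.lstsq(X, sales, rcond=None)
    return beta_hat

fits = np.array([redraw_and_fit() for _ in range(1000)])
print(fits.mean(axis=0))  # close to true_beta: least squares is unbiased
print(fits.std(axis=0))   # sample-to-sample variability of each estimate
```

Every refit returns a slightly different estimate even though the predictors never change; the spread of those estimates is exactly the sample-to-sample variability discussed next.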
Same predictors, but imagine we are resampling, so we get a different response. At a fixed value of the predictors, if we were to randomly resample from the population, we would obtain a new response value, and thus when we perform least squares, we would obtain a different estimate. Now, the natural question that arises is: how different will the estimate be? One thing to note is that if the structural part of the model is correct, that is, the linearity piece of the regression model, then the least squares estimator is unbiased, which means that on average it gets the right value, where that average is taken across all of the resamples of size n with fixed predictors. But the fact that something happens on average doesn't mean that it will ever actually happen, and it doesn't mean that there won't be a lot of variability in the way it actually does happen. We also have to understand how much variability there would be from sample to sample in order to get a good sense of how good an estimate of the true parameter values least squares actually gives us. It turns out that the amount of variability in the least squares procedure will depend on the fixed predictor values, and it will also depend on the variance term sigma squared. With a high sigma squared, the least squares estimate will be quite different from sample to sample. With a low sigma squared, the least squares estimate will be pretty similar across different samples. Now, that's for a fixed sample size. If you increase the sample size, you would have a more precise estimator, which means that your estimate would vary less from sample to sample. This idea of resampling from the population gives us something called a sampling distribution, and we'll actually formalize the notion of the sampling distribution in the next video.
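The two claims above, more spread with a larger sigma squared and less spread with a larger n, can be checked directly by simulation. This is a sketch with arbitrary made-up coefficients, using a simple one-predictor regression to keep the comparison clean:

```python
import numpy as np

rng = np.random.default_rng(3)

def slope_spread(n, sigma, reps=500):
    """Std. dev. of the least squares slope across repeated samples of size n."""
    x = rng.uniform(0, 100, n)             # predictors held fixed across resamples
    X = np.column_stack([np.ones(n), x])
    slopes = []
    for _ in range(reps):
        y = 2.0 + 0.5 * x + rng.normal(0, sigma, n)  # redraw the noise only
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        slopes.append(beta_hat[1])
    return np.std(slopes)

# Higher sigma => more sample-to-sample variability in the estimate
print(slope_spread(n=50, sigma=1.0), slope_spread(n=50, sigma=5.0))
# Larger n => a more precise (less variable) estimate at the same sigma
print(slope_spread(n=50, sigma=1.0), slope_spread(n=500, sigma=1.0))
```

The histogram of those repeated slope estimates is precisely the sampling distribution that the next video formalizes.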