In this video, we'll define prediction and explanation, and we'll try to differentiate between the two. As is often the case in applied data science work, prediction and explanation get conflated, so we'll work with these concepts separately and contrast them. A data scientist will sometimes fail to think through and articulate the goal of the modeling process beforehand, and then make concrete modeling choices anyway. For example, they'll build a model that's better for prediction when the goal was really to provide an explanation, and that mismatch will affect how the model performs for the data science project. To avoid this mistake, we should try to answer the questions: what really is an explanation, what is a prediction, and how might these two concepts differ?

Philosophers have thought a lot about how to define an explanation and a prediction and how they might differ. Many of them have settled on the idea that an explanation has two parts. The first part is called the explanandum, a Latin term referring to the set of propositions describing the thing being explained. For example, if sales of product P have sharply declined, analysts working for a company selling product P might be looking for an explanation. The explanandum is the proposition "sales of product P have sharply declined." The other part of an explanation is sometimes called the explanans, another Latin word. The explanans is the set of propositions that does the explaining. An explanation for a sharp decline in sales of product P would include a set of propositions describing the causes, conditions, or circumstances that brought about that decline. Explanatory modeling constructs an explanans by using statistical models to test and provide causal explanations. For example, we may have certain hypotheses about the decline in sales of product P, and those might be related to advertising budgets.
If we have data on different advertising budgets for product P, then we can use that data to fit different models, as we've done already in the course, and we can test different causal explanations. For example, we might test the explanation that a decline in the Facebook budget caused the decline in sales.

More broadly, explanation arises in a few different scenarios. The first is a data analysis problem where the parameters in the regression model have physical meaning. One example is Hooke's law, a basic law about the physics of springs that we mentioned earlier in the course. In such cases, an explanation can be relatively clear: in Hooke's law, the spring constant k characterizes the stiffness of the spring and directly corresponds to physical properties of the spring. A second scenario is one where it's relatively easy to conduct an experiment to assign a causal explanation to a given parameter in a model. Experiments need to be designed very carefully; if there are human subjects involved, we should think about the ethical implications of the experiment and get the right approvals. We'll talk more about experimental design in the second course in the specialization, on ANOVA and experimental design. A third scenario in which you might do explanatory modeling is an observational study: not a study where you conduct an experiment, but one where you just observe data. Observational studies require special care and certain assumptions in order to support causal claims. It is possible to take a model for observational data, add a set of assumptions, and then give that model a causal interpretation, but that should be done with care and with some training in causal modeling. Let's move on to prediction.
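To make the advertising example concrete, here is a minimal sketch of fitting a simple linear model of sales on a Facebook budget. All the numbers below are invented for illustration; a real analysis would use actual advertising data, and a positive slope alone would not establish the causal claim.

```python
# Hypothetical sketch: fitting sales on a Facebook advertising budget.
# The data below are invented; a real analysis would use observed data.

def fit_simple_ols(x, y):
    """Closed-form least-squares fit: returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up monthly Facebook budgets (in $1000s) and product-P sales (units).
facebook_budget = [10, 15, 20, 25, 30, 35]
sales = [120, 140, 155, 175, 190, 210]

b0, b1 = fit_simple_ols(facebook_budget, sales)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
# A positive slope is consistent with (but does not by itself prove) the
# explanation that cutting the Facebook budget caused sales to fall.
```

Testing whether the slope reflects a genuine causal effect, rather than mere association, is exactly where the care about experimental versus observational data comes in.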
A prediction is really an empirical consequence of a theory, one that is yet to be verified at the time the theory is constructed. It concerns empirical data, data that you could collect after you fit your model, that is, after you construct your theory. A predictive model is a statistical model used to provide a value, or as we'll see, a range of values, of the response based on values of the predictors that were not used in training or fitting the model. One example: suppose we want to predict the price that Amazon will charge for the book Lila by Marilynne Robinson, one of my favorites. Suppose Lila is not in our original Amazon dataset, the one we've studied so far, and imagine we want to predict the Amazon price from the book's original list price, maybe $17. We could also use the number of pages as a predictor; Lila contains 271 pages. When we're constructing predictive models, we care less, or maybe not at all, about causal relationships between, say, the number of pages, the list price, and the Amazon price. We really only care about whether the model makes accurate predictions: does it do a good job at predicting future data, or data that we haven't seen and used to train the model?

Now, these two concepts seem similar, so we should come up with some examples of how they differ. Here's a silly example. Consider a tree that's 15 meters tall, and imagine it casts a shadow on flat ground of X meters in length. Why is the shadow X meters long, and in addition, what is X? How long is the shadow? Well, suppose the sun sits at an angle of 45 degrees from the ground.
Then a general law of trigonometry says the tangent of 45 degrees is equal to one, and the tangent of 45 degrees equals the height of the tree divided by the length of the shadow. From that relationship, we can solve for X, the length of the shadow, and it turns out to be 15 meters. That calculation also explains why the shadow is 15 meters long: the general law that the tangent of 45 degrees equals one, together with the facts that the tree is 15 meters tall and the angle of incidence is 45 degrees, seems to be the reason the shadow has that length. There seems to be a causal relationship there. Now, contrast that with a slightly different case. Suppose we know the length of the shadow is 15 meters, but we don't know the height of the tree; call it Y. We can show that the tree is 15 meters tall, with the same 45-degree angle of incidence, by using the same law of trigonometry: the tangent of 45 degrees equals Y over 15, and since the tangent of 45 is one, we can solve and get Y equal to 15. Now, this calculation can reasonably be construed as a prediction. We are predicting that the tree is 15 meters in height, and we could be wrong, due to measurement error. But it would seem weird to say that this is an explanation for why the tree is 15 meters tall, because there is no causal relationship between the length of the shadow and the height of the tree. The reason the tree is 15 meters tall has to do with biology and environmental factors. It doesn't have to do with the length of the shadow or the angle of incidence of the rays of light. There's an asymmetric relationship between explanation and prediction. Now consider another example, maybe a bit closer to modeling, and it's that of Hooke's law.
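Before moving on, the shadow arithmetic above can be checked in a few lines. This is just the trigonometry from the example, run in both directions: solving for the shadow given the height (the explanatory direction), and recovering the height given the shadow (the predictive direction).

```python
import math

# The shadow example as arithmetic. With the sun at 45 degrees, the tangent
# of the angle relates the tree's height to the shadow's length.
angle = math.radians(45)
tree_height = 15.0                       # meters

# tan(45 deg) = height / shadow  =>  shadow = height / tan(45 deg)
shadow = tree_height / math.tan(angle)
print(round(shadow, 6))                  # 15.0, since tan(45 deg) = 1

# Running the same law "in reverse": given the shadow, recover the height.
predicted_height = shadow * math.tan(angle)
print(round(predicted_height, 6))        # 15.0 -- a prediction, not an explanation
```

The arithmetic is identical in both directions; only the causal reading differs, which is exactly the asymmetry the example illustrates.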
Remember that Hooke's law says that F, the force, is equal to k, the spring constant, times x, the displacement. We have a spring, we exert a certain amount of force, and based on the spring constant, we see a certain amount of displacement. Now suppose we exert a force of F = 9 dyne on a given spring, and it moves three centimeters. Why does this happen? What's the explanation? A reasonable explanation is that the spring has certain known properties, summarized by the spring constant k, which in this case equals three, plugging into the formula. Those properties of the spring, summarized through the spring constant, cause the force-displacement relationship. Again, there's a causal relationship here, and that's why we have an explanation. But now imagine a slightly different scenario, where we don't know the spring constant for the given spring. We could predict it using Hooke's law: rearranging the law gives a prediction of k = 3. But the fact that we exerted a force of F = 9 dyne on the spring, and that it moved three centimeters, doesn't really explain the spring constant k. The spring constant is explained by the physical properties of the spring, and as a result of those physical properties, you get the force-displacement relationship. Again, there's an asymmetric relationship between explanation and prediction. In this module, we'll explore that relationship a bit. We'll see that models used for explanation are a bit different from models used for prediction.
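The Hooke's-law arithmetic above can also be written out directly. The follow-up force of 12 dyne below is an invented illustration of using the predicted constant for a further prediction.

```python
# Hooke's law arithmetic from the example: F = k * x.
F = 9.0   # force, in dyne
x = 3.0   # displacement, in centimeters

# "Predicting" the spring constant by rearranging the law:
k = F / x
print(k)          # 3.0 dyne per centimeter

# With k in hand, the law predicts the displacement under a new force,
# say F = 12 dyne (a hypothetical follow-up measurement):
print(12.0 / k)   # 4.0 centimeters
```

As with the shadow, the algebra runs in either direction; it's the physical properties of the spring, not the rearranged equation, that do the explaining.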
For example, we'll see the difference between a confidence interval for the average value of the response given some predictors, and a prediction interval for a particular value of the response given some predictors. We'll explore this relationship so that you can get a better sense of the purpose of regression modeling.
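As a preview of that difference, here is a minimal sketch, with invented data, of the standard errors behind the two intervals in simple linear regression. The prediction interval's standard error carries an extra term for the variability of an individual observation, so it is always the wider of the two.

```python
import math

# Invented data for illustration; a real analysis would use observed data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((a - x_bar) ** 2 for a in x)
slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar

# Residual standard error, with n - 2 degrees of freedom.
resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
s = math.sqrt(sum(r * r for r in resid) / (n - 2))

x0 = 3.5  # point at which we form both intervals
se_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)      # for E[y | x0]
se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)  # for a new y at x0

print(f"SE for mean response:   {se_mean:.3f}")
print(f"SE for new observation: {se_pred:.3f}")
# The "1 +" term inside the second square root accounts for the scatter of
# an individual observation around the regression line, so se_pred > se_mean.
```

Multiplying each standard error by the appropriate t quantile gives the confidence interval and the prediction interval, which we'll develop properly in this module.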