Hello everyone. In this video, you will learn about model validation and improvement techniques, specifically residual analysis. A residual, or error, of the model is the difference between the observed value y and the predicted value y hat. It represents the variation not explained by the model. There are two possible causes for a residual. First, the residual is purely random noise that is unpredictable and cannot be eliminated. Second, the residual is the result of errors in the model: either the model misses some important variables that may significantly affect the dependent variable y, or the model has the wrong form. For example, the model should be non-linear rather than linear. If the first cause holds, the model is valid. If the second cause holds, the model is not valid. To validate the model and find out which cause led to the residuals, we can conduct a residual analysis.

The key idea is that if we can explain all the patterns in the data, then the residuals should be purely random and thus satisfy the following four conditions. First, linearity, which means that the residuals are randomly scattered around zero with no systematic pattern. Second, independence. That is, the residuals are independent across observations. Third, normality. The residuals are normally distributed. And fourth, equal variance. That is, the residuals have constant variance across all observations. If all these conditions are met, then the residuals are purely random and the regression model is valid. Otherwise, there may be distinct patterns in the residuals, which means that the model either misses some variables or has the wrong form. Thus, residual analysis can provide important insight into how to improve the model.

To explain these conditions, let's look at some examples. For linearity, the figure on your left shows what purely random noise may look like.
The figure on your right shows a non-random pattern, which indicates that the hypothesized linear model is not valid and that we may need a non-linear model. For independence, the figure on your left shows residuals that are independent over time. The figure on your right shows residuals that alternate cyclically: a positive residual is always followed by a negative residual and vice versa, which means that the residuals are correlated over time. Regarding normality, we can draw a histogram of the residuals. If the histogram looks like the figure on your left, that is, bell-shaped, then the residuals are approximately normally distributed. Otherwise, the residuals are not normally distributed, like the example on your right. Please note that a moderate departure from normality is generally not problematic. Finally, regarding equal variance, the figure on your left shows constant variability of the residuals across different x values. However, the figure on your right shows that the variability of the residuals increases as x increases. Thus, the variance is not equal, and the model may not be valid.

Now, let's come back to our case. Here are the residuals of the trend model on the AK demand data. Can you comment on whether the residuals meet the conditions of linearity, independence, and equal variance? Clearly, the variance of the residuals is increasing over time. You can see this in the dots inside the red circle and in the general upward trend in the variability of the residuals, which implies that the equal variance condition is not met. In addition, the residuals may show some periodic patterns, which would need to be confirmed by further analysis. In summary, the residual analysis shows that the variability of the residuals is increasing, so the model may be missing some important variables. Potential missing variables include environmental variables such as price, home sales, and so on.
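The checks in this video are visual, but a few simple numbers can complement the plots. Below is a minimal Python sketch (standard library only), assuming a single predictor x (for example, the time index of a trend model) and an ordinary least squares linear fit. The helper names `fit_linear_trend`, `residuals`, `lag1_autocorrelation`, `durbin_watson`, and `spread_ratio` are hypothetical and chosen for illustration. The spread ratio is a simplified heuristic in the spirit of the Goldfeld-Quandt test, not the full test. The demo data are synthetic and are not the AK demand data.

```python
import random


def fit_linear_trend(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


def residuals(x, y, b0, b1):
    """Residual e_i = y_i - y_hat_i, the variation the model does not explain."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def lag1_autocorrelation(e):
    """Correlation between consecutive residuals (independence check).
    Near 0: no obvious lag-1 dependence. Strongly negative: the alternating
    +/- pattern. Strongly positive: runs of same-sign residuals."""
    n = len(e)
    m = sum(e) / n
    num = sum((e[i] - m) * (e[i - 1] - m) for i in range(1, n))
    den = sum((ei - m) ** 2 for ei in e)
    return num / den


def durbin_watson(e):
    """Durbin-Watson statistic: about 2 means no lag-1 autocorrelation,
    toward 0 means positive and toward 4 means negative autocorrelation."""
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    den = sum(ei ** 2 for ei in e)
    return num / den


def spread_ratio(x, e):
    """Heuristic equal-variance check (hypothetical helper): sample variance
    of residuals in the upper half of x divided by that in the lower half.
    A ratio far above 1 suggests the variance grows with x."""
    pairs = sorted(zip(x, e))
    half = len(pairs) // 2
    low = [ei for _, ei in pairs[:half]]
    high = [ei for _, ei in pairs[half:]]

    def sample_var(v):
        m = sum(v) / len(v)
        return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

    return sample_var(high) / sample_var(low)


if __name__ == "__main__":
    # Synthetic data (NOT the AK demand data): a linear trend whose noise
    # standard deviation grows with the time index t, i.e. unequal variance.
    random.seed(0)
    x = list(range(1, 61))
    y = [100 + 2 * t + random.gauss(0, 0.5 * t) for t in x]

    b0, b1 = fit_linear_trend(x, y)
    e = residuals(x, y, b0, b1)
    print(f"fitted trend: y_hat = {b0:.2f} + {b1:.2f} * t")
    print(f"lag-1 autocorrelation: {lag1_autocorrelation(e):.3f}")
    print(f"Durbin-Watson statistic: {durbin_watson(e):.3f}")
    print(f"spread ratio (upper/lower half of x): {spread_ratio(x, e):.2f}")
```

Because the synthetic noise widens as t grows, one would expect the spread ratio to come out well above 1, echoing the fan-shaped residual plot for unequal variance. The exact values depend on the random seed, so they are not listed here. These numbers are meant to support, not replace, the residual plots and the histogram.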