Hello everyone. In this video, you will learn how to model and format seasonality in demand forecasting. We speculate that both holidays and wedding seasons may stimulate the demand in cookware. For example, during the holiday season, that is, November to December, people may buy cookware for themselves or as gifts. In wedding, people often buy cookware as a gift for the new couples. The wedding census of the US clearly indicates that the peak wedding seasons are in the summer months. Seasonality such as the holiday season and the wedding season can only choose a limited number of options or categories such as the 12 months. Thus, we should model the seasonality as categorical variables. In contrast, the continuous variables such as price and home sales can be any fraction. For seasonality, we use binary or dummy variables to indicate the month of the year. For example, we use the binary variable Feb, to indicate the month of February, where Feb equals to one if the month is February, zero otherwise. In the figure on your right, you can see how we format the Feb variable for different month-year of the data. For the February of any year, the Feb variable equals to one. For all other months, the Feb variable equals to zero. The same format applies to the binary variables of all the other months. Because we have 12 months, now the question is should we create a binary variable for each month of the year? The answer is no because of the famous N-1 rule, which says that we cannot create a binary variable for each category and we must leave some categories out as the default. For example, let's consider the gender variable, which equals one for men, zero for women. In this case, women is the default. We should not create two binary variables, one for man and another for women, which is completely unnecessary and actually creates a multi-collinearity issue. Looking at the data, we select the month January and September as default because they're neither the peak month nor the valley month for the demand. Together with the time and price variables, now we have totally 12 independent variables. Because Excel Data Analysis Add-In allows at most 16 independent variables, so we're fine. Now, let's build a multiple regression model based on all these 12 independent variables by first clicking on Data, then select Data Analysis, and then select Regression to bring up the Regression dialog box. Now, in this box, let's first select the sold units to be the input y-range. Then, let's select all the independent variables including their labels on the first row as the input x-range. Then we check the box of labels, specifying the output range, and then in the residuals box, check residuals, residual plots, and line fit plots and then click "OK". Let's now look at the outputs. In the Summary Output, we found that the R-square has increased to 94.9 percent. The coefficients table to your right shows that the time, sale price, and November variables are very significant because the p-values are smaller than 0.05. The October variable is also significant at the 0.1 significance level. All other variables are not significant. Now, let me explain the meanings of the coefficients. The coefficient of time means that the sold units increase by 15.58 units on average for every month. The coefficient of the sale price means that on average, the monthly sold units increase by 4.89 units per dollar of price drop. The coefficient of November means that, November increases the sold units by an average of 324 units relative to the default months, which are January and September. Similarly, October increases the sold units by 259 units relative to the default months. Based on the coefficients table, we can determine the regression equation. Please see the equation in a table format to your left, where the sold units equals to the intercept which is 4,855.14 plus the multiplications between the coefficients and the corresponding independent variables. The line fit plot with respect to time is shown on your right, where you can see that the model fits the data very well. The fit of the model can also be verified by the scatter plot between the predicted sold units and the actual sold units. Here are the residuals of the model. Overall, the model seems valid because the residuals meet the conditions of linearity, independence and equal variance, except that there may exist a slight curve in the residuals. In summary, our analysis shows that time, price and November have a significant impact on the demand, but the wedding season may not. Together, these variables can explain nearly 95 percent of the variation in the data. Our residual analysis shows that there might exist a slight n-shape curve in the residuals, but other than this, the model is largely valid. Now, the question is how to make a forecast for 2013 and test its accuracy?