[MUSIC] Let's extend our understanding of these statistical methods to develop a forecast of website visits based on social media mentions. A scatter plot simply plots each observation of our two variables, social media mentions and website visits, on a chart, with social media mentions, our independent variable, on the x-axis, and website visits, our dependent variable, on the y-axis. It's a simple chart, and forms the basis of our simple linear regression-based forecast. Let's create the plot now by selecting both columns of the dataset. Next, we'll click the Insert tab on the ribbon and click the scatter icon. From the list of scatter plot options, let's select the first plot. Excellent, let's cut the chart and paste it back in our problem working sheet. Let's also resize it slightly. Now let's add a few elements to our chart which will assist in our forecast. First, let's add a linear trendline. This linear trendline is the result of linear regression on the dataset. We will do this by clicking on the chart, and then clicking the Design tab on the ribbon. Next, we'll click Add Chart Element and select Trendline from the dropdown menu. Lastly, we'll select linear to add a linear trendline or regression plot to the chart. We see now that Excel has drawn a trendline across the scatter plot of our data. This is the best fit line, or the line which minimizes the distance from it to each of the data points, on average. Let's pause for a moment and observe the trend. Consistent with what we've calculated in problem number 1A, we see that, in general, increases in social media mentions result in increase web sites visits. We know this because the trendline is sloping upward to the right. If we knew the equation of this trendline, we could further describe the average relationship between social media mentions and website visits. Excel makes this simple for us to calculate and display on the chart. Let's right-click on the trendline and select format trendline. In the format pane, click the checkboxes next to Display Equation on chart, and also Display R-squared value on chart. You'll notice that this matches the R-squared value that we previously calculated. Let's explore the equation of the trendline to better understand the relationship between social media mentions and website visits. Notice that the y-value is our dependent variable, website visits, and that the x-value is our independent variable, social media mentions. We can read this equation then as for every 1x, or social media mention, we observe 23 website visits, plus some additional constant value, which is not directly explained by increases in social media mentions. This constant value may be the number of website visits we receive due to other marketing efforts. With approximately 23 visits for every social media mention, it would seem prudent for us to focus some of our marketing efforts on social media. Does this all seem a bit familiar? It should. You'll recall from algebra that the 23 in our equation is the slope of the trendline. If we want to calculate the slope of the trendline without creating a chart, Excel offers its slope function. Slope requires two parameters, in this case, the value of our two variables in our dataset. It's important to note that slope requires the known y's, in our case, website visits, as its first parameter. Let's parameterize slope with website visits and social media mentions, and set it to work. We see that we've calculated the same slope as the trendline equation above. Isn't that great? Remember the constant value in the equation above? This is the intercept of the trendline. Said differently, this can be thought of as the website visits we would obtain with 0 social media mentions, as it is the value of the equation when we set x to 0. We can calculate the intercept using Excel's intercept function. Similar to slope, intercept requires website visits as its first parameter, with social media mentions as its second. Let's set it to work and calculate the intercepts, or constant term, in our regression equation. We can explain the resulting 2,140.54 value as the number of websites visits we observe without any social media mentions at all. This constant term is not explained by changes in the independent variable. How can we use what we have calculated thus far to produce a forecast of the number of website visits we should expect if we have 100 social media mentions? How about 1,500 social media mentions? It should be intuitive by now that we could look up these values for social media mentions on the trendline and determine that corresponding website visits' value. Let's do this formulaically with Excel's forecast function. Forecast requires, as parameters, the value of the independent variable, social media mentions in our case, then our datasets, starting with the dependent variable first. Let's parameterize forecast and predict how many website visits we could expect with 100, and then 1,500 social media mentions, respectively. We see from our regression model that we could expect approximately 4,500 website visits for 100 social media mentions, and approximately 37,500 visits from 1,500 social media mentions. Congratulations, you now understand how to use simple linear regression to develop a statistical forecasting model. A few notes before we close out this module. It's important to understand that our regression model suggests that increases in website visits tend to happen with increases in social media mentions, not that they happen because of increases in social media mentions. We could also, with care, use this model to forecast website visits for levels of social media mentions not previously observed in our dataset. Now it's your turn to demonstrate your understanding of these concepts in problem number 2B, and the culminating exercise for the week. Enjoy. [MUSIC]