[MUSIC] So the one way to identify influencing factor is to use the correlation. And correlation is to look at the relationship between two variables, okay? How they behave against compared to one another over time? So here's an example of comparing the sales of products from different groups, so different product groups. So we have ceramic, engineering stones, glass product, home center or installation product, natural stones, others or porcelain. So this is from the renovation, such construction industry and they have product skills in these different categories. And we're interested in find out whether we can predict or we can forecast one of these products based on the sale of another. So if there's any relationship between the two? And one way to do that, so on the top, we have here is [INAUDIBLE] historical demand and these are denoted by negative. That means they've being withdrawn from the system. So a little bit of different connotation here. So then we have sales also were made from 2018 all the way to the beginning of 2020, okay? So, we have the same amount of data set for all of these different categories. And what we're interested in is just find out, can I predict are there any relationship between any of these product categories? So the first one which is done here for you and I'll explain it and then we'll do the other one together. The first one is want to see if there's any relationship between ceramic and the other product categories. So here, the formula use is just correlation. This is a correlation formula in Excel, CORREL, CORREL. And you put in the data array to data array at once. So here, I'm comparing, since I'm interested in finding out whether this relationship between ceramic and glass product. So, I put in the first array as the historical demand for ceramic and then the second one is the glass product, okay? So, I just do that. And then I fixed this, you can see the dollar signs here at the beginning in the this formula here, in blue. This means that I have fixed the data for ceramic. So it will not change. So if you see this, right, is the same, right? The B3 and AB3, that refers to the historical demand for ceramic. But the rest changes, right? So then, right now I'm on the second array, data array, refers to insulation. And then the next one is natural stones and this one's other and then the last one is porcelain. So, they are comparing and I'm comparing to set of data and once to see how they are related to each other. And the higher the number, these are represent percentages, so I can even change this, right, to percentages. So you can see that ceramic and porcelain are highly correlated, right, 99%. That means when one product, the sales one product goes up, the other one also goes up. And then if it goes down then it also goes down. So it's very, they are highly correlated and their movements can be said to be closely associated to one another, right? So the next one natural stone and even glass products are also highly correlated to the glass product. So then we can do the second one, right? So now, I'm interested in seeing if glass product, right? How does [INAUDIBLE] glass product related to the other product categories? So I do the same thing. So the first, in doing any of the Excel formulas, I just type in equal and then I type in the [COUGH] formula name. So in this case CORREL. And then he says, it tells you the hint is putting the array, right? Two arrays, array 1 and array 2. So the array 1 is always the one that we are interested in, like the base that we are comparing from. So in this case, it's glass product. So I just choose when I grow up with glass product. And then you can use the, depends on whether use Mac or PC, there's a shortcut that you can just highlight all the available data's horizontally at once. And then once I do that, I just hit a in at 4, right? Whether you are in PC or Mac, the shortcut keys at 4 to fix the array. And I just to add the comma and then, so now I'm comparing to your home center, right? So, I'll do the same thing except I won't fix the array this time, okay? So it gives me 0.8. And once I do that, the beauty with fixing the base, which in this case is the glass product, is that, I can just do drag and drop, not drop. I can just drag the formula down and then you'll keep, right? I don't have to type this all the time. You'll keep the glass product where it is, but then you'll move the rolls down, right? As I go down. So now I can see that, if I come down here, I have the data for porcelain now, right? It's the second arrays referring to porcelain. The first array is fixed on glass products, okay? So, I can do the same thing. I can just turn into percentage for ease of viewing. And then, I'll do the same thing for home center. So in this case, and also know that I didn't do the first one like I could, right? I can do the first one, but they will be [INAUDIBLE] for 100% because it's just compared to itself, okay? So, I can usually skip that and then the same thing, home center, I will skip home center and go down to the next one and repeat the formula. So this one now is home center. I'm going to fix that and then get my second array, which, compared to installation product, okay? going to do the same thing. Just drag this down and then the installation products, so insulation products compared to natural stones. Okay then, oops insulation. So that should be one before and then compared to natural stone Finish this 1 1st before I natural stones compared to the rest. Okay, I've done this. I can just track down and then all these tinted percentages. So you can see that. So now just for ease of viewing that can highlight these, right? So these are high correlation and I can see that also for glass product against natural stone is probably the best. And for home center actually these two or actually this one against natural stone and installation product process Ellen's and then natural stone with process Ellen's. So you can see that, you know now may I may have identified right? Something influencing factor. It was something that I can correlate to. That can help me right to predict better, predict the my sales of let's say ceramic. Right? All these products. I can reference the other product categories to see if they are is there any associations between the two? Okay, so that's one way correlation is one way and the other way is actually during the regression. Okay, so regression is you have independent variables and dependent variables. So my independent variable is what I'm interested in which in supply chain will be in my demand. What is my demand dependent on is of my sales of my the quantity of sales before product will be dependent on other factors. So these factors or we call it explanatory variables in the regression term. These are the independent variables right there. Their existence their values, right? Is independent of one another. But my dependent variable which is the my sales, my demand, right? Dependent on these factors. And so what before you can do a regression you need to think about will come up with the, you know, so demand is independent variable. That's obvious. That's what I'm interested in finding out and forecasting. And then but I need to go to find or identify the independent variable. My explanatory variable. So going to use even a more familiar example here. So this is the home sales. Okay, so we all understand home sales. So sales of homes, this will be my dependent variable. And we usually know that you know sell the home. The price at least I depends on the size, the size of the house or the condo the number of bathrooms. And then also the the school districts. You know that if you have a family right? People of parents who look at that as a a consideration in purchasing the homes and or their willingness to to buy, you know to pay a certain price for for a house in that neighborhood. So we can say that the sales in this case we are set up in the sales is my dependent variable. And then my explanatory which is, you know what I think can predict the sales of the price of the house is dependent on the square footage number of bathrooms and the school districts. So school this week. 11 thing is in regression you can include categorical information. So it's not just numerical. So square footage and bathroom these are numeric. That is a value that you can measure. But school district is not something you can measure is a category, right? It's good or bad or mediocre. Right? So then we can code that to say you know one is good, always bad, you know something like that. So then you code the category called data and then put it here to use in regression. So once you have set up, you know get to gather all this information. You can use the the regression tool in in Excel. And it's under data and data analysis. And I think the step is the same for I'm using a mag. So even for pc it should be the same. There is under data and there's a data analysis and within that there is the regression option. Okay? And if you don't have this you can do an add on, you can google how to do you know add the data analysis package to your Excel and it actually is actually free. But I don't think this is typically by default is added when you install Excel. So then here you just choose regression and then what you want to do is you want to identify the my input. Y So this is why it's my dependent variable. Okay? So then what you will choose is my dependent variable in this case is sales. And then my ex my ex is actually my independent variable. So I have three of them here that that identified. So I just choose that. Oops. Again so since I have a common label, so I selected a label option to tell the system to tell Excel that the first roll is actually a data label field. So I'm going to output to just to my sheet here. Okay. Do it again. So once I chose label and then become the output, the output range, I'm going to just opposite the same sheet. So for ease of viewing or you can output to a new worksheet in the workbook. That's fine. And we don't need to worry about any of these for now. Okay. And then you just hit. Okay And then Excel will do the competition and background and they will give you this output. So what we're interested in, Is there a couple of things? So first we want to look at the the regression statistics. Okay. So these numbers here in the first three are actually percentages. For ease of viewing. Let's just turn them into percentages. And this tells us how what's the relationship between my dependent and independent variables or the other way to say it is How is my, an independent variable in this case, how are square footage number of bathrooms and school districts help me predict the sale price of a home. You can see that, based on these numbers, these three are good predictors. They're highly correlated. So I have chosen the right predictors to help me, forecast was predicting the sale price of a home. But you may ask why are these three numbers? So first we will look at typically people look at these two the R square and adjusted R. But more specifically, if you wanted to be if you want to look at a statistics that's not affected by the the amount of data that you have like diabetes two for example the first to multiple R and R square. The value will go up. You add, like if you have more independent variable to the your set, but adjusted is not, it just takes care of that issue. So you would typically look at these two but more public pay more attention to the adjusted number. So 88% is not too bad. And you to is you know, pretty good. And they are I think have have done had a good set of predictors or experience with variables right to help me predict the sales of my home price. Now the other one that we wanted to make sure is that, I need to look at the significant F. And typically we will define, right? So there will be a hypothesis to say that, my initial hypothesis is that the home sale price does not or is not affected by square footage by number of bathrooms or by the school district. So that's the initial hypothesis. And my alternate would be that, it's home sale prices actually affected by, any of these or all of these three factors. And this is where the significant F tells us that. And typically we will define a value. So typically it's either your 0.1 or 0.5. Okay? And then if if the value, if the significant F falls below that value is less than the value, then we reject the null hypothesis and then accept the alternate. So again, the non hypothesis is that the sales price of a home, is not dependent on any of these variables. And the alternate is that the sales price of a home is dependent any or all of the variables. And since in this case I define my this value to be 0.1. So anything less than that, is I can reject the null hypothesis and accept the alternate. So in this case my cynical NF is pretty small, right, 0.00117036, so that's 0.1. So I can accept the noon hypothesis alternate hypothesis that the they are a relationship between my dependent and independent variables. Okay and then the next the last set of values. This is the values down here. This is what the coefficient is what the program if you ever turn this into a plot. If you ever need to plot it to forecast for homes prices based on using this model. These are the coefficient that you use as part of the regression equation. But in our case, since we're just trying to determine if the these factors influence the sales, we just look at the P value. Right? Look at the P value here and again, this is where I also define the value of 0.1. So anything less than that. I can say that these are significant and then I can accept these are good predictors. These are good explanatory variables. So since all of these are less than 0.1, right? I can say that all these three factors are good explanatory variables in predicting helping me to predict the sales of a home. So we can do this also in turn into supply chain, if you have a demand of the product and then if you go define if you have a hypothesis on how or what the explanatory variable, maybe, you can put them here and then run the regression and look at the the output. Okay look at the significant value, like the R square, the significant values and the P values to see how closely related your explanatory variables are to your dependent variables.