I promised you that today would be conceptual, and I think I'm delivering on that promise. I also told you that, unlike in accounting, where I did things that were explicitly focused, in statistics I can't do that. I have to give you the background of why we are using what we are using. Having said that, what I'm covering is not that difficult to do, but I hope that before you came here, as you did before you started measuring covariances and correlations, you went and looked at the note, reviewed the formulas, practiced some examples in your homework and your assessments, and are now joining me for regression.
Regression is used in every discipline that I know of. My brother's a doctor; he uses it in his data analysis. The people in finance use it all the time. I almost feel like it was created for finance. People in fields that never thought of using statistical analysis now do, because data is everywhere. One of the reasons it's become quite simple to do this is that we have the horsepower, the machine power, to execute. By the way, the only caution about statistics is this: because it's so powerful, like finance, it is also susceptible to misuse.
Now I'm going to introduce you to a topic that can take an entire week. It is called regression. I'm going to show you why and how it's used in finance later, but for now I'll just use an example that you would like or understand in the real world, and a different one every time, so let's start. This time, let me write out the regression equation on top. It is written in textbooks like this: y_i equals alpha plus beta x_i plus epsilon_i. Actually, let me just remove this. This is what you'll see in every book. I've removed this subscript because it's not needed here. This is what a regression is. The first thing that should strike you when you stare at it is this: are there any squares in there? Are there any cubes in there? No.
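For reference, here is the equation on the board written out, reconstructed from the way it is read aloud later in the lecture. The subscript i indexes the observations, and the symbol names simply follow the spoken "alpha", "beta" and "epsilon":

```latex
\[
  y_i = \alpha + \beta x_i + \varepsilon_i
\]
```

Every term enters to the first power only, which is all that "linear" means here.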
A regression, in its most commonly used version, is linear, and linear relationships are the simplest to understand. It turns out, and please remember what I am saying now, that if x and y have a joint distribution, meaning they are correlated phenomena occurring jointly, and if they are jointly normal, the only relationship that's possible between them is linear. Isn't that really profound? If you have a normal distribution, the only measure of average behavior is the mean. What is the only measure of uncertainty of a single thing? Variance. Similarly, the only relationship that can happen between two things is a linear relationship. This is so powerful, and therefore it underlies a lot of the way finance works. Of course, it's dependent on an assumption, like everything else. This is linear.
Now let me draw the visual version of this. What is this? This is y and this is x, and for simplicity let's call this advertising and let's call this sales. I'm taking different examples because I don't want you to think that we are doing this just for the heck of finance, no. Finance relies heavily on statistics, but statistics has much broader uses, so I want to explain it to you regardless of your background. Suppose you're working for a company. Let me ask you this: do you expect a relationship there? If you were sure there wouldn't be any relationship, would it be good to spend money on advertising? Probably not. But if I asked you what the best advertisement for your service or your product is, I hope you would say the service or the product itself. In fact, that's why I love Apple. Apple basically doesn't believe in advertising, because it believes the product it gives you is something you never wanted but are now dying to have, and therefore everything is built into the product. That's the genius. Anyway, let's do y and x. What do you get?
For each i, you draw a dot and you get something like this. This is called a scatter plot, because that's what it looks like. Scatter plot. I love these words. I love statistics also because it says what it means. There's no BS in statistics, and there's a lot of BS in life. Anyway, what is y? These are all the possible dots. What is this straight line trying to do? It's trying to fit a relationship between the two. What you notice is that there is an intercept here and there's a slope to this line. I have drawn this line, and now let us talk about regression. What I don't want to do right now is tell you the formulas for things. You can go figure those out. I'm here just to show you how regression measures stuff.
Let me ask you the first question here. What the heck is Alpha? Seriously, what is that Alpha? That first term, right after y_i equals? Well, let's stick with the sales and advertising problem. I'll give you a second. You could say it's the intercept. Turns out you're right. It is the intercept. On which axis? The y-axis. But telling me it's an intercept is like throwing jargon at me. Tell me in English. As soon as I said English, you should have figured out what I'm trying to ask you. You said Alpha is the intercept. What I'm asking you is, tell me in English, what is Alpha supposed to be? Alpha is the amount of sales that happens regardless of advertising. Hopefully it's positive. Negative sales are tough to visualize, but suppose it's zero: then you know what I mean. You may have to work very hard to sell it. Anyway, Alpha is the amount of sales when advertising is naught. Makes sense, right? X is zero. Let me ask you this. What is Beta over there? What is Beta? What is this guy?
Suppose you have measured sales and advertising in percentages or in changes. I would recommend very strongly doing that, because there's a phenomenon called a spurious, or artificial, relationship, which happens when you measure things in levels, especially in economics. Why? Because levels tend to increase just because of inflation. That's obviously common to both, but it's not what you're after. Let's assume sales is in changes and advertising is in changes, and the good news in finance is that we are after returns, which are obviously changes. What does Beta reflect? Beta reflects the slope. By the way, if you say slope in Excel, it'll do Beta for you. Guess what it'll ask you. As soon as you say equals slope, it'll ask you: where are your Ys? Where are your Xs? Are you with me? Beta says, in this change context, if x changes by one percent, how much y changes. So it's telling you the sensitivity of sales to advertising in this example. Suppose Beta is zero, what will you do? You'll stop advertising or become much more creative in advertising. You're hoping it's positive. That's why the slope I drew was positive. That's Beta.
What is beta? Beta has to be measuring the relationship between x and y. I promised you that I wouldn't do the formulas, but let me do it. I'm going to rewrite just the equation, y_i equals alpha plus beta x_i plus epsilon_i, and I'm going to make them changes, because that's the best way to measure things. What does the change in x do to the change in y? It turns out beta is equal to the covariance of the change in y and the change in x, divided by the variance of the change in x. Does the numerator look familiar? Sure. That's what gives the basic sign of the relationship. The only difference between this and correlation is that I'm standardizing by the variance of x instead of the product of the standard deviations of the two. So beta can be any number.
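As a rough numerical check of the covariance-over-variance formula, here is a minimal Python sketch on simulated data. The data, the seed, and the values 0.5 and 1.8 are made up for illustration and are not from the lecture. The intercept formula (mean of y minus beta times mean of x) is the standard least-squares result, which the lecture does not state. The sketch assumes NumPy is available.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: percentage changes in advertising (dx) and sales (dy).
n = 500
true_alpha, true_beta = 0.5, 1.8  # made-up values, for illustration only
dx = rng.normal(0.0, 2.0, n)
eps = rng.normal(0.0, 1.0, n)     # the part of dy that advertising does not explain
dy = true_alpha + true_beta * dx + eps

# Beta as covariance over variance (same ddof in both, so the ratio is consistent).
cov_xy = np.cov(dx, dy, ddof=1)[0, 1]
var_x = np.var(dx, ddof=1)
beta = cov_xy / var_x

# Intercept: the fitted line passes through the point of means.
alpha = dy.mean() - beta * dx.mean()

# Cross-check against a least-squares straight-line fit.
slope_ls, intercept_ls = np.polyfit(dx, dy, 1)

# Correlation standardizes by both standard deviations; beta only by var(x).
corr = np.corrcoef(dx, dy)[0, 1]
beta_from_corr = corr * dy.std(ddof=1) / dx.std(ddof=1)

print("beta (cov/var):      ", beta)
print("slope (least squares):", slope_ls)
print("alpha:               ", alpha)
print("correlation:         ", corr)
```

On this simulated sample the covariance-over-variance beta and the least-squares slope agree to rounding error. The same beta can be recovered as the correlation times the ratio of the standard deviations, which is one way to see that beta, unlike correlation, is not confined to the range from minus one to plus one.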
In other words, beta can be negative, positive, greater than one, and so on, whereas correlation had to be between minus one and plus one, so beta is another way of measuring relationships. Let me ask you what this epsilon is, and don't tell me error, because if you say error, you're talking to me in Japanese. I mean, sorry. You know what I mean. You're talking to me in a language I don't know. Japanese is an awesome language. I meant you're talking to me and I'm not understanding what you're talking about. Assume this is the error from the line: this dot, this dot, this dot, this dot, this dot. On average, what should that be? Zero, because if on average the errors are not zero, there's some pattern there that I should pick up either with x or with some other variable that would explain y. The way I describe epsilon is very simple. Epsilon is my level of ignorance, not about the relationship, but about what drives y. The more ignorant I am, the more the error will matter. So if advertising affects sales and I can come up with other things that could explain the changes in sales, my errors will become smaller, or the variance of those errors will become smaller. That's the goal of a good model, but you don't add things randomly.
This is the last point I'll make and then we'll take a break. You will run the relationship with these being sales and advertising. The point I'm trying to make here is that one variable is on the right-hand side and one variable is on the left-hand side. What dictates which variable goes on the left and which on the right? Theory, concept, thinking. You cannot just put advertising on the left-hand side and sales on the right-hand side. Most people will say sales is more likely determined by advertising than advertising is determined by sales, so that kind of thinking has to happen before you run regressions. Let's take a break here.
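The two claims about epsilon can be illustrated with a small Python sketch, again on simulated data: the errors from the fitted line average to zero, and their variance shrinks once a second relevant driver of sales is brought in. The second driver (called price here), the coefficients, and the helper `fit_ols` are all hypothetical and are not from the lecture. One caveat: in a sample, residuals from a least-squares fit that includes an intercept average to zero by construction, so the first check shows the mechanics and is not evidence about the true errors.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible

# Hypothetical data: changes in advertising, changes in price, changes in sales.
n = 1000
adv = rng.normal(0.0, 2.0, n)
price = rng.normal(0.0, 1.0, n)   # a made-up second driver of sales
noise = rng.normal(0.0, 1.0, n)   # what neither variable explains
sales = 0.5 + 1.5 * adv - 2.0 * price + noise


def fit_ols(y, *regressors):
    """Least-squares fit of y on an intercept plus the given regressors.

    Returns the coefficients (intercept first) and the residuals.
    """
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, resid


# Model 1: sales explained by advertising only.
coef1, resid1 = fit_ols(sales, adv)

# Model 2: sales explained by advertising and the second driver.
coef2, resid2 = fit_ols(sales, adv, price)

print("model 1: mean residual", resid1.mean(), "residual variance", resid1.var())
print("model 2: mean residual", resid2.mean(), "residual variance", resid2.var())
```

In this setup the one-variable model leaves the effect of the omitted driver in its errors, so its residual variance is larger. Adding the driver removes that part, which is the sense in which less ignorance about what drives y means smaller errors.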
What we'll do next is spend 15 to 20 minutes using data to motivate the notion of portfolios. Then we'll get into the details of diversification, and finally come up with a measure of risk. Let me ask you: what do you think the measure of risk in finance is called, given that it's based on relationships? It's called beta, because in the end you'll see the only thing that matters is relationships, and that's simply because we are risk-averse and we don't put all our eggs in one basket. See you in a little while. Bye now.