We are in the middle of about a half-hour to 45 minutes of statistics. Actually it's more, because we'll keep going back to it after we've introduced the concepts. I'm also assuming, by the way, that if you have never done statistics before, you will have made yourself very comfortable with calculating means and standard deviations, which includes calculating variances, before watching the next step and before we get to relationships. The reason I'm saying that is because I've given you a note on the side which goes through the numbers. I shouldn't spend time in the video going through numbers, doing means and variances. I believe that would take up valuable time that is better spent talking about things that are, I think, more important. Means and variances are not difficult. Having said that, please do examples and the easier number-crunching in your assessment.
Now, I think I tried to convince you a little bit about why relationships matter. If you're not fully convinced, believe me, after we've done portfolio theory, in fact by the end of next week, you will see that that's the only thing that matters. Relationships don't just matter in life, they matter in finance. No wonder I love it.
Let's start with relationships. There are multiple ways to look at them, but the fundamental question about relationships is this: how do you measure them? Suppose I have y over here and I have x over here. As soon as I have a relationship, I have two variables. At least two. I'm going to call this one y and this one x. You'll see in a second why I'm calling them y and x: in statistics, there is a fundamental structure and a set of symbols that are used in very similar ways in almost every textbook, and y and x are very common. Let's make y bushels of corn. What is a bushel? It's a unit of measurement for corn. Let's call x inches of rainfall. Again, I'm doing an example which you can relate to, whether it's important to finance or not, because that's the beauty of statistics.
I sometimes feel like statistics may go slightly ahead of finance in the love hierarchy of things. I can't believe I said it. But anyway, that's the way it is. Now, why are you interested in this relationship? You're interested in it simply to figure out, one, whether rainfall has an impact on the production of corn, and two, what the nature of that impact is. The reason why we are interested in relationships in a portfolio, in the finance context, will become more apparent once we have started doing the financial applications.
First thing you do is calculate the average value of y, y bar. Second thing you do is calculate the average value of x, x bar. Another way of saying it: x bar and y bar are the values you expect to happen. On average, how many bushels of corn are you expected to produce? On average, how many inches of rainfall are you expected to get? We can throw in numbers there if you want. 120, 60, whatever. I'm not sure whether 60 inches of rainfall is a lot or a little. I don't have a clue.
You're not interested in the averages here. Standalone, you are, but now what you are asking is the following question: how are these two related? A very simple way of thinking about it is the following. Ask yourself: when rainfall is away from its average, what the heck is corn doing? Remember, each data point comes with its probability. Let's assume that for this particular data point, rainfall is above its average and corn happens to be below its average. For this data point, how would you measure the deviation? For rainfall, it is x_i minus x bar. Then you multiply that by y_i minus y bar. In this case, what is x_i minus x bar? It happens to be positive. But accidentally, or maybe somehow, y_i minus y bar is negative, so for this point the product is negative. But when you do this for most of the points, what will you tend to see? That the product tends to be positive. What will you do? You'll then sum all these products up, weighted by their probabilities, and you'll get what is called covariance.
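As a minimal sketch of the point-by-point calculation just described, the snippet below multiplies the two deviations for each data point and then averages the products. The numbers are invented for illustration (chosen so the averages come out to 60 and 120), and equal weights of 1/n stand in for the probabilities, which is an assumption.

```python
# Hypothetical data, invented for illustration (not from the lecture).
rainfall_inches = [40.0, 50.0, 60.0, 70.0, 80.0]      # x_i
corn_bushels = [100.0, 110.0, 125.0, 150.0, 115.0]    # y_i


def mean(values):
    """Plain average: every observation gets the same weight 1/n."""
    return sum(values) / len(values)


x_bar = mean(rainfall_inches)
y_bar = mean(corn_bushels)

# For each data point, multiply the rainfall deviation by the corn deviation.
products = [
    (x - x_bar) * (y - y_bar)
    for x, y in zip(rainfall_inches, corn_bushels)
]

# Averaging the products (equal probabilities assumed) gives the covariance.
covariance = sum(products) / len(products)

for x, y, product in zip(rainfall_inches, corn_bushels, products):
    print(f"x - x_bar = {x - x_bar:+.1f}, y - y_bar = {y - y_bar:+.1f}, product = {product:+.1f}")
print("covariance:", covariance)
```

The last data point plays the role of the "perverse" case: rainfall is above its average while corn is below its average, so that one product is negative even though the overall sum is positive.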
What is it measuring, multiplied by the probabilities, of course? It's measuring the tendency of corn to be off its normal behavior when rainfall is off its normal behavior too. I took a perverse example here, where the rainfall deviation was positive and the corn deviation was negative. That could happen if there's excess rainfall. However, on average, what do you expect this to be? You expect this number to be greater than zero. Why? Because, hopefully, rainfall helps the production of corn. Maybe if I took rice here, and I'm out of my league talking about agriculture, the positive relationship would be stronger, because rice needs more water. But you understand what I've measured. I have measured, when this one deviates from its average, how that one deviates in exactly that situation. Point by point; that's what the probabilities are about.
This is the fundamental measure of relationships, and it's called covariance. How do you covary with something? Let's assume every time I come on live and I have a smile on my face, you also automatically have a smile on your face, or you tend to smile. We have a positive covariance going on. On the other hand, if you dislike my guts for some reason and every time I smile you frown, then maybe we don't have something positive going on. What I'm saying is you can apply the same idea to any phenomenon, and here normal is no smile at all.
Covariance captures that relationship, but there's a tragedy with covariance, and that is that two things are wrong with it. I'll emphasize them in a second. Let me just start off with a clean slate. The first issue is that its magnitude is not informative. Let me take the corn example. It's called Sigma_yx. The covariance of y with x is the summation, over all points i, of p_i times y_i minus y bar times x_i minus x bar. It's very important to take deviations from normal behavior. That's why I asked, when I smile, what is my normal behavior? No smile. That's [inaudible], and the probabilities are in there. Look, what will happen to this?
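Written out, the covariance just described in words is the probability-weighted sum below. The symbols p_i, y bar and x bar are the ones used in the lecture; the equal-weight special case p_i = 1/n is an assumption added here for a plain list of n observations.

```latex
\sigma_{yx} \;=\; \sum_{i} p_i \,(y_i - \bar{y})\,(x_i - \bar{x}),
\qquad
\bar{y} = \sum_{i} p_i\, y_i,
\quad
\bar{x} = \sum_{i} p_i\, x_i,
\quad
\sum_{i} p_i = 1 .

% Assumed special case: n equally likely observations, p_i = 1/n
\sigma_{yx} \;=\; \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})\,(x_i - \bar{x})
```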
Right now I had this in bushels, I believe, and I had rainfall in inches. Could I change this number? Suppose this number was 55, whatever. Can I make this number larger without changing anything fundamental? The answer is yes. Start measuring your rainfall in millimeters instead of inches. What will happen? This number will become bigger, because there are a lot of millimeters in an inch. So the magnitude doesn't tell me anything. The only thing that's good is that the sign is okay. But the magnitude is not reflective of strength or weakness.
The second tragedy with it is this: it is unit dependent. What are the units of covariance? The units of the covariance are the product of the units of the two things being measured. When we do return analysis, both returns are measured in percentages, so it's not a big deal, but I wanted to show you why statistics is so awesome and why it has to deal with things which are more difficult than in finance. Bushel-inches: what is that supposed to mean? An ideal measure of a relationship shouldn't have units. How can I compare bushel-inches with, say, the relationship between the productivity of people and whether they have had a school education or not? The units are totally different. Relationships should be comparable across different phenomena.
Here's the tragedy of covariance: though it's trying to measure the right thing, its magnitude doesn't mean much and it's unit dependent. What did we do? What do statisticians do? I have very little to say about this that's important to us, but this one point is extremely important. We took the covariance of y and x, whose units were what? Bushel-inches. And the magnitude was what? It didn't reflect much, but the sign was okay. In other words, whether it's a positive or a negative relationship is reflected, because you're taking deviations from the mean and then multiplying. What did we do? We came up with a measure called correlation. What is correlation? Correlation simply takes Sigma_yx.
Remember what its units are? Then it does what? It standardizes it by dividing by Sigma_y, which is the standard deviation of y, and by Sigma_x, the standard deviation of x. We know how to measure those. What does this do? It does two things for us. What are the units of the covariance on top? Remember, bushel-inches. What is the unit of Sigma_y? Bushels. What is the unit of Sigma_x? Inches. What happens? The ratio becomes unit free. And a second thing happens, which I'm not going to go into: the correlation, which is written like this, Rho_yx, is between minus 1 and plus 1. You see the awesomeness of correlation: it took covariance, retained what was valuable about it, the positive or negative sign, got rid of the issue of units by dividing by the standard deviations of the two, and created a number that can be compared across different phenomena.
Let's take a break again, and I'll come back with one last, and extremely important, statistical concept called regression, which is needed in finance and in almost every other discipline. Again, it's a way of capturing relationships, but an important one for us. See you.
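To make the two problems with covariance and the fix concrete, here is a small sketch. It assumes invented rainfall and corn numbers, equal probabilities of 1/n for every data point, and helper names (`covariance`, `std_dev`, `correlation`) chosen only for this illustration. It measures the same rainfall in inches and in millimeters and compares covariance with correlation.

```python
import math

# Hypothetical data, invented for illustration (not from the lecture).
rainfall_inches = [40.0, 50.0, 60.0, 70.0, 80.0]
corn_bushels = [100.0, 110.0, 125.0, 150.0, 115.0]

# Same rainfall, different unit: 1 inch = 25.4 millimeters.
rainfall_mm = [25.4 * x for x in rainfall_inches]


def mean(values):
    return sum(values) / len(values)


def covariance(ys, xs):
    """Equal-weight covariance: average product of deviations from the means."""
    y_bar, x_bar = mean(ys), mean(xs)
    return sum((y - y_bar) * (x - x_bar) for y, x in zip(ys, xs)) / len(ys)


def std_dev(values):
    """Standard deviation as the square root of the variance."""
    return math.sqrt(covariance(values, values))


def correlation(ys, xs):
    """Covariance standardized by the two standard deviations."""
    return covariance(ys, xs) / (std_dev(ys) * std_dev(xs))


cov_inches = covariance(corn_bushels, rainfall_inches)   # bushel-inches
cov_mm = covariance(corn_bushels, rainfall_mm)           # bushel-millimeters
corr_inches = correlation(corn_bushels, rainfall_inches)
corr_mm = correlation(corn_bushels, rainfall_mm)

print("covariance (inches):", cov_inches)
print("covariance (mm):    ", cov_mm)
print("correlation (inches):", corr_inches)
print("correlation (mm):    ", corr_mm)
```

Under these assumptions the covariance grows by a factor of 25.4 when the unit changes, although nothing about the relationship has changed, while the correlation comes out the same in both units (up to floating-point rounding) and stays between minus 1 and plus 1.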