Let's start with regression models. I hope you have already learned regression, because it is a very typical topic in statistics. Actually, in Taiwan, my home country, high-school students already get some exposure to regression. I'm not going to talk about a lot of regression models; I will start with the most basic one. If this is the first time you hear the term regression, I guess you may still understand its meaning from my introduction. The idea is the following. We are given a set of data, (x_i, y_i) for i from 1 to n. That means for each observation, we get two numbers. One example is here. Maybe for each house we are talking about the size and the price. We collect 12 samples here. Each sample is a house, and we record its size and its price. We get 12 observations, so here n is 12, and for each sample we get two numbers. When this is the case, we may simply draw a scatter plot to take a look at these 12 points. Very quickly you may see that there seems to be some linear relationship between the two variables. House size and house price should very naturally be related: if you get a bigger house, typically it is more expensive. So maybe we believe, or we want to check, that there is a linear relationship between our two variables. In that case, we're going to do something we call simple linear regression. We're going to try to fit the data with one linear equation that describes the linear relationship. That's what we want to do in simple linear regression. Here the term simple does not mean it is easy; it is simply because we are talking about just one variable and one other variable. We call x the independent variable, and we call y the dependent variable.
When you have only one independent variable to explain the variance, or the variability, of your dependent variable, we call it a simple linear regression problem. If you have multiple independent variables, we call it multiple linear regression. But anyway, let's start with the simple one. When we say we want to find a linear equation that best describes the linear relationship, what does that mean? We need to somehow use mathematics to formulate that problem. More precisely, what do we try to find? Two values, alpha and beta. Why is that? Because with alpha and beta as the coefficients of a linear equation, y = alpha + beta x, we are able to draw a line, something like this. This line is a way to describe the linear relationship. But the thing is, given the 12 points here, how do we know whether the line should be this one, or that one, or that one? You need to somehow have a way to decide which line best describes the linear relationship according to your data. People developed a rule saying that among all these possible lines, we're going to try to minimize the sum of squared errors over all the data points. What does that mean? Once we have a line, then for each observation, if we plug in the x of that observation, we get a value as our predicted or estimated y. Our prediction and our true value typically have a distance. That distance is measured here. Your y_i is your true value. Your alpha plus beta x_i is your predicted value. You get the difference here, y_i minus (alpha plus beta x_i), and that's the error. You then square it; that's your squared error. We sum up everything here, and that's the sum of squared errors. We hope to find a line that minimizes the sum of squared errors. Then we say this is the best we can do: the best line, the best linear relationship we may observe. Somehow, that makes sense. Some people call this least squares estimation, least squares optimization, least squares approximation, whatever.
That is because we are trying to find the least squared error. Now we know optimization. From our point of view, this is nothing but a nonlinear program. You have two decision variables, one is here and the other is here, and you have a square term here. This is a nonlinear program. With this nonlinear program, we now want to do several things. First, is this a convex program? Let's try to analyze it. Second, how may we solve it? Is it really reasonable to formulate the problem this way? If solving this problem is too difficult, then this is not a good formulation, because it does not help us. Let's try to answer these two questions.
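Before we analyze it, here is a minimal sketch of the objective we just described, just to make the rule concrete. The data and the three candidate lines are made up for illustration; they are not the 12 houses from the slide. The function name `sum_of_squared_errors` is also my own choice. The sketch only compares a few candidate values of alpha and beta; it does not solve the optimization problem.

```python
def sum_of_squared_errors(alpha, beta, xs, ys):
    """Sum over all observations of (y_i - (alpha + beta * x_i)) squared."""
    return sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data (assumption, not from the lecture): four houses,
# size as x and price as y, in arbitrary units.
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.1, 4.9, 7.2, 8.8]

# Hypothetical candidate lines, each given as (alpha, beta).
candidates = {
    "line A": (1.0, 2.0),
    "line B": (0.0, 2.0),
    "line C": (2.0, 1.5),
}

# Score every candidate by its sum of squared errors.
scores = {
    name: sum_of_squared_errors(alpha, beta, sizes, prices)
    for name, (alpha, beta) in candidates.items()
}

# Among these candidates only, the preferred line is the one with the smallest score.
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: sum of squared errors = {score:.2f}")
print("smallest among the candidates:", best)
```

Picking the best of three hand-chosen lines is of course not the same as minimizing over all possible alpha and beta. That is exactly why we need to treat this as an optimization problem.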