Okay, let's continue. So that was about regression, and now we're moving to the next subject, which is called the support vector machine. So let's start our introduction. Unlike regression, the support vector machine talks about a problem that is similar but different. It talks about the classification problem, which of course is a very important subject in machine learning. So intuitively, what's this? We give you a set of data, or a data set, okay. So again you get some independent variables and an outcome variable, or dependent variable, okay. To make the explanation and derivation easier, I'm choosing a different notation, different from the regression part. So here, n is the number of variables you have, and m is the number of data points you have, okay. You will see why I use n for the number of variables and m for the number of data points: because that's going to give you a formulation that looks similar to your familiar nonlinear programs, linear programs, whatever. Okay, coming back to the formulation. So you get several properties or several features, and they may be helpful for you to determine whether your yi should be 1 or negative 1. Here 1 and negative 1 mean success or failure, boy or girl, A or not A, death or birth, whatever. So pretty much you are trying to do some kind of classification of each data point. There are, of course, many examples or applications. For example, today there are a lot of applications talking about image recognition, okay. So you are given a picture and there are some kinds of entities there. So, for example, you get something like this. I guess everybody knows what this is: this is a pig. All right, so how do we know this is a pig? There are several different kinds of models. But anyway, the classification problem is that whenever you get a photo, you determine whether there is a pig there or there is no pig there.
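To make the notation concrete, here is a tiny sketch in Python with made-up numbers (the names X and y and all the values are my own, not from the lecture): a data set with m data points, each having n features and a label yi in {+1, -1}.

```python
import numpy as np

# m = 5 data points, n = 2 features each (illustrative numbers only)
X = np.array([
    [1.0, 2.0],
    [2.0, 1.5],
    [3.0, 3.5],
    [6.0, 5.0],
    [7.0, 6.5],
])                                  # shape (m, n): row i is data point x_i
y = np.array([-1, -1, -1, 1, 1])    # one label y_i in {+1, -1} per point

m, n = X.shape
print(m, n)  # 5 data points, 2 features
```

So with this notation, i runs over the m rows (data points) and each row has n feature values.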
All right, so pretty much that's classification. And then you will get several features from this picture, like, for example, the RGB values for each pixel, whatever, and those are features. So, let's come back to the theory, or the abstract formulation. We want to find a classifier to assign data point i to a class, either pig or no pig, according to xi. Okay, we want to read this information, and we will assign this data point to a class. Once we do this assignment, we get some kind of yi hat, we determine its class, and then we check whether we are doing right or wrong. We want to somehow minimize the total number of classification errors. That's about classification in general, okay. General classification, I shouldn't say it's difficult, but it allows a lot of possibilities, so let's simplify. Let's do linear classification. Okay, so what do we mean by linear classification? Let's start with an example here. Suppose you only have two features for each data point. Here blue means 1 and green means negative 1, or the other way around if you like. Now we try to do linear classification, which means we should draw a straight line to separate the blue dots and the green dots, okay. So a straight line means something like this, okay, or you may do something like this, or something like that. There are many, many different straight lines that may do the work, okay. This is a two-variable example because it's easier to visualize. Of course, in practice you may have a plane in a three-dimensional problem, or a hyperplane in higher-dimensional problems. We are just using a line to make the visualization easier, okay? So as we mentioned, there are infinitely many lines that may do the separation. We somehow need to tell our computer, our model, one way to choose the best line. So we need to define "best". How do we do that? Well, some people say it is the line that is the farthest from both groups.
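As a sketch of what a linear classifier computes, assuming some made-up two-feature points and hand-picked coefficients alpha and beta (none of these numbers come from the lecture): each point is assigned to +1 or -1 according to which side of the line alpha + beta^T x = 0 it falls on, and we count how many assignments disagree with the true labels.

```python
import numpy as np

# Made-up two-feature data: green points labeled -1, blue points labeled +1
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 0.5],
              [4.0, 4.5], [5.0, 4.0], [4.5, 5.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

# One candidate line alpha + beta^T x = 0 (coefficients chosen by hand)
alpha = -8.0
beta = np.array([1.0, 1.5])

# Classify each point by the sign of alpha + beta^T x_i
y_hat = np.sign(alpha + X @ beta)

# Count classification errors: points landing on the wrong side
errors = int(np.sum(y_hat != y))
print(errors)  # this particular line separates the two groups, so 0 errors
```

Any line that puts all the blue points on one side and all the green points on the other gives zero errors here, which is exactly why we need a further criterion to pick the "best" one.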
Okay, so that means we want to find a line, for example, this one. Once we have a line, then for each group we may see what's the distance from each point to the line. And in each group there will be a point that is closest to the line we propose, so there is a distance, okay? From the other set there is also a point that is the closest. So for all the points there is a distance to the line, and we try to maximize that minimum distance, okay? So in that case, if we draw a line this way, okay, as I printed here, then the distance is the length of this red line segment. Now I'm going to draw another line. If I draw the blue line, then you may see that the minimum distance is now just the blue line segment, and that minimum distance is not as large as the red one. So this blue proposal is not very good. Or how about this? I'm going to draw another line like this one. Then again, even though this line is far from the blue points, it's too close to the green ones, so it's also not good. So that's the idea of choosing the separating hyperplane: try to make it as far as possible from all those points, okay. Once we agree to do that, technically we may define several terms. First, we get two dashed lines here; they are called supporting hyperplanes, one for each group. You may check the definition of a supporting hyperplane, but anyway, given a set, a supporting hyperplane is a plane that keeps the whole set on one side of it and is the closest such plane to that group. Okay, so I hope you agree that these are indeed the dashed lines. And then the two points x1 and x2 are called support vectors. Okay, somehow they define the supporting hyperplanes. So the support vector machine is a model, or you may say an algorithm, whatever, that tries to find the best separating hyperplane.
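The "maximize the minimum distance" idea can be sketched numerically. Assuming a few made-up two-feature points and two hand-picked candidate lines (all numbers are mine, for illustration only): the distance from a point x to the line alpha + beta^T x = 0 is |alpha + beta^T x| / ||beta||, so we score each candidate line by the smallest such distance over all points, and prefer the line with the larger score.

```python
import numpy as np

# Made-up two-feature points (three from each group)
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 0.5],
              [4.0, 4.5], [5.0, 4.0], [4.5, 5.5]])

def min_distance(alpha, beta, X):
    """Smallest distance from any point in X to the line alpha + beta^T x = 0."""
    return float(np.min(np.abs(alpha + X @ beta) / np.linalg.norm(beta)))

# Two candidate separating lines (coefficients made up for illustration)
red = min_distance(-8.0, np.array([1.0, 1.5]), X)   # roughly centered between groups
blue = min_distance(-5.0, np.array([1.0, 1.0]), X)  # sits closer to one group

print(red, blue)  # the "red" line has the larger minimum distance, so it wins
```

This is the same comparison as in the picture: the blue proposal separates the points too, but its closest point is nearer, so its minimum distance is smaller and we reject it.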
This hyperplane may be expressed in general as alpha plus beta transpose x equals 0, the typical formula for a linear function. Okay, so that's the idea. Later, let's see how to do the formulation.
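Written out in standard notation (this just restates the line's formula and the resulting sign rule; the full optimization formulation comes later in the lecture):

```latex
% The separating hyperplane: all points x satisfying
\alpha + \beta^{\top} x = 0,
\qquad \alpha \in \mathbb{R},\ \beta \in \mathbb{R}^{n}.

% The induced classifier: assign data point x_i to a class
% according to which side of the hyperplane it lies on
\hat{y}_i = \operatorname{sign}\left(\alpha + \beta^{\top} x_i\right) \in \{+1, -1\}.
```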