So let's try to formulate the support vector machine problem. And you may, of course, predict that we're going to formulate it as some kind of optimization problem. Okay, it takes some time, so let's do it one by one.

First, we're going to say that alpha plus beta transpose x is a classifier. For each x, we plug it into alpha plus beta transpose x to see what the resulting value is. We're going to say a point is blue if that result is greater than or equal to 1, and we're going to classify it as green if that number is less than or equal to -1. So you may wonder why we don't set, for example, 0 at the right-hand side. If you want to have 0 as your right-hand side, can you do that? Of course you can, because the value after plugging in is either positive or negative, so it still does the classification. But as long as we are able to choose alpha freely, it doesn't really matter whether we put the right-hand side as 0 or any positive number, because we may scale alpha and beta. So you may choose 1, -1, k, -k, 10, -10, whatever. Whenever we say we are using 1 as the right-hand side, we are not really limiting ourselves; the model is still general enough, okay? So for convenience, and for an example that you will see very soon, we're going to choose 1 as our standard, as the criterion. And we may always scale alpha and beta to make sure that no point lies inside the two dashed lines, which means no point makes the value fall strictly between -1 and 1.

Okay, so that's our first step. Once we have that, one thing we should observe is that maximizing the minimum distance from all the points to the proposed line is equivalent to maximizing the distance between the two supporting hyperplanes. And the good thing is that we know there are two support vectors, one on each supporting hyperplane; call them x1 and x2. So pretty much, what we need to do is a projection: project x1 minus x2 onto the normal vector of the separating hyperplane. The normal vector is a vector that is perpendicular to your separating line, and we all know how to find it from high school mathematics; a symbolic version is written out right after this passage. For example, let me make it easier: if you have a line where x2 is exactly one-half of x1, then you can write it as one-half of x1 minus x2 equals 0, and you of course know that one normal vector is (one-half, -1), or the same vector pointing the other way. So we project x1 minus x2 onto beta. And it doesn't really matter whether beta is long or short, or points this way or that way; that's all the same. We just need a way to do this projection.

Well, this is something that you have learned in linear algebra or management mathematics or engineering mathematics. So very quickly, let's do some review of how to project one vector onto another. Graphically, we have a vector a and a vector w, and we want to project a onto w. Let's say the outcome of the projection is a vector called a1; then naturally a1 and w differ only by a constant multiplier. Let's call it alpha 1, okay?
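Here is the normal-vector fact written out in symbols, with u and v standing for any two points on the separating hyperplane (these names are introduced just for this sketch):

```latex
% Any two points u, v on the hyperplane satisfy its equation:
%   \alpha + \beta^{\top} u = 0  and  \alpha + \beta^{\top} v = 0.
% Subtracting the two equations gives
\[
  \beta^{\top}(u - v) = 0,
\]
% so \beta is perpendicular to every direction lying inside the
% hyperplane, i.e., \beta is a normal vector.
% The lecture's example: the line x_2 = \tfrac{1}{2}x_1 can be written
% \tfrac{1}{2}x_1 - x_2 = 0, so one normal vector is (\tfrac{1}{2}, -1).
```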
So let's say the projection a1 is alpha 1 times w, and let's do some derivations. We know that this particular equation has one vector on each side, so for each side we may take its norm. The norm, just as a review, is the square root of the sum of the squared components, which is pretty much the length of the vector. If two vectors are the same, their norms, of course, must also be the same. So the norm of a1 equals the absolute value of alpha 1 times the norm of w.

We also know a second expression for the norm of a1. In the picture, the norm of a1 is the length of the green line segment, and the norm of a is the length of the red line segment. According to your high school mathematics, they are related through the cosine function: the norm of a1 equals the norm of a times cosine theta, where theta is the angle between a and w. So now you have two equations regarding the norm of a1, and that means your alpha 1 should be exactly the norm of a times cosine theta divided by the norm of w; you simply take the norm of w to the denominator. That's how you get the first result.

It then follows that we just need to do some arithmetic: a1 is alpha 1 times w, and alpha 1 has just been derived. Also from your high school or linear algebra mathematics, cosine theta can always be expressed with the inner product of the two vectors in the numerator and the product of their lengths in the denominator: cosine theta equals a transpose w divided by the norm of a times the norm of w. Once you do this, the two expressions can be put together, and that's how you get your a1: the projection is a transpose w, divided by the norm of w squared, multiplying w. Don't forget, the inner product a transpose w is a scalar number, and the norm of w squared is the square of a length, also just a number. So you really do get a number as your coefficient for w, okay?

So once you have it, the remaining things are very simple. You simply take the norm of a1, because you want the distance. When you take the norm, one copy of the norm of w cancels, and that's how you get the result: the length of the projection of a onto w is the absolute value of a transpose w divided by the norm of w.

So here x1 minus x2 is our a, and beta is our w. Once we apply this formula, the distance between the two supporting hyperplanes is beta transpose times x1 minus x2, divided by the norm of beta. And because alpha plus beta transpose x1 equals 1 while alpha plus beta transpose x2 equals -1, the numerator is exactly 2, so the distance is 2 over the norm of beta.

So now we are clear about our objective. What we want to do is to maximize this distance, which is 2 over the norm of beta; equivalently, we want to find alpha and beta to minimize the norm of beta. And of course, alpha and beta are doing the separation. So the constraints are that, for each point: for the blue points, whose yi is 1, we should get alpha plus beta transpose xi greater than or equal to 1; and for those green guys, whose yi is -1, we want alpha plus beta transpose xi to be less than or equal to -1. To make both things happen together, why don't you do this: you're going to require that yi multiplying alpha plus beta transpose xi should be greater than or equal to 1. It doesn't really matter whether you keep your yi at the left-hand side or move it to the right-hand side; as long as you have this particular condition, you have exactly the expression for your optimization problem. You choose alpha and beta to satisfy this for all your points; that means you are doing the correct separation. And among all those alpha and beta that can do the correct separation, you choose the one that maximizes the distance you desire. That's the formulation, written out below.
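Written out, the formulation we just described is:

```latex
\[
\begin{aligned}
\max_{\alpha,\;\beta}\quad & \frac{2}{\lVert \beta \rVert} \\
\text{s.t.}\quad & y_i\,(\alpha + \beta^{\top} x_i) \ge 1 \quad \text{for all } i,
\end{aligned}
\]
% which is equivalent to the standard convex rewriting
\[
\begin{aligned}
\min_{\alpha,\;\beta}\quad & \tfrac{1}{2}\,\lVert \beta \rVert^{2} \\
\text{s.t.}\quad & y_i\,(\alpha + \beta^{\top} x_i) \ge 1 \quad \text{for all } i.
\end{aligned}
\]
```

As a quick numerical sketch, this problem can also be handed to a convex solver; the cvxpy library and the toy points below are not part of the lecture and are made up purely for illustration:

```python
import cvxpy as cp
import numpy as np

# Hypothetical, linearly separable toy data: two blue points (y = +1)
# and two green points (y = -1).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

alpha = cp.Variable()
beta = cp.Variable(2)

# Hard-margin SVM: minimize ||beta||^2 subject to
# y_i * (alpha + beta^T x_i) >= 1 for every point i.
constraints = [cp.multiply(y, X @ beta + alpha) >= 1]
problem = cp.Problem(cp.Minimize(cp.sum_squares(beta)), constraints)
problem.solve()

print("alpha  =", alpha.value)
print("beta   =", beta.value)
print("margin = 2/||beta|| =", 2 / np.linalg.norm(beta.value))
```

Maximizing 2 over the norm of beta and minimizing beta transpose beta pick out the same alpha and beta, which is why the solver version can use the sum of squares as its objective.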