Well, that was good, but there is still one thing we have not solved. In practice, perfect separation is often impossible. Let's look at this example. When someone gives you a dataset, the blue dots and the green dots may be somewhat mixed. For example, maybe this axis is the age of your customers and that axis is their monthly salary, and the blue points are customers who buy a lot while the green points are customers who buy a little. There are always some points that look unusual from the data's perspective: their features resemble one group, but their behavior ties them to the other group. That happens in practice. Now, can you draw a line to separate these two groups? Pretty much impossible. Since perfect separation is impossible, we must accept some classification error. That means our previous formulation is infeasible: before, we tried to find alpha and beta so that the constraint was satisfied for all points, but now no such alpha and beta exist.

In this case, what people do is allow errors in the constraints, and then add the degree of error to the objective function. This is somewhat similar in spirit to a Lagrangian relaxation, although it is different. Let's see how it works. Given a separating hyperplane, ideally every data point should satisfy the constraint. If the constraint is violated, we define gamma_i to be the degree of violation. What does that mean? If y_i (alpha + beta^T x_i) cannot be greater than or equal to one, that is, it is smaller than one, then we fix the constraint by putting 1 - gamma_i on the right-hand side. Once we do that, we can put the "greater than or equal to" back, and the constraint holds again. Now it is fine to require the inequality to hold for all constraints.
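To make the slack idea concrete, here is a small sketch in Python with NumPy. The data points and the candidate hyperplane (alpha, beta) are made-up values for illustration only; the point is that gamma_i = max(0, 1 - y_i (alpha + beta^T x_i)) measures exactly how much each constraint is violated, and is zero for the well-separated points.

```python
import numpy as np

# Hypothetical toy data: columns are (age, monthly salary in $1000s),
# y_i = +1 for "buys a lot", y_i = -1 for "buys a little".
X = np.array([[25.0, 3.0],
              [30.0, 3.5],
              [45.0, 8.0],
              [50.0, 9.0],
              [48.0, 8.5]])   # last customer: features like the +1 group...
y = np.array([-1, -1, 1, 1, -1])  # ...but behavior in the -1 group

# An assumed candidate hyperplane: alpha (intercept) and beta (normal vector).
alpha = -9.0
beta = np.array([0.1, 1.0])

# Margin values y_i * (alpha + beta^T x_i); perfect separation needs all >= 1.
margins = y * (alpha + X @ beta)

# Slack gamma_i = degree of violation, chosen so that
# y_i (alpha + beta^T x_i) >= 1 - gamma_i holds with gamma_i >= 0.
gamma = np.maximum(0.0, 1.0 - margins)
print(gamma)  # only the mixed-up last point gets a positive slack
```

Only the last customer, whose features sit on the wrong side of the hyperplane for its label, needs a positive gamma_i; for the other four points the original constraint already holds, so their slack is zero.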
This works because you are allowed to choose gamma_i to fix the constraint for the points that cannot be perfectly separated. Of course, each gamma_i should be non-negative, and the gammas should also be kept small, so we add the sum of all the gamma_i to the objective function, multiplied by a constant C. This C is a given parameter that controls how severe the penalty on the separation errors should be. You may wonder how people determine C. C is a hyperparameter that machine learning practitioners tune, for example to avoid over-fitting, and in machine learning there is work on how to choose a good C. Here, let's set that aside and take the optimization perspective: what we need is to make sure the formulation is nice and solvable. Again, this is a convex program. Adding the gamma_i does not change the fact that all the constraints are linear, and the new term in the objective is linear, so the objective function is still convex. This is still a convex program, so you can hand it to a solver; it is a reasonable formulation.

One remark I want to make here: you see that we now have a summation of these error terms. If you remember what we mentioned about linear regression, about the loss in ridge regression and so on, you may ask: why don't we square the error terms gamma_i? There are several reasons. First, each gamma_i is non-negative, so you do not really need to square it; you do not need to worry about positive terms canceling negative terms. In regression you had to worry about that; in SVM you do not. That is one reason. Another reason is that whenever we have a choice about how to add error terms, that is, how to decide whether this line is better than that line, there is some artificial design involved. The model is designed by a person, and we choose the model carefully to make sure it is not too difficult to solve.
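As a sketch of the full formulation, note that at the optimum each slack equals gamma_i = max(0, 1 - y_i (alpha + beta^T x_i)), so the constrained convex program can be rewritten as the unconstrained convex objective (1/2)||beta||^2 + C * sum_i max(0, 1 - y_i (alpha + beta^T x_i)), which a plain subgradient method can minimize. The toy data, learning rate, and iteration count below are arbitrary choices for illustration, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two mostly separable clouds of 2-D points (illustrative data only).
X_pos = rng.normal([2.0, 2.0], 1.0, size=(40, 2))
X_neg = rng.normal([-2.0, -2.0], 1.0, size=(40, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1.0] * 40 + [-1.0] * 40)

C = 1.0                     # penalty weight on the error terms (a chosen hyperparameter)
alpha = 0.0                 # intercept
beta = np.zeros(2)          # normal vector of the hyperplane
lr = 0.01                   # step size (arbitrary for this sketch)

# Subgradient descent on (1/2)||beta||^2 + C * sum_i max(0, 1 - y_i(alpha + beta^T x_i)).
for _ in range(500):
    margins = y * (alpha + X @ beta)
    active = margins < 1.0  # points with positive slack contribute to the subgradient
    grad_beta = beta - C * (y[active, None] * X[active]).sum(axis=0)
    grad_alpha = -C * y[active].sum()
    beta -= lr * grad_beta
    alpha -= lr * grad_alpha

accuracy = np.mean(np.sign(alpha + X @ beta) == y)
print(accuracy)
```

Because the objective is convex (a convex quadratic plus a sum of pointwise maxima of linear functions), this simple method is enough for a demonstration; in practice one would hand the constrained QP to a proper solver, exactly as the lecture suggests.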
Your model must make sense for the practical purpose, but it should also be as easy to solve as possible. We are always balancing these two. Why is SVM a good model? Because it makes sense, its outcome is really useful, and it is not too difficult to derive that outcome. That is what makes a model good in general.