The last thing we want to do is to dualize our SVM problem. Well, why do we want to do that? Recall that our SVM problem, whether for perfect or imperfect separation, can always be solved because it is a convex program. But that does not mean we want to solve it directly. Recall that for linear programming, if you have a primal linear program, you always have a way to solve it, but sometimes what you want to solve is the dual program, because sometimes it is easier. Researchers studied the SVM problem and found that the dual program is easier to solve, so let's do that: let's try to find the dual of the SVM problem. We of course want to do Lagrange duality, so we first write down the Lagrangian. Let's say Lambda_i and Mu_i are the Lagrange multipliers for the constraints. The Lambda_i are for the first set of constraints, the ones describing imperfect separation: previously your y_i times the linear function had to be greater than or equal to one minus Gamma_i. You take the bigger side minus the smaller side, and that is the term inside the square brackets; Lambda_i multiplies this gap. Mu_i multiplies Gamma_i because you have a constraint saying that Gamma_i is non-negative. That is how you attach the Lagrange multipliers here. Let me remind you that this is a minimization problem, so we require Lambda and Mu to be non-negative: whenever you have a feasible point, the multiplier terms can only pull the objective down, so Lambda is rewarding your feasibility and the Lagrangian gives a lower bound. That's the Lagrangian. Once we have that, the Lagrange dual program chooses Lambda and Mu in the right way to maximize the outcome of this inner minimization. We know that minimizing the Lagrangian always gives us a lower bound, so we choose Lambda and Mu to maximize that lower bound. That's Lagrange duality. Now let's take a look at the inner program. We want to choose Alpha, Beta, and Gamma to minimize the whole thing here. Here, Alpha is just one scalar.
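For reference, here is the construction written out, assuming the standard soft-margin formulation with the lecture's notation (intercept Alpha, weight vector Beta, slacks Gamma_i); the exact objective on the slides may differ by a constant factor:

```latex
% Soft-margin SVM primal (lecture notation: intercept \alpha, weights \beta, slacks \gamma_i):
\min_{\alpha,\,\beta,\,\gamma}\; \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{m}\gamma_i
\quad\text{s.t.}\quad y_i\bigl(x_i^{\top}\beta + \alpha\bigr) \ge 1 - \gamma_i,\qquad
\gamma_i \ge 0,\quad i = 1,\dots,m.

% Lagrangian with multipliers \lambda_i \ge 0 and \mu_i \ge 0:
L(\alpha,\beta,\gamma,\lambda,\mu)
= \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{m}\gamma_i
- \sum_{i=1}^{m}\lambda_i\Bigl[\,y_i\bigl(x_i^{\top}\beta + \alpha\bigr) - 1 + \gamma_i\Bigr]
- \sum_{i=1}^{m}\mu_i\gamma_i.
```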
Beta is a vector from Beta_1, Beta_2 up to Beta_n, and Gamma is a vector Gamma_1, Gamma_2 up to Gamma_m. We use the first-order condition because, for a convex differentiable problem, it is necessary and sufficient; it is the most powerful tool we have. You may do all the derivations by yourself, but let me just try to give you some hints. For Alpha: Alpha only appears in one place, and the function is actually linear in Alpha. Once you differentiate, the only thing that remains is its coefficient, the sum of Lambda_i y_i, and that must be zero. That's for Alpha. Regarding Beta: Beta has the non-linear term here and also a term in the constraints. Once you differentiate, Beta comes from the quadratic term, and its coefficient in the constraint term is Lambda y x; that is how you get the condition here. I am saying it very quickly, but you may want to help yourself by doing all the careful derivations. The last one is Gamma, and Gamma appears here and here. Once you do the differentiation, your C comes from the C-sum term, your Lambda_i comes from the bracket, and your Mu_i comes from the last term. You know how to do the differentiation. An interesting thing here is that the first and the third sets of conditions contain no primal variable: no Alpha, no Beta, no Gamma. What does that mean? Whenever you are minimizing the whole thing here, you must make sure that your dual variables are chosen appropriately to make these conditions hold. Otherwise, you may push Alpha or one of your Gamma_i to drive the whole problem to negative infinity. To say it in another way: if your dual variables do not satisfy these conditions, then your Lagrangian minimization problem is unbounded, just like what we saw when we showed that linear programming duality and Lagrange duality are actually somewhat equivalent for linear programming. I hope this does not seem too weird to you.
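The three first-order conditions described above, written out explicitly (again assuming the standard soft-margin Lagrangian):

```latex
\frac{\partial L}{\partial \alpha} = 0
\;\Longrightarrow\; \sum_{i=1}^{m}\lambda_i y_i = 0,
\qquad
\frac{\partial L}{\partial \beta} = 0
\;\Longrightarrow\; \beta = \sum_{i=1}^{m}\lambda_i y_i x_i,
\qquad
\frac{\partial L}{\partial \gamma_i} = 0
\;\Longrightarrow\; C - \lambda_i - \mu_i = 0.
```

Note that the first and third conditions involve only the dual variables, which is exactly why they become constraints of the dual program.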
Now we have two conditions to be satisfied by the dual variables, and they can be used to simplify our Lagrangian. Once we know these conditions must be true, you are able to take out some parts. The coefficient of Alpha must be 0; that is how this term disappears and Alpha disappears with it. Also, your C term, your Lambda-Gamma term, and your Mu term should go away, because C minus Lambda_i minus Mu_i is zero. That is how at the end you get the simplified version here. Expanding the bracket, the term with y gives you the inner product part, and the minus one gives you the sum of the Lambda_i. That's pretty much how you do the simplification. Then the last thing to do is that your Beta has been optimized: Beta should be chosen according to your x, y, and Lambda, so you may plug that expression in for Beta. You then get a giant formulation with a lot of summations, superscripts, and subscripts. After careful arithmetic, you are going to get the final dual. Well, I don't have time here to do all the derivations for you, but it is nothing more than high-school arithmetic; it is of course not a weird thing, just do some arithmetic and please try it by yourself. Here you may see some kind of symmetry: we have Lambdas multiplying each other, ys multiplying each other, and xs multiplying each other.
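Substituting the conditions back, the dual maximizes Sum_i Lambda_i minus one half of Sum_i Sum_j Lambda_i Lambda_j y_i y_j x_i^T x_j, subject to Sum_i Lambda_i y_i = 0 and 0 <= Lambda_i <= C. As a quick sanity check on the "high-school arithmetic", here is a small NumPy sketch (with made-up toy data, not from the lecture) verifying that the symmetric double sum is exactly the squared norm of Beta when Beta = Sum_i Lambda_i y_i x_i:

```python
import numpy as np

# Toy data: m points in R^n with +/-1 labels (made-up sizes and values).
rng = np.random.default_rng(0)
m, n = 5, 3
x = rng.normal(size=(m, n))               # feature vectors x_i (rows)
y = rng.choice([-1.0, 1.0], size=m)       # labels y_i
C = 1.0
lam = rng.uniform(0.0, C, size=m)         # arbitrary multipliers in [0, C]

# Beta from the first-order condition: beta = sum_i lambda_i y_i x_i.
beta = (lam * y) @ x

# Double-sum form of the quadratic term: G_ij = y_i y_j x_i^T x_j.
yx = y[:, None] * x
G = yx @ yx.T
dual_obj = lam.sum() - 0.5 * lam @ G @ lam

# Simplified form: sum_i lambda_i - (1/2) ||beta||^2.
simplified = lam.sum() - 0.5 * beta @ beta

# The two expressions agree, since the double sum equals ||beta||^2.
assert np.isclose(dual_obj, simplified)
```

The symmetry noted in the lecture (Lambdas, ys, and xs each multiplying each other) is exactly what makes the quadratic term collapse into a single squared norm.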