So then the Lagrangian dual program can be formulated in this way. Now we choose lambda and mu to maximize in the Lagrangian dual program, and the objective function here is the one we just derived. Our lambda and mu should be non-negative, and they should also satisfy the two sets of equations we just mentioned; otherwise the inner problem would become unbounded. Okay, so with all of this, now we have a Lagrangian dual program.

Maybe some of you are not so familiar with this notation. Pretty much, x_i is a column vector, so its transpose is a row vector, and x_j is another column vector. That's how they make an inner product. Or, if you like, you may write it as a summation over k from 1 to n of x_ik times x_jk. That's just equivalent, okay?

So pretty much, now we have our dual program. Maybe we want to make one more observation and further simplify this program, because you see that mu only appears here and here. What does that mean? It means that mu is only saying your lambda cannot be greater than C: as long as your lambda is not greater than C, you always have a way to adjust mu to make this happen. That's how you get our final version of the dual program here. The objective function is the same, but now you only have one set of variables, lambda. Your lambda should make this equality hold, and it should be between zero and C.

So your primal program is a constrained nonlinear program, and your dual program is also a constrained nonlinear program. Why does the dual program seem to be easier to solve? I don't have a way to prove it here, but you may take a look at the number of variables and constraints. For the dual program you have lambda_1 to lambda_m, so you have m variables. You also have one equality constraint here, plus m constraints here and m constraints here, so one plus 2m constraints.
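To see that the two notations really are the same thing, here is a minimal numerical sketch of the soft-margin dual objective, assuming `X` is an (m, n) data matrix, `y` a vector of +/-1 labels, and `lam` the m dual variables (all names here are illustrative, not from the lecture slides):

```python
import numpy as np

def dual_objective(lam, X, y):
    # Gram matrix of inner products: G[i, j] = x_i^T x_j,
    # i.e. the sum over k = 1..n of X[i, k] * X[j, k].
    G = X @ X.T
    # sum_i lam_i - 1/2 sum_i sum_j lam_i lam_j y_i y_j x_i^T x_j
    return lam.sum() - 0.5 * (lam * y) @ G @ (lam * y)

rng = np.random.default_rng(0)
m, n = 5, 3
X = rng.normal(size=(m, n))
y = rng.choice([-1.0, 1.0], size=m)
lam = rng.uniform(0.0, 1.0, size=m)

# The explicit double-sum form with x_i^T x_j written out as a
# sum over the n feature components -- same number, different notation.
explicit = lam.sum() - 0.5 * sum(
    lam[i] * lam[j] * y[i] * y[j] * np.dot(X[i], X[j])
    for i in range(m) for j in range(m)
)
print(np.isclose(dual_objective(lam, X, y), explicit))  # True
```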
If you recall the primal problem, for variables you have alpha, beta, and gamma, okay? Alpha is a scalar, beta has one component for each feature, and gamma has one component for each data point. So the number of primal variables is one plus n plus m, where n is the number of features. That's a larger number, and in many practical classification problems your number of features n is huge, so there can be a big difference.

Also regarding constraints: for the primal problem you have two sets of constraints, right? One says that gamma_i should be non-negative; the other says that your imperfect separation constraint y_i(alpha + beta transpose x_i) >= 1 - gamma_i should hold. Both are for every i, and that's how you get 2m primal constraints. Maybe you would feel that the constraint numbers seem to be about the same, but don't forget that for the dual program, most of the constraints are in a very simple format: they are just side constraints on your variables, 0 <= lambda_i <= C. On the other hand, for the primal problem you have this y_i(alpha + beta transpose x_i) >= 1 - gamma_i. That function is not very complicated, but it is at least more complicated than lambda_i <= C or lambda_i >= 0, right? From this perspective, maybe you are also convinced that the dual program seems to be simpler than the primal program.

If that's the case, then this is really useful. Because if you have some understanding of or experience with machine learning, or you are going to get some in the future, you know that for machine learning problems we always train models, and we hope the data volume can be as large as possible. So you really need a quick way to solve these optimization problems, and it would be nice if the dual program is easier to solve than the primal problem. Again, this is not a proof, but maybe it can somewhat convince you. The last thing we need to do is look at the objective function, okay?
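The variable and constraint counting above can be sketched as a toy tally, assuming m data points and n features as in the lecture (the function names are illustrative):

```python
def primal_size(m, n):
    # Variables: alpha (1 scalar) + beta (n, one per feature)
    # + gamma (m, one slack per data point).
    # Constraints: gamma_i >= 0 and the separation constraint,
    # one of each per data point.
    return {"vars": 1 + n + m, "constraints": 2 * m}

def dual_size(m, n):
    # Variables: lambda (m, one per data point).
    # Constraints: one equality, plus the simple box bounds
    # 0 <= lambda_i <= C (two per data point).
    return {"vars": m, "constraints": 1 + 2 * m}

print(primal_size(m=1000, n=10_000))  # {'vars': 11001, 'constraints': 2000}
print(dual_size(m=1000, n=10_000))    # {'vars': 1000, 'constraints': 2001}
```

Note that n drops out of the dual size entirely, which is exactly why a large feature count hurts the primal but not the dual.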
Maybe you still remember our primal problem. For the primal problem the objective function is actually very simple: we have this norm term plus C times the sum of the gamma_i, something like this, okay? That is not a complicated formulation, and it can easily be shown to be convex. But how about the dual objective function? The dual objective function looks scary, right? The theory tells us that if your primal is convex, then your Lagrangian dual must also be convex (as a maximization of a concave function). But trying to prove this directly is really complicated.
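A numerical sanity check (not a proof) can still make the claim believable: the quadratic part of the dual objective is -1/2 lambda^T Q lambda with Q = (y y^T) elementwise-times (X X^T), and Q is the Gram matrix of the vectors z_i = y_i x_i, hence positive semidefinite, which makes the maximization concave. A minimal sketch, with illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4
X = rng.normal(size=(m, n))
y = rng.choice([-1.0, 1.0], size=m)

Z = y[:, None] * X   # rows are z_i = y_i * x_i
Q = Z @ Z.T          # Gram matrix of the z_i

# Same matrix written the way it appears in the dual objective.
assert np.allclose(Q, np.outer(y, y) * (X @ X.T))

# Eigenvalues of a Gram matrix are >= 0 (up to round-off),
# so -1/2 lam^T Q lam is concave in lam.
print(np.linalg.eigvalsh(Q).min() >= -1e-10)  # True
```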