All right, welcome back. Last time we covered generative models: what a generative model is, what it does, how it works, and even the formula for what you want to compute. Now we're going to talk about three generative models, but the first two are going to be LDA and QDA, linear discriminant analysis and quadratic discriminant analysis. Eventually we'll explain why one is linear and the other is quadratic; it'll make sense by the end of the final lesson that deals with LDA and QDA. So just like when we started linear regression and logistic regression, we're going to start with the simple case: one predictor, so in general we're dealing with p equals one, one predictor and one response. Now, when we spoke about generative models, we said that there was some f sub k of x that describes the distribution. If that didn't make sense last time, it's only going to get more complicated, so make sure that last lesson on generative models makes sense, make sure the math makes sense, and make sure you have an understanding of at least the beginnings of Bayes' theorem, what a prior is and what a posterior is, okay, coming into this. So we're still using that same form from last time, except where we had f sub k of x, for LDA and QDA we assume that f follows a normal distribution. Okay, now for LDA, we're also assuming that sigma one, sigma two, sigma three, all the way up to sigma K, one for each class, are all equal to each other, so we're just going to call them all sigma here. Okay, this is a strong assumption; it doesn't need to be true, and if it isn't true, then you wouldn't use this. These are assumptions that you need to decide whether they fit your data, and whether, as a data scientist or statistician, you agree with them. If you can get on board enough that these assumptions are realistic, then maybe LDA might be an option for you.
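To make the shared-sigma assumption concrete, here's a minimal sketch in Python with made-up one-dimensional data for two classes. The per-class means are estimated separately (no assumption on the mus), while a single sigma is estimated by pooling the within-class deviations, which is the standard way to estimate a common variance; the data values themselves are hypothetical.

```python
import numpy as np

# Hypothetical 1-D training data for two classes (p = 1).
x_class1 = np.array([1.0, 1.5, 2.0, 2.5])
x_class2 = np.array([4.0, 4.5, 5.0, 5.5])

# Per-class means: LDA makes no assumption about the mu's,
# so each class keeps its own.
mu1, mu2 = x_class1.mean(), x_class2.mean()

# LDA's key assumption: one shared sigma for every class.
# A standard estimate pools the within-class squared deviations.
n1, n2 = len(x_class1), len(x_class2)
pooled_var = (((x_class1 - mu1) ** 2).sum()
              + ((x_class2 - mu2) ** 2).sum()) / (n1 + n2 - 2)
sigma = np.sqrt(pooled_var)
```

If the two classes visibly had very different spreads, this pooled sigma would be a poor summary, and that's exactly the situation where the LDA assumption fails.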
Okay, these are the two assumptions that definitely hold going forward for LDA with p equals one. So f sub k of x is this, and if you're asking where this random formula came from, then again you need to go back and re-evaluate some of your stats background. The expectation is that it should instantly click that this is a normal distribution, right? So the functional form of f sub k of x is normal; this is just the normal density, one over the square root of two pi times sigma sub k, and so on. Now we're going to make that assumption again that all the sigmas are equal, so the sigma sub k here turns into sigma, no longer depending on your class k; it's just sigma no matter what, and this sigma sub k squared turns into sigma squared. Mu sub k stays mu sub k; we're not making any assumptions about the mus. Okay, so our general form is normal, it will look like this. We get rid of sigma sub k, turning it into sigma due to this assumption, and we end up with what we talked about last time in generative models: the per-class piece on top and the total on the bottom, right? The per-class prior times the per-class distribution. Now, when we talked about it last time it was pi sub k times f sub k of x, but here again we're assuming f sub k of x is normal, so we plug it in and we get this for each class k. So for the kth class, that will be your numerator, and the denominator will be the summation over all classes of that same thing. Keep in mind that the only thing changing in this distribution is mu; sigma does not change, since we assumed at the beginning that they were all equal to each other, which makes this much nicer. Okay, so we have this posterior distribution. Now, granted, we did assume a lot of things, but again, assuming you can justify that, we have a posterior here. Now if we want to maximize this posterior, we want to say that for some kth class, this posterior is going to be the largest.
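The posterior described above, prior times normal density on top and the sum over all classes on the bottom, can be sketched directly. The parameter values here are hypothetical, chosen just to show the shape of the computation; the normal density is written out term by term so you can match it to the formula.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    # f_k(x) = 1 / (sqrt(2 * pi) * sigma) * exp(-(x - mu)^2 / (2 * sigma^2))
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def posterior(x, priors, mus, sigma):
    # Numerator: per-class prior pi_k times the per-class normal density f_k(x).
    # Denominator: the same quantity summed over all classes.
    # Note sigma is shared; only mu (and the prior) varies by class.
    num = np.array([pi_k * normal_pdf(x, mu_k, sigma)
                    for pi_k, mu_k in zip(priors, mus)])
    return num / num.sum()

# Hypothetical parameters: two classes, equal priors, shared sigma.
priors = [0.5, 0.5]
mus = [1.75, 4.75]
sigma = 0.645
probs = posterior(3.25, priors, mus, sigma)
```

At the midpoint between the two means, with equal priors, the two posterior probabilities come out equal, which is what the symmetry of the formula predicts.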
That's the one we want, right? We want the highest-probability class; that's what we're looking for. This is a difficult thing to maximize directly, but we've seen problems before in statistics where maximizing something is difficult, and the point at which something is maximized is the same as the point at which the log of that something is maximized. If you don't know why that's the case, that's an important thing to understand, so make sure you do. So in order to maximize this, we can just take the log of it and maximize that; we'll get the same result. If you take the log and do a little algebra, you eventually get something nice and neat, which is this. So, like I said, we want the class that gives us the largest probability: when we have an observation, we want to assign the class that maximizes this. Take the log, do a little bit of algebra, and you'll end up with something much easier, right? So what LDA does is exactly what we talked about last time and exactly what's happening here: for each class we take a prior and a distribution and figure out which one gives the biggest posterior probability. That's hard to do directly, so we take the log, do a lot of algebra, and end up with an easier thing to maximize. When you go through this for each class k, you'll get some number assigned to it, and you just pick the largest one; it's just that. So that's pretty much all LDA is: here's this posterior, which comes from a prior and a distribution, and let's maximize it. Use a little bit of mathematical trickery, take the log, maximize that, and we end up with this formula here, where we just want to pick the class k that gives us the largest result.
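The "take the log, do some algebra, pick the largest" step can be sketched as follows. After taking the log of the posterior and dropping the terms that are identical for every class, the standard one-predictor LDA result is the discriminant delta_k(x) = x * mu_k / sigma^2 - mu_k^2 / (2 * sigma^2) + log(pi_k); classification is just an argmax over these scores. The parameter values are hypothetical, carried over from the sketch above.

```python
import numpy as np

def discriminant(x, pi_k, mu_k, sigma):
    # delta_k(x) = x * mu_k / sigma^2 - mu_k^2 / (2 * sigma^2) + log(pi_k)
    # This is what remains after taking the log of the posterior and
    # discarding terms that are the same for every class k.
    return x * mu_k / sigma ** 2 - mu_k ** 2 / (2 * sigma ** 2) + np.log(pi_k)

def classify(x, priors, mus, sigma):
    # Compute delta_k for each class and pick the class with the largest value.
    scores = [discriminant(x, p, m, sigma) for p, m in zip(priors, mus)]
    return int(np.argmax(scores))

# Hypothetical two-class parameters with a shared sigma.
priors = [0.5, 0.5]
mus = [1.75, 4.75]
sigma = 0.645
label = classify(2.0, priors, mus, sigma)
```

Notice that delta_k is linear in x, which is exactly where the "linear" in linear discriminant analysis comes from: the boundary between any two classes is where their linear discriminants are equal.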