In this module, we are going to look at another way of representing information. As the title says, knowledge can be represented in trees. What this means is that you can reach the right answer by asking a sequence of questions, starting from one question: "Are you male or female? Are you old or young? Do you own a house, or don't you?" You just go down the tree and arrive at where you want to go, at the answer you want to get. We will see that knowledge is often presented this way. And, consistent with the tools part of this course, we want to construct these trees using data rather than our experience. So that is the meaning of the title: knowledge in trees, and constructing those trees using data.

Before we dive into trees, we will briefly discuss the different kinds of classifiers: descriptive versus discriminative. We are going into what are known as discriminative classifiers, and we will see how they differ from what we have already seen in a previous module. Then we will come to rules, because there are different kinds of discriminative classifiers. We are going to look at rule-based classifiers and try to understand why rule-based classifiers actually create discriminating lines or surfaces that partition your data. Then we are going to ask: how do you construct these rules from data? We will look at a simple example of constructing a tree, and if time permits, we will also see how to construct a forest from these trees; a forest is simply many trees. Then we will go into Rattle and start looking at these examples.

Moving on, let's see the difference between the classification we did so far and what we want to do in this module. The classification we did so far was done by describing. Two classifiers we covered were the Bayesian classifier and the Nearest Neighbor classifier. What these classifiers basically do is identify similar areas and say, "these are similar, and these are similar." That is one way of doing it. We did not do the Parzen Window, but the Parzen Window is another way of doing the same thing: identifying clusters of similar things and saying this is one, this is another, and another.

Another way of doing it, instead of identifying the clusters, is to draw lines in the space, in whatever number of dimensions; in this case the data has been given a two-dimensional representation. All the red dots are inside the four lines, and outside these four lines you have all the blue dots. So we have partitioned the space using lines; we have discriminated it, and there are many ways to do this. We are going to see just one of them in this course: decision trees. Neural networks fall into this category, and more complex methods like support vector machines do as well. The whole idea here is that, using these lines, I am able to partition the space into regions, and these regions are distinct.

Let's take again the difference between one and the other. For example (I know it looks like a sausage, but it is not), it is the same data on both sides: on one side you have a descriptive classifier, on the other a discriminative classifier, and you can see visually what the difference is between one and the other. Remember the IRIS dataset we did? Here, as a descriptive classifier, one cluster represents one iris, another cluster another, and another.
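Before moving to the boundary pictures, it helps to see the tree idea concretely. Here is a minimal sketch in R, the language Rattle sits on top of, assuming the rpart package (one of the tree engines Rattle drives) is available; treat the exact calls as illustrative rather than the course's own code. Note that each question the tree asks is a threshold on one axis, that is, a line cutting the space, which is exactly the discriminating-lines picture above.

```r
# A minimal decision-tree sketch on the built-in iris data,
# assuming the rpart package is installed (it ships with R).
library(rpart)

# Grow a classification tree: which species is this flower?
fit <- rpart(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
             data = iris, method = "class")

# Each node in the printout is a question (e.g. a petal-length
# threshold); reading it top to bottom is walking down the tree
# to an answer.
print(fit)

# Classify one new flower by walking it down the tree.
new_flower <- data.frame(Sepal.Length = 5.1, Sepal.Width = 3.5,
                         Petal.Length = 1.4, Petal.Width = 0.2)
predict(fit, new_flower, type = "class")
```

A forest, in the same spirit, is just many such trees grown on resampled data and voted together; in R that would typically be the randomForest package, which is also what Rattle's Forest option uses under the hood.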
You can also imagine that I could draw lines to separate them out, and that becomes a discriminative classifier. So you have two lines, and they divide this into three regions. It may not always be two lines; there could be many lines, dividing it into areas, and each of these areas would have a concentration of a similar type of object. So what is discriminative classification? It is partitioning space into pure regions, one assigned to each class. That is the formal definition, if you like that better than the picture.

So here is an example, and maybe it is worth animating it, showing it this way. Here is a classification boundary: everything on the right is blue, and everything on the left is brown. You can see that some objects have been misclassified, right? There are several steps in presenting it like this. We have transformed the data, retained the two most important features (using, say, a principal component analysis), drawn the graph, created a decision boundary, and then asked whether a point falls in region one or region two. Then we ask: how many mistakes are there? We can see that a few of the points have been misclassified. So we say the classification error is 0.015; it is the fraction of points that have been misclassified.

Here is the decision boundary for the IRIS data; you will remember it. The axes are sepal length versus sepal width, and you can see it is no longer a simple boundary: it is three different lines, partitioning the space into three parts, one for each of the three classes. You have the color code for Setosa, Virginica and Versicolor, and each sits in a separate region of the plot. The next dataset you have not seen, but if you do a lot of analysis of this kind you will: it is the digits data, from zero, one, two, three and so on up to nine, and each of the ten areas corresponds to a digit. The data has been transformed into a two-dimensional space, and these lines partition it into areas corresponding to each digit. It is not perfect; there is a little bit of error out there, as you can see.
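To see where a number like 0.015 comes from, here is a short sketch of computing the misclassification rate in R, again using rpart and the built-in iris data restricted to the two sepal features from the plot. The 0.015 above is this module's example figure; the code below simply prints whatever error its own fitted boundary makes.

```r
# Sketch: measure how many points end up on the wrong side of the
# decision boundary (assumes rpart is installed; iris ships with R).
library(rpart)

# Fit on sepal length and width only, matching the plot's axes.
fit  <- rpart(Species ~ Sepal.Length + Sepal.Width,
              data = iris, method = "class")
pred <- predict(fit, iris, type = "class")

# Confusion matrix: rows are the true species, columns the predictions.
table(actual = iris$Species, predicted = pred)

# Classification error = fraction (not count) of misclassified points.
mean(pred != iris$Species)
```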