Hello again. I'm Neil Clark, and in this lecture I'm going to cover the second part of principal components analysis. In the first part, we covered some of the mathematical ingredients that go into principal components analysis; now we're going to put them together to perform the analysis on a simple example. We'll apply PCA to the data set shown here. You might find it helpful to have a specific idea in mind of the meaning of this data, so let's say that we have two genes, call them X1 and X2, and we've measured their expression levels ten times, resulting in the table shown here. First we put the data into the form described at the beginning of the first part: a matrix with the samples running across the columns and the variables running down the rows. The result for our data in this example is shown here in the matrix D. Do you remember in the first part how we looked at the dimensionality of the data? In this case we have two variables, the expression levels of two genes, so we can picture each pair of measurements on a plot with two axes, that is, in two dimensions. This is shown in the plot here. And because we can visualize this data so easily, the structure in the data is already apparent, so we don't strictly need to apply PCA in this case. However, we'll press on with this example, as it will give you a very clear picture of what is happening when we perform principal components analysis. Now, for the first step, we're going to use the statistics that we covered in part one. For reasons that will become clear later, it's best to subtract the mean from the data. To do this, we first calculate the mean of each of the two variables. This calculation is shown here in the first two equations: we take the sum of the expression values and divide by the total number of samples, and repeat this process for each gene to arrive at two corresponding mean values.
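The centering step described above can be sketched in a few lines of NumPy. The expression values below are made up for illustration, since the lecture's actual table isn't reproduced here; the layout matches the matrix D, with genes as rows and samples as columns.

```python
import numpy as np

# Hypothetical expression table: 2 genes (rows) x 10 samples (columns).
# These numbers are illustrative, not the lecture's actual data.
D = np.array([
    [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1],   # gene X1
    [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9],   # gene X2
])

# Mean of each gene: sum of its expression values over the number of samples.
means = D.mean(axis=1, keepdims=True)

# Centered data matrix D': subtract each gene's mean from its own row.
D_prime = D - means

# After centering, each gene's values average to zero,
# i.e. the origin now sits at the mean position of the data.
print(np.allclose(D_prime.mean(axis=1), 0.0))  # True
```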
These are then subtracted from each of the corresponding expression values, resulting in the new centered data matrix, D prime. This mean-subtraction transformation effectively moves us to a new coordinate system in which the origin is the center, or mean position, of the data. None of the structure in the data has changed; only the origin of the coordinates has moved. The next step is to calculate the covariance matrix. In part one, we saw how to calculate the covariance between two sets of numbers. Now we're going to make a matrix whose ith row and jth column holds a number equal to the covariance of the expression levels of gene i and gene j. The first equation on this slide shows a simple expression for this in terms of matrix multiplication, and the second equation shows the layout of the covariance matrix for our example. Just as an aside, remember that the covariance of a variable with itself is equal to the variance of that variable, so the diagonal elements of the covariance matrix are equal to the variances of the individual variables. The covariance matrix for our data is shown in the third line here. This covariance matrix is going to form the basis of our analysis. First, we calculate the eigenvalues of this matrix; they are shown in the fourth line. We can immediately start to interpret these results, because the relative sizes of the eigenvalues correspond to the relative amounts of variance in the data in the directions of the associated eigenvectors. So from the fact that one eigenvalue is much larger than the other, we can immediately infer that there is much more variation in one direction than in the direction perpendicular to it. This was already evident from our plot of the data, but it is not usually so easy to see. The next step is to calculate the eigenvectors corresponding to the eigenvalues we've just calculated. These eigenvectors are shown in the first line of this slide.
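The covariance-matrix and eigenvalue steps can be sketched as follows. The centered data here are randomly generated stand-ins for D prime (the lecture's numbers aren't reproduced), and the sample-covariance normalization by n - 1 is one common convention; the slide's equation may use n instead.

```python
import numpy as np

# Hypothetical centered data standing in for D': 2 genes x 10 samples.
rng = np.random.default_rng(0)
D_prime = rng.normal(size=(2, 10))
D_prime = D_prime - D_prime.mean(axis=1, keepdims=True)

# Covariance matrix via matrix multiplication: C = D' D'^T / (n - 1).
# Element (i, j) is the covariance of gene i with gene j;
# the diagonal elements are the variances of the individual genes.
n = D_prime.shape[1]
C = D_prime @ D_prime.T / (n - 1)

# Eigenvalues of the symmetric covariance matrix: their relative sizes
# tell us how much variance lies along each eigenvector direction.
eigenvalues = np.linalg.eigvalsh(C)
print(eigenvalues)
```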
Now, do you remember how we said that when the eigenvectors are all placed next to each other to form the columns of a square matrix, that matrix is an orthogonal matrix? And we also said that orthogonal matrices are the matrices that perform rotated-axes coordinate transformations. Well, in the next slide, we do just that. It's important to note that we place the eigenvectors in a particular order: the order of the magnitudes of the corresponding eigenvalues. So the eigenvector corresponding to the largest eigenvalue forms the first column, the eigenvector corresponding to the next largest eigenvalue forms the second column, and so on. Finally, we take the transpose of this matrix, where we just swap the rows and columns, to arrive at our transformation matrix, W superscript T. We now want to see the effect of the coordinate transformation enacted by the matrix we've just worked out. We've already seen that orthogonal matrices enact rotated-axes coordinate transformations, so if we regard the expression of each gene as a coordinate in an abstract expression space, then our matrix provides us with two new coordinates, which are calculated by matrix multiplication, as shown in the first equation. This multiplication operation, which transforms our data, is shown explicitly in the next equation. The action of this coordinate transformation is illustrated in the figures on the right. The original plot of our data is shown in the top figure, along with the new coordinate axes, which are at an angle to the original; in the bottom figure, the data are simply plotted in the new coordinates. These coordinates are called principal components. And because our data was only two-dimensional, we can clearly see what PCA has done: it has produced a new coordinate system where the first coordinate aligns with the direction in expression space that has the most variation.
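The eigenvector ordering and the rotation it enacts can be sketched like this, again with random stand-in data for the centered matrix. Note that NumPy's `eigh` returns eigenvalues in ascending order, so we must reorder to put the largest-eigenvalue eigenvector in the first column, as described above.

```python
import numpy as np

# Hypothetical centered data standing in for D': 2 genes x 10 samples.
rng = np.random.default_rng(1)
D_prime = rng.normal(size=(2, 10))
D_prime = D_prime - D_prime.mean(axis=1, keepdims=True)
C = D_prime @ D_prime.T / (D_prime.shape[1] - 1)

# eigh returns eigenvalues in ascending order, with the matching
# eigenvectors as columns; reorder so the largest eigenvalue comes first.
eigenvalues, eigenvectors = np.linalg.eigh(C)
order = np.argsort(eigenvalues)[::-1]
W = eigenvectors[:, order]     # columns = eigenvectors, largest variance first

# W is orthogonal, so its transpose W^T enacts a rotated-axes
# coordinate transformation. Each column of Y holds one sample's
# coordinates in the new system: its principal components.
Y = W.T @ D_prime
```

Because the new axes align with the eigenvectors, the transformed data are uncorrelated: the covariance matrix of Y is diagonal, with the eigenvalues on the diagonal.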
If the expression space had more dimensions, then subsequent coordinates would align with directions of successively smaller variation. This is why we were careful to order the eigenvectors according to the size of the eigenvalues. So what PCA is trying to do is capture as much variation in the first coordinate as possible, then the same for the second coordinate, and so on. In the case of our data, if you look at the bottom figure, you can see that all the meaningful variation seems to have been captured by the first coordinate, our first principal component, while the second component seems to be only random scatter. So we have effectively reduced the dimensionality of our data from two to one. In our example this may not seem very important, but when you have 20,000 genes, PCA may well be able to capture most of the variation in your data with only two or three principal components, thereby making it visualizable. This is the significance of PCA: it allows you to take high-dimensional data and reduce it to just a few principal components, which hopefully capture most of the variation of the data and allow you to infer meaningful structure. So, in summary, principal components analysis is a method we use to extract the trends from high-dimensional data by allowing us to visualize it. If each variable, the expression of a single gene for example, is interpreted as a coordinate in a high-dimensional space, which we have been referring to as expression space, then PCA works by using the covariance matrix to derive a transformation matrix which rotates the coordinate axes. This is done in such a way that the maximum amount of variation of the data is captured in as few coordinates as possible. These new coordinates are called principal components, and it is possible that the first two components can capture most of the variation in your data and allow you to extract meaningful patterns and structure.
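The whole procedure summarized above, center, build the covariance matrix, eigendecompose, keep the top components, can be sketched as one small function. The data here are random placeholders; the function name `pca` and the choice of k are illustrative, not part of the lecture.

```python
import numpy as np

def pca(D, k):
    """Reduce D (variables x samples) to its top k principal components.

    A minimal sketch of the lecture's procedure, using the
    sample-covariance convention (division by n - 1).
    """
    D_prime = D - D.mean(axis=1, keepdims=True)       # center each variable
    C = D_prime @ D_prime.T / (D.shape[1] - 1)        # covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(C)     # ascending order
    order = np.argsort(eigenvalues)[::-1]             # largest variance first
    W = eigenvectors[:, order[:k]]                    # top-k eigenvectors
    return W.T @ D_prime, eigenvalues[order]          # components, variances

# Hypothetical example: 20 "genes" measured in 50 samples, reduced to
# 2 principal components so the samples could be plotted in a plane.
rng = np.random.default_rng(2)
D = rng.normal(size=(20, 50))
Y, variances = pca(D, k=2)
print(Y.shape)  # (2, 50): each sample is now described by 2 coordinates
```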