Hello, I'm Neil Clark. I'm a postdoc here at Mount Sinai in the Ma'ayan lab, and I'm going to cover a couple of lectures that deal with high-dimensional data. I'm going to start this one with the first part of principal components analysis. Principal components analysis, often referred to as PCA, is a mathematical technique that is used for exploring data. It is particularly useful for high-dimensional data. To illustrate the form of data we'll be using, and also to explain what we mean by high-dimensional data, take a look at the array of numbers shown here. This might contain the appropriately processed expression values from a microarray experiment. For example, each column contains the result of an individual sample or experiment, and each row corresponds to the estimated expression level of an individual gene. This form of data, with samples across the columns, features down the rows, and a continuous value in each element, is the general form of the data we'll be using here in principal components analysis. Now, what do we mean by high-dimensional? We can think of the result of each experiment as a point in a kind of space, where each feature, or gene in this case, is a coordinate in this space. If there are only two or three features then we can easily picture the points in two- or three-dimensional space, and we can immediately see the patterns and structure extending into these dimensions. But if there are many more features, and there are typically thousands or tens of thousands of genes on a microarray chip, then the space is very high-dimensional, and structure can extend into all of these dimensions. So this is high-dimensional data, and this is the kind of data in which we're going to use principal components analysis to help us see the structure and the patterns. Now, there are a number of computer tools which automatically perform the PCA for you. But my approach here is not to show you how to use those tools, but to try to show you how PCA works, so that you can really understand the results. I'll do this in two steps. First, in this part, I'll go through some of the mathematical preliminaries. Then, in the next part, I'll use these to illustrate PCA at work with a simple example. The mathematical preliminaries I'll introduce here fall into two categories. First, there will be some statistics, and then I'll go through some basic linear algebra. First, the basic statistics. Typically, our data will have some degree of noise or randomness. There may be noise from biological as well as technical sources. In such situations, statistics are useful. We may not be able to find predictions or descriptions for the exact values of our data, but their statistics may be well behaved and predictable. We shall only need some very basic statistics: the mean, the variance and the covariance. I will now describe these in turn, each in terms of an example. Consider this set of values here; we've called it S. We can take this opportunity to begin to introduce our notation as well. Notice how we can refer to individual elements in our set S with a subscript, so S subscript 3 is the third element of the set. In terms of this notation, the mean is given by the following expression. Note that we use the standard notation of the capital sigma to indicate the sum.
So here we're taking the sum of all the elements in the set and dividing by the total number of elements in that set. This calculation is carried out explicitly here in the next line. The mean provides a quantitative measure of the location, or the middle, of the data, as illustrated by the dotted line in the figure on the right. The next statistic that we'll need is the variance. The variance can be thought of as a measure of the spread of the data. It is given, in terms of the notation that we've described, by this equation at the top here. In words, we subtract the mean from each data point, square it, sum, and then divide by the total number of data points minus 1. The minus 1 helps provide a bias-adjusted estimate of the true variance. We can calculate this explicitly for our data set, S, as shown here. To illustrate the difference between mean and variance, the figures show two distributions of data points, each with the same mean, but the distribution on the top has a larger variance. To put it another way, the data points are centered at the same location but with a different spread. The third statistic we'll need is the covariance. Just as the variance measures the degree to which a single set of data varies, the covariance is a measure of the way two sets of data vary together. The explicit equation for the covariance is shown here, in the first equation. In words, we subtract the appropriate mean from each data point, multiply the pairs together, sum, and again divide by the total number of data points minus 1. The figures here illustrate how the covariance might be interpreted. In each figure we plot the value from one data set against the corresponding value from the other data set. You might like to think of an example of the expression of two genes across various samples. In A and B we see examples where there is a positive correlation between the two. This results in a positive covariance. The relation is stronger in B than in A, so the data sets in B would have a larger positive covariance. This occurs because whenever X is large compared to its mean, Y also tends to be large compared to its mean, and so the product is positive. Conversely, when X is small, and X minus its mean is negative, then Y minus its mean also tends to be negative, and the product is again positive. So when we take the sum, as shown in the equation here, the result is positive. The example in D has a negative covariance; you can follow similar reasoning to work out why. The covariance also increases in magnitude as the variance of each of the two data sets increases. Also note that the covariance of a data set with itself is equal to the variance of that data set. Okay, so you might be glad to hear that's all the statistics that we're going to need for principal components analysis. Now we're going to move on to some matrix algebra. First, we'll define a matrix before going on to some operations with matrices and some useful properties. You can think of a matrix as a rectangular array of numbers, such as the example shown here. Similarly to our previous examples, we can pick out individual elements from our matrices with subscripts which indicate the row and column of the element, like coordinates. For example, if we wanted to select the element from the second row and the third column of the matrix A here, we could write it as A with the subscripts 2,3; the element is minus 4 in this case. A general 3x3 matrix is shown here too. You might think of an n-vector as being a special case where there is only one column and n rows.
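Before we move on to operations with matrices, here is a minimal numerical sketch of the three statistics just described. It uses Python with NumPy, which is an assumption on my part (the lecture itself only mentions tools such as MATLAB), and the two small data sets are made up for illustration; they are not the set S from the slides.

```python
import numpy as np

# Two small, made-up data sets (not the set S from the slides).
x = np.array([2.1, -0.4, 2.6, 1.3, 0.9])
y = np.array([1.8,  0.2, 3.1, 1.0, 0.7])

# Mean: the sum of the elements divided by how many there are.
mean_x = x.sum() / len(x)

# Variance: subtract the mean, square, sum, divide by n - 1
# (the minus 1 gives the bias-adjusted estimate).
var_x = ((x - mean_x) ** 2).sum() / (len(x) - 1)

# Covariance: subtract each mean, multiply the pairs together,
# sum, and again divide by n - 1.
cov_xy = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)

print(mean_x, var_x, cov_xy)

# NumPy's built-ins agree, and the covariance of x with itself
# is just the variance of x, as noted above.
print(np.var(x, ddof=1), np.cov(x, y)[0, 1], np.cov(x, x)[0, 1])
```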
Now that we've defined matrices, let's see some things we can do with them. First, we can add them, but in order to do this the two matrices that we are adding need to have the same number of rows and columns. This is because the way two matrices are added is that each pair of corresponding elements is added, as shown here for two 2x2 matrices. Matrix multiplication is a little more complicated. When we multiply two matrices, the element in the ith row and the jth column of the result is obtained by taking the ith row of the first matrix and the jth column of the second matrix, multiplying the corresponding elements, and then summing them all up. This is illustrated here with the same two matrices A and B as in the previous example. Note that in order to multiply two matrices, the number of columns in the first matrix should equal the number of rows in the second matrix. You might be relieved to know that multiplying a matrix by a number is a lot simpler: you just multiply each element of the matrix by that number. Before we leave this slide we should mention a special matrix, the identity matrix. This acts a little like the number 1 with respect to matrix multiplication, because if you multiply any matrix A by the identity matrix the result is just A again. The identity matrix can have any number of rows, but it must be square. It has all elements equal to zero, apart from the diagonal elements, the elements with the same row number as column number, which are all equal to 1. Okay, so now we're going to start to use our matrices and their operations to do something useful. We'll look at an example in two-dimensional space because it's nice and easy to picture what's going on. Just like points on a map, we identify points in two dimensions with two coordinates; let's call them X1 and X2. We can make these the components of a two-vector, as shown here. Then to each point there is a corresponding vector. For example, the point shown in the figure on the right is represented by the vector with elements one and two. We now want to consider the exact same space, but we're going to introduce a new way of assigning vectors to each point. Just as before, the vectors have two components. To distinguish the new way of assigning coordinates, we'll give the same point in the space another set of coordinates, X1 primed and X2 primed. One way to achieve this is to rotate the coordinate axes, as shown in the figure here. We can then move to the new coordinate system by using a transformation matrix, let's call it T, which relates the two coordinate vectors by matrix multiplication. There are many ways to transform coordinates, but we shall be interested particularly in transformations which rotate the coordinate axes. These are performed by matrices which have the property called orthogonality. The final piece of linear algebra which you'll need to understand the workings of PCA is eigenvectors and eigenvalues. When a transformation matrix maps a vector to a multiple of itself, that vector is called an eigenvector, and the amount by which it is stretched is the associated eigenvalue. These are the solutions to the equations shown here. If you'd like to experiment, you can use functions in programs such as MATLAB to calculate these. The eigenvectors of any given transformation matrix make up a set of directions, or transformed coordinate axes, which only undergo a stretch under the transformation.
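If you would like to try these operations yourself, the sketch below shows them in Python with NumPy, which again is an assumption on my part; the matrices and the rotation angle are made up for illustration and are not the A and B from the slides. It adds and multiplies two 2x2 matrices, checks that multiplying by the identity leaves a matrix unchanged, builds an orthogonal matrix that rotates the coordinate axes, and asks NumPy for eigenvalues and eigenvectors.

```python
import numpy as np

# Two made-up 2x2 matrices (not the A and B from the slides).
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [5.0, -2.0]])

# Addition: each pair of corresponding elements is added.
print(A + B)

# Multiplication: row i of the first times column j of the second,
# element by element, then summed.
print(A @ B)

# The identity matrix acts like the number 1: A times I is just A.
I = np.eye(2)
print(np.allclose(A @ I, A))

# A rotation of the coordinate axes is an orthogonal matrix:
# its transpose times itself gives the identity.
theta = np.pi / 6                      # rotate the axes by 30 degrees
T = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
print(np.allclose(T.T @ T, np.eye(2)))

# New coordinates of the point (1, 2) after rotating the axes.
x = np.array([1.0, 2.0])
print(T @ x)

# Eigenvalues and eigenvectors of A: A maps each eigenvector to a
# multiple of itself, the multiple being the eigenvalue.
vals, vecs = np.linalg.eig(A)
print(vals)
print(vecs)                            # eigenvectors are the columns
```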
In broad terms, PCA uses the covariance to encode the structure in the data and then uses eigenvectors to devise a new set of coordinates that best reveal that structure, by finding the appropriate set of directions. One result from linear algebra is that if the eigenvectors are placed next to each other to form the columns of a square matrix, then the result is an orthogonal matrix that performs a coordinate transformation. This is central to principal component analysis. Before we continue with PCA, we shall just take a quick look at an illustrative example of eigenvectors and eigenvalues, just so you have a clear picture of what's going on. Take the following matrix as an example. I can tell you that it has eigenvalues of 4 and minus 1, with the corresponding eigenvectors shown here. You can easily check that this is correct by performing the matrix multiplication. If you multiply the original matrix by the first eigenvector, the result is just that eigenvector again with a stretching factor of 4. You can check this for yourself by hand, though for the larger matrices which we'll use later, these calculations are best left to computers.
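The matrix on that slide is not reproduced in this transcript, so the sketch below uses a different, hypothetical 2x2 matrix that also happens to have eigenvalues 4 and minus 1, purely to show how the check works when you hand it to a computer (again in Python with NumPy, as an assumption).

```python
import numpy as np

# Hypothetical 2x2 matrix chosen to have eigenvalues 4 and -1;
# it is NOT the matrix shown on the lecture slide.
M = np.array([[3.0, 2.0],
              [2.0, 0.0]])

vals, vecs = np.linalg.eig(M)
print(vals)                            # 4 and -1, in some order

# Check the defining equation: M times v equals lambda times v.
for lam, v in zip(vals, vecs.T):       # eigenvectors are the columns
    print(np.allclose(M @ v, lam * v)) # True: v is only stretched by lam
```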