We have seen a camera projection matrix that takes a point in 3D space and transforms it onto a two-dimensional image plane. We've seen that both the three-dimensional points and the two-dimensional image points are represented in homogeneous coordinates. There are two factors in this transformation: the first is the camera rotation and translation, and the second is the pixel transformation, the camera calibration matrix K. Recall that we view a picture not just as a two-dimensional image, but as a scene captured by a camera in front of a person who is looking into the world through the picture. As such, each point on the two-dimensional plane defines a ray between the camera center and that point, and that line extends as a ray into three-dimensional space. A set of parallel lines in physical space all converge to the same point at infinity, and when those points at infinity project onto the 2D plane, they all project to the same point, called the vanishing point. In this world, we have two different coordinate systems. One is seen from a third-person perspective, what we call the world coordinate system, shown here in RGB colors where red is x, green is y, and blue is z. As a person walking around the space, my coordinate system, also known as the first-person coordinate system, changes with me. The first-person coordinate system is a simple one where x points to the right, y points down, and z points into the scene. The missing factor is knowing where we are, as the first person, relative to the world: we want to know how we are oriented in space, and where we are positioned relative to the world coordinate system. So this lecture is about figuring out where we are in space, in effect putting a virtual GPS on ourselves. So how do we recover this rotation matrix?
What we're going to do is use the concept of vanishing points to tell us how we are oriented in space. First we will ignore translation; we'll see how to recover translation later, and focus on rotation for now. A rotation matrix describes a three-dimensional orientation: we need to know the relative angles about all the axes in 3D space. First we'll consider a simple case where we recover only two of the rotation angles, and this requires looking at a single vanishing point. For example, suppose we have a building in front of us at the intersection of two streets. We can take one of the streets as defining a ray in the z direction, as long as we can identify that same road from multiple different angles. The set of points lying on this ray will project to a set of points in the image plane forming a line. In fact, all the rays in three-dimensional space lying along this road, pointing in the z direction, will project to lines in the two-dimensional image, and those lines converge to a single vanishing point. We're going to write down an equation for this z vanishing point in the 3D world coordinate system, as well as for the z vanishing point in the two-dimensional image. The z direction at infinity in the world coordinate system is represented by x equal to zero, y equal to zero, z equal to one, and, to indicate that this is a point at infinity, we put a zero in the last coordinate. In the image, this point is formed by intersecting the lines pointing in the z direction, and it is a point with finite coordinates; we call this point v_z. Taking the camera transformation we have seen before, we apply the rotation and translation to the point at infinity in z, then apply the camera calibration matrix, and that yields the z vanishing point in the 2D image plane.
Plugging in the specific coordinates of the z vanishing point, x equal to zero, y equal to zero, z equal to one, and zero for the infinity coordinate, we can see that this equation very quickly eliminates most of the projection matrix. The first column r1 is multiplied by zero and disappears; r2 is multiplied by zero and disappears; only r3 remains, and t disappears because it is also multiplied by zero. So, by using these coordinates of the z point at infinity, we have eliminated three out of four columns of the camera projection matrix. Therefore, the image of the point at infinity, the vanishing point v_z, is simply the camera calibration matrix K transforming the third column of the rotation matrix: v_z = K r3. Another way to say it: if I image the three-dimensional axis in the z direction, I see this vanishing point at some x, y location in pixel space. All the other angles disappear; in fact, translation disappears as well, meaning that if I move the camera left and right, up and down, the same vanishing point remains where it is. And this makes sense: if, for example, we look at the night sky pointing south toward a star, no matter how we move left and right, the star remains in the south; the star only moves when we rotate ourselves. So we have this vanishing point at pixel coordinates x, y, representing v_z. To recover the rotation column r3, we first apply K inverse, which transforms the pixel measurements in x, y into a ray in the optical world: K inverse times v_z. We know this is a column of the rotation matrix, so it must have norm one; we normalize that vector to obtain r3. So now we have our rotation column r3. What does it mean for us? Some of you might be familiar with the pan and tilt representation of rotation.
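The recipe above can be sketched in a few lines of numpy; the calibration matrix and vanishing point here are made-up illustrative values, not measurements from the lecture:

```python
import numpy as np

# Recovering the third rotation column r3 from the z vanishing point:
# r3 = normalize(K^-1 v_z). K and vz are assumed example values.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])      # camera calibration matrix (assumed)
vz = np.array([400.0, 200.0, 1.0])   # z vanishing point, homogeneous pixels

ray = np.linalg.inv(K) @ vz          # pixel world -> optical world (a ray)
r3 = ray / np.linalg.norm(ray)       # columns of R must have norm one

print(r3)                            # unit vector along the world z axis,
                                     # expressed in the camera frame
```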
If I were flying a plane, I could see myself panning left and right and tilting up and down in space. In fact, this diagram shows the z axis of the world transformed by the camera pan and tilt: pan is represented by alpha, tilt by beta. We see that the third column of the rotation matrix, r3, can be written in vector form in terms of alpha and beta. Therefore, once we have obtained r3, we can derive through a simple calculation the specific pan and tilt angles, alpha and beta, corresponding to the person looking into the space. Now we are ready: we encourage you to take your cell phone, go out, and take a few pictures. As we can see in this picture of the building we work in, we have a set of parallel lines on the ground, and that set of parallel lines intersects at a point corresponding to the vanishing point in z. We can use the equation introduced earlier to transform this vanishing point from the picture into the optical world, through K inverse, and from that representation obtain the pan and tilt angles. You can imagine that if you are the camera operator turning left or right, the vanishing point z will move left and right; as you tilt the camera up and down, the vanishing point z will likewise move up and down in image space. What about the remaining angle, the roll angle, the rotation about the z axis? Can we figure it out from this set of parallel lines? No: given the vanishing point in the z direction, if the camera simply rotates about that viewing direction, the vanishing point does not move. To figure out all three rotation angles, we in fact need two vanishing points in perpendicular directions. So let's revisit our example: we have a building at the corner of an intersection of two streets, and the streets are laid out perpendicular to each other. We take a picture of this scene; in the z direction, again, we have a vanishing point.
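Reading pan and tilt off r3 can be sketched as follows. The exact formula depends on the lecture's sign conventions, which are not spelled out here; this assumes r3 = [cos(beta) sin(alpha), sin(beta), cos(beta) cos(alpha)], with x right, y down, z forward:

```python
import numpy as np

# Assumed parameterization (not the lecture's exact convention):
# r3 = [cos(beta)*sin(alpha), sin(beta), cos(beta)*cos(alpha)]
def pan_tilt_from_r3(r3):
    alpha = np.arctan2(r3[0], r3[2])   # pan: left/right rotation
    beta = np.arcsin(r3[1])            # tilt: up/down rotation
    return alpha, beta

# Round trip with a made-up pan/tilt pair
a, b = np.deg2rad(30.0), np.deg2rad(-10.0)
r3 = np.array([np.cos(b) * np.sin(a), np.sin(b), np.cos(b) * np.cos(a)])
alpha, beta = pan_tilt_from_r3(r3)
print(np.rad2deg(alpha), np.rad2deg(beta))   # recovers 30.0 and -10.0
```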
And now we have another vanishing point, corresponding to the road in the horizontal direction. Lines along that horizontal road in physical space project to a set of converging lines in the image, forming a vanishing point in the x direction. Given the vanishing point in x and the vanishing point in z in the image, we can show, just as before, that we can recover the rotation column r3 from the vanishing point z, and the rotation column r1 from the vanishing point x; both are transformed by K inverse, where K is the calibration matrix. Given that r1 and r3 are perpendicular to each other, we can reconstruct the third axis by taking the cross product between r3 and r1, where r3 corresponds to the z axis and r1 to the x axis: r2 = r3 × r1. So now, given two vanishing points in perpendicular directions, we have completely estimated the rotation matrix R. The spaces we live in are in fact structured such that almost every frame will contain many vanishing points corresponding to directions of lines constructed perpendicular to each other. This building is constructed nicely, such that one facade provides two vanishing points, one in the x direction and one in the z direction, and the other facade provides the y vanishing point and the z vanishing point. This is because man-made structures are typically built with right angles, so that the building remains structurally stable. So on the right we see vanishing points in all three directions, and these can be measured in the picture itself. Note that a vanishing point might not lie inside the physical picture, but it still has finite coordinates. Using the K inverse transformation, we transform the two-dimensional pixel coordinates into the corresponding rays in the optical world, and hence obtain scaled versions of the rotation axes r1, r2, and r3.
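The two-vanishing-point construction can be sketched end to end; here the calibration matrix is assumed, and the vanishing points are built from a known rotation so the round trip can be checked (in practice the homogeneous scale of a measured vanishing point may be negative, flipping the sign of a column):

```python
import numpy as np

# Full rotation from the x and z vanishing points:
# r1 = normalize(K^-1 vx), r3 = normalize(K^-1 vz), r2 = r3 x r1.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])      # assumed calibration matrix

def rotation_from_vanishing_points(K, vx, vz):
    Kinv = np.linalg.inv(K)
    r1 = Kinv @ vx
    r1 = r1 / np.linalg.norm(r1)     # x axis column, norm one
    r3 = Kinv @ vz
    r3 = r3 / np.linalg.norm(r3)     # z axis column, norm one
    r2 = np.cross(r3, r1)            # y = z cross x, perpendicular to both
    return np.column_stack([r1, r2, r3])

# Ground truth: a pan of 25 degrees about the vertical axis
a = np.deg2rad(25.0)
R_true = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
vx = K @ R_true[:, 0]                # image of the x direction at infinity
vz = K @ R_true[:, 2]                # image of the z direction at infinity
R = rotation_from_vanishing_points(K, vx, vz)
print(np.allclose(R, R_true))        # True
```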
Now we have a scale factor that needs to be taken out, and this is easily done because we know all three columns r1, r2, r3 have norm one. We simply renormalize r1, r2, r3 so that each has norm one. Putting the three columns together, we obtain the entire rotation matrix. So here we demonstrate a few examples. On the left we see a picture of a poster; again, the poster is constructed such that we have two sets of orthogonal lines, one in the horizontal direction and one in the vertical direction. As long as we have objects like this in the world, we can orient ourselves as we move through space. So we can compute, for example, the vanishing point in the horizontal direction, indicated here in the middle. We transform these vanishing points from pixel coordinates to optical coordinates through K inverse. Again, on the right we have the vanishing point computed in the y direction; in this case, the two vertical lines are almost parallel to each other, so the vanishing point is quite far outside the image. Similarly, with yet another camera pose, we now see the vanishing point on the right; in this case the vanishing point y points downwards. Given the vanishing points, we apply K inverse to bring them from the pixel world to the optical world, and we normalize each to norm one; from that we obtain the columns of the rotation matrix. Again, we need to ensure that we measure lines which are perpendicular to each other in the physical world; as long as such lines exist in our environment, we can orient ourselves with respect to that object. Given two columns of the rotation matrix, we can construct the third column by taking the cross product. And, one more time: given all the columns of the rotation matrix, we can obtain the physical meaning of the camera pan and tilt, if we want, from the equations above. Next we consider a camera moving through space.
At each instant, we have a view of the world from our first-person camera perspective. As you can imagine, in the picture on the left we are pointing to the left, in the center more or less straight ahead, and on the right to the right. In the figure below, I show the pan and tilt angles corresponding to each camera: left in red, center in green, right in blue. So far we have seen how to recover the camera orientation relative to the world when we can recognize two perpendicular directions in the physical space. How do we figure out both the orientation of the camera and its translation, the position of the camera itself, by looking at the image? In this case, we need to recover the rotation R as well as the translation vector t. This is the case where we use a homography transformation, from a planar object in 3D to the planar image surface in 2D. Again, we go back to the street intersection we saw before. Now, instead of looking at the full three-dimensional world, we look at the ground plane itself. In this case, we conveniently orient the world coordinate system such that X is along one road, Y is along the perpendicular road, and Z is perpendicular to the ground plane. As such, a point on the ground plane has coordinates (X, Y, 0, 1). Each point on the ground plane is then projected into the image to a point (u, v, 1) in homogeneous coordinates. Looking at the camera projection equation, we see this arrangement of the coordinate system allows us to eliminate the third column of R, r3, obtaining a two-dimensional homography transformation: K, a 3x3 matrix, times [r1 r2 t], also a 3x3 matrix, transforming points (X, Y) on the planar ground to points (u, v) in the image plane. We call this combined homography matrix, K times [r1 r2 t], H_total. H_total itself can be estimated by taking four points on the physical ground plane and the four corresponding points in the image plane.
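Estimating H_total from four correspondences is the classic direct linear transform; the lecture does not give the algorithm explicitly, so this is a sketch with made-up points, checked against a known homography:

```python
import numpy as np

# Direct linear transform (DLT): each correspondence (x,y) -> (u,v)
# contributes two linear equations in the 9 entries of H; the solution
# is the null vector of the stacked system.
def estimate_homography(src, dst):
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(A))
    H = Vt[-1].reshape(3, 3)           # null vector = last row of Vt
    return H / H[2, 2]                 # fix the arbitrary overall scale

# A ground-plane square mapped by a known (made-up) homography
H_true = np.array([[1.2, 0.1, 5.0],
                   [-0.05, 0.9, 3.0],
                   [0.001, 0.002, 1.0]])
src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
dst_h = (H_true @ np.column_stack([src, np.ones(4)]).T).T
dst = dst_h[:, :2] / dst_h[:, 2:3]
H = estimate_homography(src, dst)
print(np.allclose(H, H_true, atol=1e-6))   # True
```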
Given these four point correspondences, we can numerically compute the 3x3 matrix H_total, and from it we can backtrack to figure out the rotation and translation. This is done by first taking the H_total estimated from the four correspondences between the physical ground plane and the image plane and applying the K inverse transformation, once again taking the pixel world into the optical world. Once K is removed, the H matrix in the optical world is simply made of the rotation columns r1 and r2 and the translation vector t. It's very simple: given four points on the ground plane and the four corresponding points in the image, estimate H_total in the pixel domain; transform it through K inverse, the inverse of the camera calibration matrix, bringing it back to the optical world; and in the optical world, this H matrix is simply [r1 r2 t]. We then enforce that r1 and r2 have norm one, giving us the rotation axes, and the same normalization applied to the third column exposes the translation vector t. For example, in this case we have computed H as K inverse times the estimated H_total, shown at top right. We normalize the columns r1 and r2 to norm one, and the same scale factor is used to normalize the third column of H, which gives us the translation vector. If we want the third column of the rotation matrix, which was eliminated, we can simply take the cross product of r1 and r2. Now we have estimated the rotation and translation of the camera from the camera projection matrix. How do we relate these quantities to the camera position itself in the world? This is an important point to distinguish, because we have a third-person perspective as well as a first-person perspective. Recall that the camera rotation and translation transform the third-person perspective of the world into the first-person perspective.
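The decomposition step can be sketched as follows; K, the rotation, and the translation are made-up values used to construct H_total with an arbitrary scale, so the round trip can be checked (this assumes the recovered scale is positive, i.e. the plane is in front of the camera):

```python
import numpy as np

# Recover R and t from the ground-plane homography H_total = K [r1 r2 t].
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])      # assumed calibration matrix

def decompose_homography(K, H_total):
    H = np.linalg.inv(K) @ H_total      # pixel world -> optical world
    H = H / np.linalg.norm(H[:, 0])     # r1 must have norm one
    r1, r2, t = H[:, 0], H[:, 1], H[:, 2]
    r3 = np.cross(r1, r2)               # the eliminated third column
    return np.column_stack([r1, r2, r3]), t

a = np.deg2rad(25.0)
R_true = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
t_true = np.array([0.3, -0.1, 2.0])
H_total = 2.5 * K @ np.column_stack([R_true[:, 0], R_true[:, 1], t_true])
R_est, t_est = decompose_homography(K, H_total)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))  # True True
```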
Now let's look at what this means, particularly for the translation vector t. In this transformation, t is a vector pointing from the camera person, the first-person perspective, to the world origin: a vector going from me to the world center, measured in my first-person coordinate system. As you can imagine, if I keep the camera position fixed relative to the world origin and simply rotate myself, this t is going to change. In fact, t encodes not just my position relative to the world center but also my relative orientation to the world. As such, to express my position from the world's perspective, the first thing we need to do is take the vector minus t, going from the world center to me, in the opposite direction. Second, we need to transform that vector, which is measured in my coordinate system, into the world coordinate system; this goes through the rotation between my first-person coordinate system and the world coordinate system. Negating t simply reverses the direction of the vector, and the rotation changes the coordinate system. So if we call the camera center C in the world coordinate system, then t = -R C, or equivalently C = -R⁻¹ t.
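The relation above can be sketched directly; since R is orthonormal, its inverse is its transpose, so from t = -R C it follows that C = -Rᵀ t. Values here are illustrative:

```python
import numpy as np

# Camera center C in world coordinates, given R and t from [R | t].
# t = -R C  =>  C = -R^T t  (R orthonormal, so R^-1 = R^T).
def camera_center(R, t):
    return -R.T @ t

a = np.deg2rad(25.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([0.3, -0.1, 2.0])
C = camera_center(R, t)
print(np.allclose(R @ C + t, 0.0))   # True: consistent with t = -R C
```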