Now we are back at the same street corner, looking at the same building across the street. This time you are with two friends, Bob and Mike, each visiting town. Each of them takes a picture of the same building, and they share the pictures with each other. Given just the two pictures, can we figure out how Bob and Mike are located relative to each other, that is, the relative rotation and relative translation between them? In these lectures we will study how two pictures of the same scene allow us to compute the relative translation and rotation between the cameras. This doesn't require knowing a vanishing point, as we saw earlier. It doesn't require knowing a three-dimensional plane or object, as we saw before. It simply requires two pictures of the same scene.

How do we do this? Intuitively, if Bob can see Mike in his field of view, that gives a hint of how Mike is located relative to the building. Similarly, if Mike takes a picture with Bob in it, he knows how Bob is located relative to the building, and together this allows them to figure out their position relative to each other. This is the concept we will make precise in the following lectures. So I encourage you to go out with a friend and each take a picture of the same object, the building. Ideally, take the pictures so that your friend is in the field of view. If you don't have a friend, I still encourage you to go out and take a picture. You can simply take two pictures by moving from location A to location B, and maybe you can go find a friend later.

Now we will see how we can estimate the relative three-dimensional translation and rotation between two viewers given just the two images by themselves. First, a set of intuitions. Imagine Bob is pointing with a laser pointer at the three-dimensional building, with his image plane sitting in front of him. This three-dimensional laser beam is then imaged into Mike's field of view.
If Mike looks through his camera, he can see Bob's laser beam pointing at the building corner as a ray, and this ray is imaged as a line in Mike's image. Furthermore, if the image is large enough, or Mike has a large field of view, he can even see Bob in the picture, holding the laser pointer aimed at the building. So Mike sees this line and Bob himself. As Bob moves his laser beam around, pointing at different corners of the building, Mike witnesses this as a set of rays, a set of lines forming in his picture. Those lines all intersect at the same point, which is Bob's location, and Mike will see this point if his field of view is large enough. So Mike sees the picture in front of him, the building and the street, and in addition a set of laser beams coming from Bob, pointing at different parts of the building.

Similarly, Mike can do the same thing. Mike can point his laser beam at different parts of the building. Every time he does so, Bob observes it and sees a line forming in his picture. And if Bob's field of view is wide enough, he can also see Mike in it. Again, Mike sweeps his laser beam around the 3D scene, pointing at different points, and each time he does, Bob observes it as a line in his image.

So now Bob and Mike can share this information with each other as photographs. On the left we see Bob's view. We see the building in front of him and the street with the vanishing points on it. He also sees Mike, if his field of view is large enough, holding a laser pointer aimed at different locations on the building, and those beams are lines in Bob's field of view. Similarly, Mike sees the building in front of him and the vanishing point in his view, but he also sees Bob holding his laser pointer. Bob's laser beams point at different points, and those beams form lines in Mike's image. In fact, these quantities have precise names.
We call the locations of Bob and Mike in each other's field of view the epipoles. And the laser beams that we witness from the other person's point of view are called the epipolar lines. An important point: if I were Bob looking at a particular building corner, say the one on the left, my line of sight forms a ray, and that ray is a line, an epipolar line, in Mike's field of view. So if I look straight out into the world from Bob's point of view and point a laser at the object, that laser beam intersects the image in Bob's field of view at a point, but it is imaged as a line in Mike's image. This transformation from a point to an epipolar line is what allows us to compute the camera rotation and translation between the two viewers in 3D.

So first, for the following computation, we need two pictures of the same scene. And we would like to have point correspondences, meaning that we know exactly which pixel in Mike's view corresponds to a given building corner in Bob's view. This can be done automatically with feature matching, or manually by clicking on the points corresponding to the same objects in the two views. We shall see that we need at least eight points. Given just eight point correspondences between two pictures of a 3D scene, we can compute the relative three-dimensional translation and rotation between the two views.

So now we are ready to derive this computation mathematically. We turn to the drawing we have, with Bob looking at the scene and Mike looking at the scene. Recall that we can define the coordinate system any way we want, so we choose Bob's. We define the coordinate system in the first-person perspective: the world is measured in Bob's coordinate system. So Bob is the center of the universe, and his X is the universe's X.
His Y, pointing down, is the universe's Y, and his Z, pointing into the scene, is the universe's Z. As such, Bob has a simple camera projection matrix, consisting of the camera calibration matrix K times the identity followed by a zero column: K [I | 0]. Mike's camera projection matrix consists of the calibration matrix K times a rotation matrix R and a translation t: K [R | t]. So Mike needs to take a point in the world coordinate system, transform it through the rotation and translation into his own first-person perspective, and then project it through the K matrix into 2D.

We know this, so let's look at the particular point that Bob is pointing to. Bob is pointing at the building corner we saw before, and it is located at X1 in three-dimensional space, referenced to camera center number 1. Now imagine Mike uses his left hand to point to Bob. This direction is a vector called t. It is simply the difference between the positions of Mike and Bob, measured in Mike's coordinate system, illustrated in red.

Now, Mike can also point to the same point on the 3D building and measure it in his own coordinate system. In Mike's reference frame this point is called X2, also a three-dimensional vector. X2 and X1 are the same point, just measured from two different points of view, and the transformation between them is X2 = R X1 + t. So now we have three vectors: t, the translation between Bob and Mike measured in Mike's coordinate system; X2, the point Mike is pointing at in 3D; and X1, the same point measured relative to Bob's camera center. X2 is simply X1 transformed from Bob's coordinate system into Mike's coordinate system. So now we have three rays. The first is t, the translation between Bob and Mike in 3D.
All three are measured in Mike's coordinate system. The second is X2, the ray from Mike to the building corner. The third goes from Bob to the same building corner, and in Mike's coordinate system this ray is R times X1, which can also be written as X2 - t. These three rays form a plane. As such, we can define the normal to this plane by taking the cross product of two of these vectors. For example, we can take t, the vector between Mike and Bob, crossed with X2, the direction in which Mike is pointing at the building. The cross product of these two vectors gives the surface normal of that plane in 3D. The third vector, X2 - t, is perpendicular to the surface normal because it lies in the same plane, so we have (X2 - t) transpose times (t cross X2) = 0. Since X2 - t is simply R X1, Bob's ray into the world transformed into Mike's coordinate system, we substitute it into the equation and expand: (R X1) transpose times (t cross X2) = 0. Writing the cross product with t as a matrix and taking the transpose of the whole expression, we obtain the final equation: X2 transpose, Mike's ray, times t cross, the relative position of Bob and Mike, times R times X1, Bob's ray, equals 0. The sign flip that comes from transposing the skew-symmetric cross-product matrix does not matter, because the right-hand side is zero.

Now we see a bilinear relationship between X1 and X2, Bob's and Mike's views of the same point in 3D space, with a matrix in the middle. We combine the two matrices in the middle, t cross and R, into one matrix and give it a name: the essential matrix, E. The name was coined by Longuet-Higgins, who first discovered this relationship. The essential matrix hides inside it both the translation and the rotation between the two cameras.
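
If you want to check this relationship numerically, here is a minimal sketch, assuming NumPy. The pose (R, t) and the building corners are made-up values for illustration, and `skew` and `essential_from_pose` are hypothetical helper names, not part of any library. It builds E from t cross and R and confirms that X2 transpose times E times X1 vanishes for points related by X2 = R X1 + t.

```python
import numpy as np


def skew(t):
    """Return the 3x3 matrix [t]_x so that skew(t) @ v equals np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])


def essential_from_pose(R, t):
    """Essential matrix E = [t]_x R for the convention X2 = R X1 + t."""
    return skew(t) @ R


# Assumed relative pose: a 20 degree rotation about the Y axis plus a translation.
angle = np.deg2rad(20.0)
c, s = np.cos(angle), np.sin(angle)
R = np.array([[c, 0.0, s],
              [0.0, 1.0, 0.0],
              [-s, 0.0, c]])
t = np.array([1.0, 0.2, 0.1])

E = essential_from_pose(R, t)

# Made-up building corners measured in Bob's coordinate system (one per row).
X1_points = np.array([[0.5, -0.3, 4.0],
                      [-1.0, 0.4, 6.0],
                      [2.0, 1.0, 5.0]])

# The same corners measured in Mike's coordinate system: X2 = R X1 + t.
X2_points = X1_points @ R.T + t

# Bilinear constraint X2^T E X1, evaluated for every corner.
residuals = np.array([X2 @ E @ X1 for X1, X2 in zip(X1_points, X2_points)])
print("epipolar residuals:", residuals)
```

The residuals should be zero up to floating-point rounding, whichever corners you choose, because each X2 lies in the plane spanned by t and R X1.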
The essential matrix is a simple three-by-three matrix that relates corresponding rays from Bob's and Mike's points of view through the bilinear equation X2 transpose times E times X1 = 0. This simple form allows us to compute the essential matrix E from a few point correspondences. And once we know E, we can backtrack to recover the translation and rotation from E.

Now return to the two pictures we have seen before. Mike has Bob's picture and Bob has Mike's picture. Furthermore, in Bob's view we see Mike with his laser beam pointed at the object, and we call those laser beams, imaged into the other person's field of view, the epipolar lines. But how do we form those epipolar lines from the essential matrix? And how are they related to the essential matrix we just introduced?

So, return to the diagram we've seen before. Bob and Mike are each pointing at the same building corner. Furthermore, there is a ray between the two people, Mike and Bob, along the vector t. These three vectors form a plane. This plane physically slices through space, and in particular it slices through Mike's image. As it slices through Mike's image, it forms a line in Mike's image. Now suppose Mike and Bob decide to simultaneously point to a different building corner. Those two rays, plus the vector between Mike and Bob, again form a plane, and that plane slices Mike's image along a different line. One more time, Bob and Mike point their laser pointers at a different building corner, and again they form a different plane. That plane slices through Mike's image along a line, forming yet another epipolar line. So as we see, every time Bob and Mike point to a different point on the building, they form a three-dimensional plane together.
Each such three-dimensional plane slices through Mike's image at a different orientation, and so a different epipolar line is formed. What's more interesting is that all the epipolar lines in Mike's field of view converge to a single point. They all intersect at one single point. This is because, in 3D, all the planes that Mike and Bob can sweep through share the vector between Mike and Bob. That vector penetrates Mike's picture at a single point, and that point is called the epipole. So again, as we sweep the plane through space, its orientation changes depending on which 3D point we point to. But the base of the plane has to connect Bob and Mike, and that base stays invariant. That vector intersects Mike's image, if Mike's image is large enough, at the epipole. And the epipole is the intersection of all the epipolar lines that Mike sees in his field of view.

Now we are ready to compute the epipolar lines and epipoles numerically. Recall that the essential matrix relates X2, Mike's view of a point, to X1, Bob's view of the same point, through the bilinear equation X2 transpose times E times X1 equals zero. The image point x1 in Bob's view is simply a scaled copy of X1, with the scale factor denoted lambda one. We can carry E times x1 over into the second view with the following derivation. E times x1 returns a vector. This vector can in fact be thought of as the homogeneous coordinates of a line. Recall that the homogeneous coordinates of a line are simply the normal of the plane formed by the line and the observer himself. In this case, that plane has the surface normal defined by E times x1. The corresponding image point x2 is lambda two times the three-dimensional vector X2. We know X2 is perpendicular to the surface normal. Therefore, lambda two times X2 is also perpendicular to the surface normal.
And therefore the image point x2 in Mike's image plane lies on the line E times x1. We can do the same derivation for the left view: we treat x2 transpose E, a three-dimensional vector, as a homogeneous line in Bob's field of view. This epipolar line passes through x1, which corresponds to the same point they are looking at in 3D space.

Now where is the epipole in Bob's view? We know that all the epipolar lines in Bob's view intersect at the same point; call it e1, Bob's epipole. That means no matter what x2 is, no matter what Mike is pointing at, each time Mike points at a different x2 it forms a different line, x2 transpose E, and that line must pass through the same point e1. Therefore we have the property that x2 transpose times E times e1 equals zero for arbitrary x2. This is only true when E times e1 itself equals zero. So we can compute the epipole by solving the least squares problem E times e1 equals zero.

Similarly, we ask where the epipole is in Mike's field of view. We know that it lies on every epipolar line in his field of view, so we must have e2 transpose times E times x1 equals zero for arbitrary x1. This is only true when e2 transpose E equals zero. So, given the essential matrix E, we can compute the epipole in Mike's field of view by solving the least squares equation e2 transpose E equals zero.

So now we can return to the picture we saw earlier: two friends taking pictures of the same building. What is shown here is a set of corresponding epipolar lines. The green line on the left corresponds to the green line on the right, the red lines correspond to each other, and so do the pink ones. As you can see, all the epipolar lines converge to a single point, which is the epipole. Again, we can think of the epipole as defining a ray that starts at the optical center, which is us, and passes through the epipole.
The second camera must lie somewhere along that ray. Next time we'll see how to compute the camera translation and rotation from this information, as well as how to compute the essential matrix from image measurements directly.
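
To make the epipolar lines and epipoles from this lecture concrete, here is a minimal sketch, assuming NumPy. The pose, the 3D point, and the helper names `skew` and `null_vector` are all made up for illustration, and the calibration matrix K is assumed to be the identity so that image points are just rays divided by their depth. The epipoles are found with an SVD, which is one standard way to solve a homogeneous least squares problem such as E times e1 equals zero; the lecture itself does not prescribe a particular solver.

```python
import numpy as np


def skew(t):
    """Return the 3x3 matrix [t]_x so that skew(t) @ v equals np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])


def null_vector(A):
    """Unit vector v minimizing ||A v||: the least squares solution of A v = 0."""
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]  # right singular vector of the smallest singular value


# Assumed relative pose, with the convention X2 = R X1 + t.
angle = np.deg2rad(20.0)
c, s = np.cos(angle), np.sin(angle)
R = np.array([[c, 0.0, s],
              [0.0, 1.0, 0.0],
              [-s, 0.0, c]])
t = np.array([1.0, 0.2, 0.5])
E = skew(t) @ R

# One made-up building corner, seen by Bob (X1) and by Mike (X2).
X1 = np.array([0.5, -0.3, 4.0])
X2 = R @ X1 + t

# Image points are scaled copies of the rays (K assumed to be the identity).
x1 = X1 / X1[2]
x2 = X2 / X2[2]

# Epipolar lines in homogeneous coordinates.
line_in_mike = E @ x1    # line in Mike's image that x2 must lie on
line_in_bob = E.T @ x2   # x2^T E, the line in Bob's image that x1 must lie on

# Epipoles: E e1 = 0 (Bob's image) and e2^T E = 0 (Mike's image).
e1 = null_vector(E)
e2 = null_vector(E.T)

print("x2 on Mike's epipolar line:", x2 @ line_in_mike)
print("x1 on Bob's epipolar line:", x1 @ line_in_bob)
print("Mike's epipole on the same line:", e2 @ line_in_mike)
```

All three printed values should be zero up to rounding. In this setup e2 comes out parallel to t, the direction from Mike to Bob, which matches the picture of the epipole as the image of the other camera.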