Strided convolutions are another piece of the basic building blocks of convolutions as used in convolutional neural networks. Let me show you an example. Let's say you want to convolve this seven by seven image with this three by three filter, except that instead of doing it the usual way, we are going to do it with a stride of two. What that means is you take the element-wise product as usual in this upper left three by three region, then multiply and add, and that gives you 91. But then instead of stepping the blue box over by one step, we are going to step it over by two steps. So, we are going to make it hop over two steps like so. Notice how the upper left hand corner has gone from this position to this position, jumping over one position. And then you do the usual element-wise product and summing, and it turns out to be 100. And now we are going to do that again, and make the blue box jump over by two steps. You end up there, and that gives you 83. Now, when you go to the next row, you again take two steps instead of one step, so we are going to move the blue box over there. Notice how we are stepping over one of the positions, and then this gives you 69. Now you again step over two steps, this gives you 91, and so on, so 127. And then for the final row, 44, 72, and 74.
In this example, we convolve a seven by seven matrix with this three by three matrix and we get a three by three output. The input and output dimensions turn out to be governed by the following formula. If you have an N by N image that you convolve with an F by F filter, and if you use padding P and stride S (in this example, S is equal to two), then you end up with an output that is N plus two P minus F, and now, because you're stepping S steps at a time instead of just one step at a time, you divide that by S and then add one, and the same thing applies to the other dimension. So the output is (N + 2P - F)/S + 1 by (N + 2P - F)/S + 1.
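To make the hopping of the blue box concrete, here is a minimal NumPy sketch of a strided sliding-window operation. The function name `strided_cross_correlate`, the input values, and the all-ones filter are hypothetical choices for illustration, not the numbers from the slide; the integer division used for the output size rounds down, so only windows that fit fully inside the (padded) image are computed.

```python
import numpy as np

def strided_cross_correlate(image, kernel, stride=1, pad=0):
    """Slide `kernel` over `image`, hopping `stride` positions at a time."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    if pad > 0:
        # Zero padding of width `pad` on every side.
        image = np.pad(image, pad, mode="constant")
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    # Integer division rounds down: windows that would hang outside are skipped.
    out_h = (n_h - f_h) // stride + 1
    out_w = (n_w - f_w) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride  # upper left corner of the window
            window = image[top:top + f_h, left:left + f_w]
            out[i, j] = np.sum(window * kernel)  # element-wise product, then sum
    return out

# Hypothetical 7x7 input and 3x3 filter (not the values shown in the lecture).
image = np.arange(49).reshape(7, 7)
kernel = np.ones((3, 3))
result = strided_cross_correlate(image, kernel, stride=2)
print(result.shape)
```

With a seven by seven input, a three by three filter, no padding, and a stride of two, the result has three rows and three columns, matching the example above.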
In our example, we have seven plus zero, minus three, divided by two (the stride S), plus one, which is, let's see, four over two plus one, which equals three. That is why we wound up with this three by three output. Now, just one last detail, which is: what if this fraction is not an integer? In that case, we're going to round this down, so this notation denotes the floor of something. This is also called the floor of Z; it means taking Z and rounding down to the nearest integer. The way this is implemented is that you do this type of blue box multiplication only if the blue box is fully contained within the image, or the image plus the padding, and if any part of the blue box hangs outside, then you just do not do that computation. So if the convention is that your three by three filter must lie entirely within your image, or the image plus the padding region, before a corresponding output is generated, then the right thing to do to compute the output dimension is to round down in case N plus two P minus F, over S, is not an integer.
Just to summarize the dimensions, if you have an N by N matrix or N by N image that you convolve with an F by F matrix or F by F filter, with padding P and stride S, then the output size will be the floor of (N + 2P - F)/S, plus one, in each dimension. It is nice if you can choose all of these numbers so that the result is an integer, although you don't have to do that, and rounding down is just fine as well. But please feel free to work through a few examples of values of N, F, P and S yourself to convince yourself, if you want, that this formula is correct for the output size. Now, before moving on, there is a technical comment I want to make about cross-correlation versus convolution, and just so you know, this does not affect what you have to do to implement convolutional neural networks.
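If you want to work through a few values of N, F, P and S without doing it by hand, here is a small sketch of the output-size formula. The helper names `conv_output_size` and `count_valid_positions` are made up for this illustration, and the sketch assumes the filter is no larger than the padded image. The second function simply counts the filter placements that fit entirely inside the padded image, as a check that rounding down agrees with the "blue box must be fully contained" convention.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output side length: floor((n + 2p - f) / s) + 1."""
    # Integer division rounds down, which is the floor in the formula.
    return (n + 2 * p - f) // s + 1

def count_valid_positions(n, f, p=0, s=1):
    """Count filter placements along one axis that fit inside the padded image."""
    padded = n + 2 * p
    starts = range(0, padded, s)  # candidate left edges, hopping by the stride
    return sum(1 for start in starts if start + f <= padded)

# A few (n, f, p, s) combinations, including cases where the fraction
# (n + 2p - f) / s is not an integer.
examples = [(7, 3, 0, 2), (6, 3, 0, 1), (6, 3, 1, 1), (8, 3, 0, 2), (5, 3, 1, 2)]
for n, f, p, s in examples:
    size = conv_output_size(n, f, p, s)
    assert size == count_valid_positions(n, f, p, s)
    print(f"n={n}, f={f}, p={p}, s={s} -> {size} by {size}")
```

The first tuple is the example from this video: seven by seven input, three by three filter, no padding, stride two, giving a three by three output.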
If you read a different math textbook or signal processing textbook, there is one other possible inconsistency in the notation, which is that, if you look at the typical math textbook, the way that convolution is defined, before doing the element-wise product and summing, there's actually one other step that you first take. To convolve this six by six matrix with this three by three filter, you first take the three by three filter and flip it on the horizontal as well as the vertical axis. So this 3, 4, 5, 1, 0, 2, minus 1, 9, 7 will become: three goes here, four goes there, five goes there, and then the second row becomes this, 1, 0, 2, and then minus 1, 9, 7. So this is really taking the three by three filter and mirroring it on both the vertical and horizontal axes. And then it is this flipped matrix that you would copy over here. To compute the output, you will take two times seven, plus three times two, plus seven times five, and so on. That is, you multiply out the elements of this flipped matrix in order to compute the upper left-most element of the four by four output. Then you take those nine numbers and shift them over by one, shift them over by one again, and so on.
The way we've defined the convolution operation in this video is that we've skipped this mirroring operation. Technically, the operation we've been using for the last few videos is sometimes called cross-correlation instead of convolution. But in the deep learning literature, by convention, we just call this the convolution operation. Just to summarize, by convention in machine learning, we usually do not bother with this flipping operation, and technically this operation is maybe better called cross-correlation, but most of the deep learning literature just calls it the convolution operator.
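The difference between the two definitions can be sketched in a few lines of NumPy. This is only an illustration under some assumptions: the filter values are the nine numbers read out above, arranged row by row (that layout is assumed), the six by six input is made up, and the function names `cross_correlate2d` and `convolve2d_textbook` are hypothetical. The textbook version flips the filter on both axes with `np.flip` and then does exactly the same sliding-window computation.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Stride 1, no padding: sliding element-wise product and sum, no flipping."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    f_h, f_w = kernel.shape
    out_h = image.shape[0] - f_h + 1
    out_w = image.shape[1] - f_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + f_h, j:j + f_w] * kernel)
    return out

def convolve2d_textbook(image, kernel):
    """Textbook convolution: flip the filter on both axes, then cross-correlate."""
    flipped = np.flip(np.asarray(kernel), axis=(0, 1))
    return cross_correlate2d(image, flipped)

# Filter values from the lecture, row layout assumed.
kernel = np.array([[3, 4, 5],
                   [1, 0, 2],
                   [-1, 9, 7]])
image = np.arange(36).reshape(6, 6)  # hypothetical 6x6 input

flipped = np.flip(kernel, axis=(0, 1))
cross = cross_correlate2d(image, kernel)    # what deep learning calls "convolution"
conv = convolve2d_textbook(image, kernel)   # the math-textbook definition

print(flipped)
print(cross.shape, conv.shape)
print(np.allclose(cross, conv))
```

Both versions produce a four by four output; they only differ in which filter entry gets multiplied with which input entry, so for a learned filter the network can simply learn the flipped weights.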
And so I'm going to use that convention in these videos as well, and if you read a lot of the machine learning literature, you'll find most people just call this the convolution operator without bothering to use these flips. It turns out that in signal processing or in certain branches of mathematics, doing the flipping in the definition of convolution causes the convolution operator to enjoy this property: (A convolved with B) convolved with C is equal to A convolved with (B convolved with C), and this is called associativity in mathematics. This is nice for some signal processing applications, but for deep neural networks it really doesn't matter, and so omitting this double mirroring operation just simplifies the code and makes the neural networks work just as well. And by convention, most of us just call this convolution, even though the mathematicians sometimes prefer to call it cross-correlation. But this should not affect anything you have to implement in the problem exercises and should not affect your ability to read and understand the deep learning literature.
You've now seen how to carry out convolutions, and you've seen how to use padding as well as strides for convolutions. But so far, all we've been using is convolutions over matrices, like over a six by six matrix. In the next video, you'll see how to carry out convolutions over volumes, and this will make what you can do with convolutions much more powerful. Let's go on to the next video.