In the previous lessons, I introduced the basic I/O model. It consists of an external memory of unlimited size, an internal memory of size M, and data is transferred between the external memory and the internal memory in blocks of size B. We also looked at how algorithms are analyzed in this model. Then I introduced two types of algorithms in the model: cache-aware algorithms, which know the size of the internal memory M and the block size B, and cache-oblivious algorithms, which do not. We looked at examples of such algorithms for the matrix-transposition problem. Finally, we looked at replacement policies. In this lesson, we're going to look at one of the most basic algorithmic problems you can think of, namely how to sort a set of n numbers. Obviously, we're going to look at this problem from the I/O perspective. Okay, so let's have a look at the sorting problem. We're given an array with numbers, and the goal is to rearrange the items in this array so that they appear in sorted order. For internal memory, I assume you all know that you can do this in O(n log n) running time. There are lots of different algorithms for doing this, for instance MergeSort or QuickSort, or also other algorithms. We also know that this O(n log n) running time is actually optimal in the comparison-based model. So now the question we're asking ourselves is: if we want to sort numbers in external memory, how many I/Os do we need? What we're going to do in this lesson is look at the classic MergeSort algorithm and see how many I/Os it performs. Okay, so here you see MergeSort. You have an array of size n, you split it into two subarrays of size n/2, recursively sort those subarrays, and then merge the results back to get the whole sorted array. So let's have a look at a simple example. Here we see an array; we split it into two subarrays, simply the first half and the second half.
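As a concrete reference point, the split-recurse-merge scheme just described can be sketched in Python as follows. This is a minimal sketch, not the lesson's own code; the function name `merge_sort` is my own.

```python
def merge_sort(a):
    """Sort list a and return a new sorted list (standard MergeSort)."""
    n = len(a)
    if n <= 1:
        return a
    # Split into the first half and the second half, and sort each recursively.
    a1 = merge_sort(a[:n // 2])
    a2 = merge_sort(a[n // 2:])
    # Merge: walk over both sorted halves in parallel, always taking
    # the smaller of the two elements currently pointed at.
    out, i, j = [], 0, 0
    while i < len(a1) and j < len(a2):
        if a1[i] <= a2[j]:
            out.append(a1[i])
            i += 1
        else:
            out.append(a2[j])
            j += 1
    # One of the halves is exhausted; append the remainder of the other.
    out.extend(a1[i:])
    out.extend(a2[j:])
    return out
```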
Then we recursively sort the first half A1, and then we recursively sort the second half A2. Now the remaining task is to take these two sorted subarrays and merge them back into the original array to get it sorted. The way we do it is to walk over the two subarrays in parallel, always checking which of the two elements we're currently looking at is the smaller one, and writing that element to the correct place in our original array A. For instance, here the first element of A2 is the smaller of the two, so we write it to the right position in the original array. Then we move the pointer in A2, and now we compare the other two elements, five and eight. Here we see that five is the smaller one, so we write it to the correct place in A, and so on. Then we advance the pointer: what is currently the smaller of the two? Six, so we write it to the right place, and so on. If we do this, then in the end the whole array is sorted. Now let's have a look at how many I/Os this algorithm performs. It is a recursive algorithm, so to analyze it we're going to write a recurrence. The recurrence we need is: T_io(t) is the number of I/Os we perform when we sort a subarray of size t, that is, a subarray with t elements. I'm going to assume an algorithm that may be slightly different from what we just saw, namely an algorithm that works in place. What does that mean? It means that the subarrays A1 and A2 are not separate arrays that need additional memory; they are embedded in the original array. So here you see the code if you want to do that. A recursive call now gets the original array A and two indices i and j that indicate which part of the original array, which subarray, we currently have to sort. If this part has length one, we don't have to do anything; otherwise, we go into recursion, essentially the same way as before.
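The indexed variant described above might be sketched like this. This is an assumption about what the lesson's code looks like: the name `merge_sort_indexed` is my own, and for simplicity the merge below still uses a temporary buffer (the point of the lesson's in-place variant is only that the recursion works on index ranges of A rather than on copied subarrays; a fully buffer-free merge is more involved).

```python
def merge_sort_indexed(A, i, j):
    """Sort A[i..j] (inclusive) by recursing on index ranges of A itself."""
    if i >= j:
        # Subarray of length at most one: nothing to do.
        return
    mid = (i + j) // 2
    merge_sort_indexed(A, i, mid)        # sort the embedded subarray A1
    merge_sort_indexed(A, mid + 1, j)    # sort the embedded subarray A2
    # Merge A[i..mid] and A[mid+1..j] back into A[i..j].
    merged, p, q = [], i, mid + 1
    while p <= mid and q <= j:
        if A[p] <= A[q]:
            merged.append(A[p])
            p += 1
        else:
            merged.append(A[q])
            q += 1
    merged.extend(A[p:mid + 1])
    merged.extend(A[q:j + 1])
    A[i:j + 1] = merged
```

A call would look like `merge_sort_indexed(A, 0, len(A) - 1)` to sort the whole array.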
So now let's try to analyze the number of I/Os of this particular algorithm. We're going to have a recurrence, and the recurrence should have a base case and a general case. I hope you remember from the analysis we saw before, for the matrix-transposition problem, that the base case of an I/O analysis is always when the subproblem fits completely into the internal memory, because from that point on the algorithm still makes further recursive calls, but these all happen in internal memory, so no further I/Os are needed. So how big can the subarray be so that it fits into the internal memory? The subarray size is t, so you might first think that t should be at most M, the size of the internal memory. That's not quite true, because we also have these blocks that could be sticking out: if you have a subarray of size t and you read it, you actually get all the blocks covering that subarray, and on both sides a small part of a block can stick out. In the worst case that is 2(B - 1) extra elements, so the base case applies when t is at most M - 2(B - 1). In this base case, how many I/Os do we do? The subproblem fits in the internal memory, so under the assumption of optimal caching what happens is that we read everything into internal memory, continue the algorithm with the recursive sorting, which all happens in internal memory, and then write the result back. So the number of I/Os for this case is O(M/B): the number of blocks we read in, and then write back at the end. Now let's have a look at the general case, before the subproblem fits into internal memory. Here we're going to have the recurrence T_io(t) = 2 T_io(t/2) plus the cost of the merge step, because we have two recursive calls on two subarrays of half the size, and then we have to do the merge.
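Putting the base case and the general case together, the recurrence described above can be written out as follows (a sketch under the stated assumption that a subarray of t elements fits into internal memory when t ≤ M − 2(B − 1)):

```latex
T_{\mathrm{io}}(t) =
\begin{cases}
O(M/B) & \text{if } t \le M - 2(B-1),\\[4pt]
2\,T_{\mathrm{io}}(t/2) + O(t/B) & \text{otherwise.}
\end{cases}
```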
The merge step takes O(t/B) I/Os, because what we do is scan both A1 and A2 and write the elements to the correct positions; that is a constant number of scans, so O(t/B) I/Os in total. So what we see for the standard MergeSort is this: if you solve this recurrence, and in the next lesson we're going to have a closer look at that, the number of I/Os is O((n/B) log(n/M)). It's an okay bound, but it's not really what we want, because the base of the logarithm is two. What we would really like is log base B, or perhaps even better, log base M/B. So in the next lesson, we're going to look a little more closely at this recurrence and show that it really solves to the bound I just claimed, and we're going to look at a new algorithm, or actually a variant of the MergeSort algorithm we just saw, which has better I/O behavior because the logarithm will now have base M/B.
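To get a feel for why the base of the logarithm matters, here is a small numeric illustration. The parameter values n, M, and B are made-up example values, not from the lesson, and the second formula assumes the improved variant achieves the usual log base M/B bound mentioned above.

```python
import math

# Illustrative parameters (my own choices, measured in items):
n = 2**30   # number of items to sort
M = 2**20   # internal-memory size
B = 2**10   # block size

# Standard MergeSort: 2-way merging, so the number of merge levels
# above the base case is roughly log2(n / M).
levels_base2 = math.log2(n / M)

# Improved (multi-way) variant: merging M/B runs at a time gives
# roughly log_{M/B}(n / B) levels.
levels_multiway = math.log(n / B, M / B)

print(levels_base2)     # about 10 levels of 2-way merging
print(levels_multiway)  # about 2 levels of (M/B)-way merging
```

With each level costing a full scan of O(n/B) I/Os, cutting the number of levels from ten to two directly cuts the total I/O cost by the same factor.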