Hello. In this lecture, we are going to talk about amortized analysis and introduce the concept of potential functions. We are going to take the slightly informal accounting method we looked at in the previous video and make it much more formal and much more rigorous using something called a potential function. What's a potential function? Think of it as inspiration taken from physics. Suppose you have the following physics situation: you have a boulder, and your goal is to roll it up a hill, starting from a point A at the bottom, over the top of the hill at C, and down to a point B on the other side. Rolling the boulder uphill from A to C is a lot of work. But once you are at the top, you give it a teeny tiny push and it rolls all the way downhill to B, and that needs essentially no work from you. The point is that in going uphill you are building up potential: you are raising the boulder from ground height to some height up the hill. That potential then does something for you. It might look like the downhill part is free, but really it is the stored potential being spent. So the total work of moving from A to B is essentially the work from A to C, which builds up the potential, and then the potential does the work of moving from C to B. It's a nuanced way of analyzing how much work you do. Similarly, if you have a spring and you press it down, you are building up potential; or if you have a battery and you charge it, you are doing some work, but you are also building up potential. The point now is to do the same for data structures: to do amortized analysis using these potential functions, where we can talk formally about the potential you build up when you do computational work in a data structure. That will formalize the insights we got from the accounting method. What is the potential method, then? It is a mathematically rigorous way, in my opinion, of performing amortized analysis, and it is an alternative to the accounting method, where you pay upfront for operations. What's the basic idea? Very simple. Think of D_i as an instance of a data structure, somewhere in the middle of a sequence of operations. You associate with it a function, capital phi, Φ(D_i), which maps the data structure to a potential value, typically a non-negative real number. The non-negative part is not essential, but we will try to keep it non-negative if we can. Now, the point is this: any operation on the data structure moves it from an instance D_{i-1} to a new instance D_i. Let's say c_i is the actual computational cost of that operation. Then the amortized cost is defined as the change in potential, ΔΦ, caused by the operation, plus the actual cost. For example, if I am moving a boulder uphill, the actual cost is the cost of pushing the boulder, say one unit per unit of distance, and on top of that there is the change in potential from moving up some height. The amortized cost is the change in potential plus the actual distance traveled.
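To pin down the definition in symbols, here is the standard form of the potential method the lecture is describing, with D_{i-1}, D_i, and c_i as above:

```latex
% Amortized cost of the i-th operation, which takes the data structure
% from D_{i-1} to D_i at actual cost c_i, under a potential function \Phi:
\hat{c}_i \;=\; c_i \;+\; \underbrace{\Phi(D_i) - \Phi(D_{i-1})}_{\Delta\Phi}
```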
That is the amortized cost, and we put a hat on top of it: ĉ_i. That's the definition; that's what we call the amortized cost. But here's the main requirement; let me be careful about this. Suppose you want to sum up the amortized costs of n operations. Say your data structure starts at D_0; the first operation, whose true cost is c_1, takes you to D_1; with cost c_2 you go to D_2; and you keep going until, with cost c_n, you reach D_n. These are the data structure instances you pass through. Now, the total amortized cost is the total change in the potential plus the total actual cost, all under the summation. The point is that the changes in potential cancel out: you have Φ(D_1) − Φ(D_0), plus Φ(D_2) − Φ(D_1), and so on, up to Φ(D_n) − Φ(D_{n−1}). This is called a telescoping sum: the +Φ(D_1) cancels against the −Φ(D_1), the +Φ(D_2) cancels against the −Φ(D_2), and so on. The only terms that remain are the potential of the very last data structure, minus the potential of the very first data structure, plus the total computational cost. Therefore, the main idea is this: for any n operations, the total amortized cost is the total actual cost, plus the potential of the final data structure, minus the potential of the initial data structure. So here's the thing. Suppose I can guarantee, for any sequence of n operations starting from some fixed initial instance D_0 and ending at some final instance D_n, that Φ(D_n) ≥ Φ(D_0). If I can guarantee this, then the change in potential is non-negative, and the summation of the ĉ_i is greater than or equal to the summation of the c_i: the total amortized cost is an upper bound on the total actual cost, provided my potential can be shown to end up at least as high as it started.
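Written out, the telescoping argument from a moment ago looks like this:

```latex
\sum_{i=1}^{n} \hat{c}_i
  \;=\; \sum_{i=1}^{n} \bigl( c_i + \Phi(D_i) - \Phi(D_{i-1}) \bigr)
  \;=\; \sum_{i=1}^{n} c_i \;+\; \Phi(D_n) - \Phi(D_0)
% Hence, whenever \Phi(D_n) \ge \Phi(D_0), the total amortized cost
% upper-bounds the total actual cost:
\Phi(D_n) \ge \Phi(D_0)
  \;\Longrightarrow\;
  \sum_{i=1}^{n} \hat{c}_i \;\ge\; \sum_{i=1}^{n} c_i
```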
This may be a little confusing, so let me work through a few examples to actually show you; these are examples we encountered in the previous lecture. The recipe is this: we define a potential Φ(D) for any data structure instance D. We always identify an initial instance D_0, usually the empty queue, empty stack, or empty tree, and we calculate Φ(D_0); typically this is set to 0. Then, for any data structure D′ reached after any series of operations, we want to guarantee that Φ(D′) ≥ Φ(D_0) = 0: the potential at any point is at least the starting potential. Note that this is different from saying the potential cannot decrease. The potential can decrease; the point is that if I start from D_0, the potential at any point must be greater than or equal to the potential at D_0. Even though the potential can decrease, it should always remain above the starting value. That's important. And the amortized cost of each operation is just the actual cost plus the change in potential. That summarizes everything we have seen so far.

Let me give you some examples, starting with the binary counter example. If you remember, using the accounting method we said: any time you increment the counter and a bit goes from 0 to 1, I prepay an extra dollar on top of the actual cost, so that when that bit later flips from 1 back to 0, the flip is paid for by the prepaid dollar. That's what we did in the previous lecture. But let's do it in a different way, directly using the potential method. Here we say: the potential of any binary counter is just the number of 1 bits in it. This is almost the same as before; we are just expressing it in a different way. If you have 10110101, then counting the 1 bits, 1, 2, 3, 4, 5, the potential is 5. Another way of thinking about it: the potential is counting all the dollars you have prepaid, all the prepaid costs. Initially, your counter is all zeros, so its potential is 0. Any intermediate counter B_i has Φ(B_i) ≥ 0, because you cannot have a negative number of 1 bits, so Φ(B_i) ≥ Φ(B_0): the potential is always at least where you start. Now, what's the amortized cost of incrementing the counter? Suppose you had k trailing ones. When you increment, all k of them go to 0, and the 0 just above them goes to a 1. The change in potential is a drop of k, because those ones go to zeros, plus an increase of 1, so ΔΦ = 1 − k. The actual cost is k + 1 bit flips. So the amortized cost is (k + 1) + (1 − k) = 2. That's the amortized cost of each increment. If you think about it, it's essentially the same argument we used in the accounting method, except that now we are using something called a potential, which keeps track of the prepaid costs in a much more rigorous manner than before.
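Here is a minimal sketch of that argument in Python; the counter representation and the names are my own illustration, not anything from the lecture. It tracks the potential Φ = number of 1 bits and checks that every increment has amortized cost exactly 2:

```python
def increment(bits):
    """Increment a binary counter stored as a list of bits, least
    significant bit first. Returns the actual cost (number of bit flips)."""
    cost = 0
    i = 0
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0          # flip a trailing 1 back to 0 (drops potential)
        cost += 1
        i += 1
    if i == len(bits):
        bits.append(1)       # counter grows by one bit
    else:
        bits[i] = 1          # flip the first 0 to a 1 (raises potential by 1)
    cost += 1
    return cost

bits = [0] * 8               # all zeros: potential starts at 0
for _ in range(100):
    old_phi = sum(bits)      # phi = number of 1 bits
    cost = increment(bits)
    new_phi = sum(bits)
    assert cost + (new_phi - old_phi) == 2   # amortized cost is always 2
```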
Now let me look again at an example we saw previously, which is our dynamic arrays. Remember, in our dynamic arrays we had the following situation: we had n allocated slots, some of which may still be unoccupied. The idea was that the first n/2 elements in the allocated array carry no prepaid cost, and every element past position n/2 has two dollars prepaid, so that you can pay for copying. The way to express that is to say that the potential of the data structure is Φ(T) = 2k − n, where k is the number of elements and n is the total allocated size. Remember, if I am only inserting and never deleting, then one thing I can always prove is that any data structure instance is always more than half full. The reason: the way I get to some size, say 128, is that previously I had a data structure of size 64, and that got full; so right after doubling I am half full, plus the one element that triggered the resize. I am always at least half full plus one element. This means the potential 2k − n is always greater than or equal to 0, because you must always have k ≥ n/2 in this data structure, just by the very way it operates; remember, there is no deletion, only insertion. Initially the number of elements k is 0 and n is 0, so the initial potential is 0, and the potential at any point is greater than or equal to 0, so this is a good potential function. Indeed, when you add an element there are just two possibilities: if you resize, you double n, and just after doubling (before counting the newly inserted element) n = 2k, so Φ(T) = 0; otherwise 2k > n, so Φ(T) > 0. This is just restating the argument that the data structure always has to be at least half full, by the way it is designed.

Let's get to the analysis of dynamic arrays, then. Suppose Φ(T) = 2k − n: you have an array with k elements already in it, and n is the total allocated size. Suppose you insert a new element and move from T to T′. Case one: no resizing happens, which means k < n. In this case the new k′ = k + 1, the new n′ = n, and the computational cost is 1. The amortized cost is 1 plus the potential of the new data structure T′ minus that of the old data structure T: 1 + (2k′ − n′) − (2k − n). Since k′ = k + 1 and n′ = n, the change in potential is 2, so the amortized cost is 1 + 2 = 3. Case two: resizing happens. Here is what resizing means: k was equal to n. The new k′ = k + 1, because one extra element was inserted, and the new n′ = 2n, because you double the allocated space. The cost is n + 1, because you have to copy all n existing elements over and then write the one extra element. If you do the math, the amortized cost is the actual computational cost plus the change in potential: (n + 1) + (2k′ − n′) − (2k − n) = (n + 1) + (2 − n) = 3. It's the same argument we came up with for the accounting method; it's just that we are using a potential to make it more formal.
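Here is the same kind of sketch for the append-only dynamic array, again with illustrative names of my own. It tracks Φ(T) = 2k − n and checks that every append has amortized cost at most 3:

```python
class DynamicArray:
    """Append-only dynamic array that doubles its capacity when full."""

    def __init__(self):
        self.k = 0   # number of elements stored
        self.n = 0   # allocated capacity

    def phi(self):
        return 2 * self.k - self.n       # the potential function

    def append(self):
        """Insert one element; return the actual cost in element writes."""
        if self.k == self.n:             # table is full: double it
            self.n = max(1, 2 * self.n)
            cost = self.k + 1            # copy k old elements, write 1 new
        else:
            cost = 1                     # just write the new element
        self.k += 1
        return cost

arr = DynamicArray()
for _ in range(1000):
    old_phi = arr.phi()
    cost = arr.append()
    assert cost + arr.phi() - old_phi <= 3   # amortized cost of append <= 3
```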
Just to give you an example that we could not have done with the accounting method, let us now talk about dynamic tables with deletions as well as insertions. The table has some allocated space n and some number k of stored elements. When the whole thing gets full, we expand: we allocate a new table of size 2n and copy over. But suppose we also do the following: if the table gets too empty, say less than half full, k < n/2, we shrink it, back down to half its size. What if we allowed shrinking in addition to expanding? What happens to this table? The first thing to note is that the amortized cost is going to be really bad. Why is that? Let me give you an example that illustrates a very bad case. Think of a table that just got full. Just after it got full, you reallocate, copy the elements over, and insert the new element, so now you have k + 1 elements in a table of size 2n. The problem is: if I take that new element and just delete it, the table becomes half full or less, and it shrinks right back. Simply by deleting the element I just inserted, I force another reallocation and another full copy. And once I have done that deletion, if I do an insertion, I have to expand to twice the size again. If I keep doing insertion-deletion, insertion-deletion this way, I oscillate across this halfway boundary, triggering reallocation and deallocation, expanding and shrinking, over and over again, and the total cost of n such operations becomes Θ(n²). This is bad. I do not want repeated inserting and deleting to cause repeated reallocation. How do I deal with this problem? Like this. Notice that just after I expand, I have to fill in all the remaining slots before the next expansion; but for contraction, with the naive rule, I do not have to do anything, I just have to delete one element to contract again. So here is what I will do: instead of contracting when the table is half full, I am only going to contract when the table is less than one-fourth full. That leaves some slack: you would have to delete all the elements between half full and one-fourth full before a contraction can happen. And to avoid a repeat of the same situation after I contract, I contract to a table of half the size, in which the surviving elements make it about half full, so it is still half empty. Contract to half the size, when one-fourth full. That avoids the continuous expansion and contraction. Why? Let me give you an example. Suppose my table currently has 100 elements in it, and its total size is 200. Now suppose I delete 50 elements and then one more: I am down to 49 elements, which is less than one-fourth of 200. What I do is take those 49 elements and put them in a table of size 100. The point is, to expand this new table you have to insert about 51 more elements, and to contract it you have to delete another 25, down below one-fourth of 100.
You cannot immediately expand or contract; you still have to either insert about 51 more elements or delete another 25, and that gives you the slack that makes the amortized cost manageable. You are not constantly going through expansion and contraction. If you have studied some physics, this phenomenon is called hysteresis: I am keeping some hysteresis between expanding and contracting. You have to do some extra work before you can contract again if you just expanded. So let me summarize the scheme. If k = n, that is, the number of stored elements equals the allocated size, then on insertion we allocate a table of size 2n and copy the elements over. If k < n/4, that is, fewer than one-fourth of the allocated slots are in use, then we allocate a table of size n/2 and copy the elements over. And we make the potential function this: Φ(T) = 2k − n if k ≥ n/2, and n/2 − k if k < n/2. This is the new potential function. It is actually much easier to visualize on a graph, so I have drawn one. On the graph, k runs from zero elements up to n elements, although really it stays between n/4 and n: you never drop below n/4, because at that point I shrink, and if you try to go beyond n, the table is full and I reallocate. For k < n/2 you are on the blue piece of the potential function, n/2 − k; otherwise it is 2k − n, as before. The cost of an insertion is the same as before. When you insert on the 2k − n side, the change in potential is +2 and the actual cost is 1, so the amortized cost is 3, and if you hit the reallocation, it is exactly the same analysis as before: still 3. And when you insert on the blue side, your amortized cost is actually zero, because the potential change is −1 and the actual cost of 1 cancels it. On the other hand, what about deletion? When you delete and no resizing happens, in the blue zone, the change in potential is +1 and the actual cost of the operation is 1, so the amortized cost is at most 2. When you delete and resizing does happen, let's do the math. You go from a table where k = n/4 to a new table where the new k′ = n/4 − 1, because you deleted one element, and the new n′ = n/2. If you compare the potentials, the change in potential is (n′/2 − k′) − (n/2 − k) = 1 − n/4. The computational cost is the cost of copying over the roughly n/4 remaining elements, plus the deletion itself, so n/4 + 1, and the amortized cost is (n/4 + 1) + (1 − n/4) = 2. Once again, the total amortized cost of n operations is a constant times n, at most 3n, and these operations can now be any mix of insertions and deletions. But the important thing is this: after you expand, you have to wait for some time before a contraction can trigger. To arrange that, I put in a gap: I only contract when my table is less than one-fourth full, and when I contract, I go to a new table which is about half full. With these two changes, I can make sure that my amortized cost still remains a constant per operation. I hope it is clear now how we can build dynamic tables that both shrink and expand, using amortized analysis.
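To close, here is a hypothetical sketch of the full scheme in Python: doubling when full, contracting to half the size when less than one-fourth full, with the two-piece potential. The class and its names are my own illustration, not code from the lecture. A random mix of insertions and deletions stays amortized O(1):

```python
import random

class DynamicTable:
    """Dynamic table: doubles when full, halves when under 1/4 full."""

    def __init__(self):
        self.k = 0   # elements stored
        self.n = 0   # allocated capacity

    def phi(self):
        # Two-piece potential: 2k - n when at least half full, n/2 - k below.
        return 2 * self.k - self.n if 2 * self.k >= self.n else self.n / 2 - self.k

    def insert(self):
        if self.k == self.n:                     # full: double and copy
            self.n = max(1, 2 * self.n)
            cost = self.k + 1                    # copy k elements, write 1 new
        else:
            cost = 1
        self.k += 1
        return cost

    def delete(self):
        self.k -= 1
        if 4 * self.k < self.n and self.n > 1:   # under 1/4 full: halve
            self.n //= 2
            cost = self.k + 1                    # copy survivors, plus the delete
        else:
            cost = 1
        return cost

table = DynamicTable()
for _ in range(10_000):
    old_phi = table.phi()
    if table.k == 0 or random.random() < 0.5:
        cost = table.insert()
    else:
        cost = table.delete()
    assert cost + table.phi() - old_phi <= 3     # every op is amortized O(1)
```

Thank you, and I will see you in the next lecture.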