Another popular collision resolution method is known as linear probing. Many different versions of hashing are based on this idea. Linear probing is a form of open addressing, and it also dates from around the same time, the 1950s. The idea is to just use an array: instead of using space for the links in linked lists, we use that same space to allocate an array. In this case, the size of the array is going to have to be bigger than the number of keys that we expect, and we use the empty slots in the array to terminate the runs of keys that we have to search through when we're doing an insertion.

So let's look at a demo of how it works. To hash, we do the same thing: we just map the key to an index. But in linear probing, to insert, we put the key in position i if that's free; if not, we just look at i+1, then i+2, and wrap around to the beginning if we reach the end. That's also simple to implement, and it works well as long as the size of the array is significantly bigger than the number of keys.

Let's look at the demo. We start with an empty table. Insert S: its hash value is 6, position 6 is empty, so we put it there. Now we look at E: the hash of E is 10, we look at 10, it's empty, so we put E there. So at the beginning we're going to be fine. A is 4, empty, put it there. R is 14, empty, put it there. So far we're essentially just using the hash function as an array index. C is 5; that's empty, and we put it there.

Now the hash value of H is 4. We look at 4, and that's occupied, so we can't put H there. Linear probing says: just look at the next position, position 5. That's still not empty, so we look at 6, and we keep going until we find an empty place, and then we put H there. Now when we search, we're going to have to do the same thing: look at all those positions to find H. A group of contiguous keys in the table like that is called a cluster, and clearly we want to keep those clusters small. We do that by just not putting too many keys into the table.

X hashes to 15; that's empty, so we put it there. M hashes to 1; that's empty, and we put it there. P hashes to 14; 14 is occupied, 15 is also occupied, now we run off the end of the table and look at 0, and that's empty, so we put it there. L hashes to 6; 6 is occupied, we look at 7, 7 is occupied, we look at 8, and we put it there. So that's an example of inserting keys into a hash table.

And now, for a search, we just do the same thing, using the hash function. To search for E: E's hash value is 10, so we look in 10, and there it is. That's a search hit. If we're going to search for, say, L: L's hash value is 6, and it's not there. In order to look at every place in the table where L could be, we have to keep looking until we find an empty table position or we find L itself. So now we look at 7, L's not there; we look at 8, L is there; that's a search hit. If we have a key that's not in the table, like K: K hashes to 5. Is it in position 5? No. 6? No. 7? No. 8? No. And we find an empty position at 9; at that point we can conclude that K is not in the table, because if K were in the table it would be somewhere between its hash position 5 and that empty position 9. That's a search miss, and we return null.

So that's a short demo of linear probing hashing. And here's a summary of linear probing hashing.
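The probe sequence in the demo is easy to express in code. Here's a minimal sketch of the insertion scan in Java; the table size M = 16 and the modular hash function are illustrative assumptions (the demo uses its own key-to-index mapping, so the positions won't match):

    // Minimal sketch of the linear probing scan from the demo.
    // M = 16 and the modular hash are assumptions for illustration.
    public class ProbeDemo {
        private static final int M = 16;            // table size, must exceed number of keys
        private static String[] keys = new String[M];

        private static int hash(String key) {
            return (key.hashCode() & 0x7fffffff) % M;   // map key to an index in 0..M-1
        }

        public static void insert(String key) {
            int i = hash(key);
            while (keys[i] != null) {               // slot occupied: try the next one,
                if (keys[i].equals(key)) return;    // (key already present)
                i = (i + 1) % M;                    // wrapping around at the right end
            }
            keys[i] = key;                          // first empty slot ends the scan
        }

        public static void main(String[] args) {
            for (String s : "S E A R C H X M P L".split(" "))
                insert(s);
        }
    }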
To get started, we map the key to an integer between 0 and m-1, where m is the size of the array in which we are storing the keys. To insert, we put the key-value pair in the table; we use parallel arrays, with the key and the value stored at the same index. We put the entry at table index i if that position is free; if not, we try i+1, i+2, and so on until getting to an empty position. And for search, you do the same thing: you hash to a table position and scan to the right to find the key, stopping when you find the key or find an empty table position. Now, it's essential that the array size m is greater than the number of key-value pairs n. And for linear probing hashing, the implementation really needs to include array resizing whenever the hash table gets too full.

Here's the implementation, and it's quite straightforward given the demo that we talked about. We use the same hash function, and we use parallel arrays for the keys and the values. We have to use ugly casts, because we can't have arrays of generics.

Then let's do the search. We just have a for loop starting at hash of the key and going until we get to a position that's null. As long as it's not null, we stay in the loop and increment i mod m; when i gets to the end, at position m-1, the next increment takes it back to zero at the left end of the table. For each non-null key we test whether it's equal to the search key. If it is, we return the associated value, and if we get to an empty position, we return null.

The implementation of put is similar: find the position, and if the key is there, just reset the value; if the key's not there, put a new entry in. So again, that's fairly little code to implement a fast symbol table with search and insert. But it's only going to be fast if the table size is set appropriately.

In ancient times, memory was at quite a premium, and so people were very concerned with making sure that the hash table never got too empty. Remember, in the first computers each bit was a physical thing, a magnetic core that somebody had to string a wire through. The bits were really expensive, and people wanted to make sure that they were making the best use of the memory. Just leaving empty positions around in a hash table, or using links in a linked list, did not seem like an appropriate use of space. And so there was quite a bit of effort devoted to figuring out how full the hash table could get in linear probing, and how close it could get to full without sacrificing performance.

One way to think about what goes on is to just watch what happens as a hash table fills up. Here we're showing each key getting inserted and the number of probes needed for the insertion; for instance, when J hashed to the same position as A, you had to look for a while. The thing to notice as the table gets full is, first of all, that you have these clusters, or chains, building. And what's clear is that a new key is likely to hash into a big cluster. Not only that: once you have a big cluster and you hash into the middle of it, there's a good chance that cluster is going to get longer or, worse, that it's even going to merge with another big cluster. So that's the situation as the table fills up: you get long clusters, and they're likely to get longer.
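Here's a sketch in the spirit of the implementation just described: parallel arrays, the "ugly casts", the get loop that scans until a null slot, and a put that resizes when the table gets half full. The class name and the exact resize threshold are assumptions for illustration, and error handling is omitted:

    // Sketch of the linear probing symbol table described above.
    // Half-full resize threshold is an assumption for illustration.
    public class LinearProbingHashST<Key, Value> {
        private int n;            // number of key-value pairs
        private int m = 16;       // table size
        private Key[]   keys = (Key[])   new Object[m];   // ugly casts:
        private Value[] vals = (Value[]) new Object[m];   // no arrays of generics

        private int hash(Key key) {
            return (key.hashCode() & 0x7fffffff) % m;     // index in 0..m-1
        }

        public Value get(Key key) {
            for (int i = hash(key); keys[i] != null; i = (i + 1) % m)
                if (keys[i].equals(key)) return vals[i];  // scan right, wrapping around
            return null;                                  // empty slot: search miss
        }

        public void put(Key key, Value val) {
            if (n >= m / 2) resize(2 * m);                // keep table at most half full
            int i;
            for (i = hash(key); keys[i] != null; i = (i + 1) % m)
                if (keys[i].equals(key)) { vals[i] = val; return; }  // reset the value
            keys[i] = key;                                // new entry at first empty slot
            vals[i] = val;
            n++;
        }

        private void resize(int capacity) {               // rehash everything into a bigger table
            LinearProbingHashST<Key, Value> t = new LinearProbingHashST<Key, Value>();
            t.m = capacity;
            t.keys = (Key[])   new Object[capacity];
            t.vals = (Value[]) new Object[capacity];
            for (int i = 0; i < m; i++)
                if (keys[i] != null) t.put(keys[i], vals[i]);
            keys = t.keys; vals = t.vals; m = t.m;
        }
    }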
And the math bears that out. This was studied in detail by Don Knuth in the 1960s, and actually this problem, Knuth says, was the origin of analysis of algorithms. Mathematicians were trying hard to understand this problem and were ready to give up, and he realized you could use classical balls-and-bins probabilistic analysis. It's not an easy analysis, but we actually can make precise, accurate statements about the performance of this algorithm. And those statements are borne out in practice: the hash functions approximate random, the math assumes random, and the formulas predict what actually happens in practice.

Now, the way Knuth formulated it is as the so-called parking problem. What happens is that you're on a one-way street and you're looking for a parking place. Each car starts looking at a particular spot, saying "okay, now I need a parking place," and what it's doing is linear probing: if the current space is taken, you try the next space, and the one after, and so forth. The question is: if every car starts looking for a place at a random spot (that models the hash function), then how far does it have to go to find a place? That's Knuth's parking problem. And he was able to show, and we'll talk just a little bit about this, that if only half of the parking spaces are occupied, then on average half the cars find a spot at the first place they look and the other half have to look about one extra place. That's the kind of performance we want. But as the street gets full, the mean displacement gets up to about sqrt(pi*M/8), which is obviously much higher than we want; we don't want our searches to take that long. And the analysis involves an amazing function that goes back to Ramanujan and other classical results from combinatorial analysis.

What Knuth's theorem says is that, under the uniform hashing assumption, the average number of probes in a linear probing hash table of size m that is a fraction alpha = n/m full is about (1/2)(1 + 1/(1 - alpha)) for a search hit and about (1/2)(1 + 1/(1 - alpha)^2) for a search miss or insert. Now, as alpha gets close to 1, you can see these quantities are going to grow, and in particular the search miss cost is going to grow quite a bit. If the table is 9/10 full, 1/(1 - alpha)^2 is 1/(1/10)^2 = 100, so it's going to take about 50 probes for a search miss, and that's independent of n and m. Whereas if it's half full, then we get the nice numbers of only about 3/2 probes for a hit and only about 5/2 for a miss.

Again, these are nice approximate formulas, and Knuth, once he figured this out in 1963, tells the story that at that time he decided to write his famous series of books on algorithms. Now there are four volumes out and more planned, and this is where all computer scientists go for detailed information on the performance of algorithms.

So, in summary: you can't let the table get too full, and you don't want m too large either, because that wastes space. What we want to use nowadays is array resizing to make sure that the array is always about half full. And if we can keep the array about half full, then we get constant-time performance for search hit and search miss, and linear probing is very attractive in that case. There are other things we can do algorithmically to bring down the search time a little bit.
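As a quick numerical check of those formulas, here's a small Java snippet that evaluates the expected probe counts at a few load factors (the particular alpha values chosen are just for illustration); it reproduces the 3/2 and 5/2 figures at half full and the roughly 50-probe miss at 9/10 full:

    // Evaluate Knuth's approximate formulas for expected probes
    // at a few illustrative load factors.
    public class ProbeCost {
        public static void main(String[] args) {
            double[] alphas = { 0.25, 0.5, 0.75, 0.9 };
            for (double a : alphas) {
                double hit  = 0.5 * (1 + 1 / (1 - a));             // search hit
                double miss = 0.5 * (1 + 1 / ((1 - a) * (1 - a))); // miss / insert
                System.out.printf("alpha = %.2f: hit ~ %.2f, miss ~ %.2f%n",
                                  a, hit, miss);
            }
        }
    }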
Like using another hash function: rather than looking at the next entry, use a second hash function to determine the stride that we're going to use; this is known as double hashing. That brings the search time down somewhat and allows us to keep the tables more full. But the bottom line is that now we have two methods that, under the uniform hashing assumption, can give us constant-time search, insert, and delete for symbol table implementations where we don't need ordering and we've got a reasonable hash function. So, that's a summary of linear probing, our second collision resolution strategy.
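For the curious, here's a minimal sketch of that double hashing variant. The particular second hash function is an assumption for illustration; the table size is chosen prime so that every stride is relatively prime to it and the probe sequence visits every slot:

    // Minimal sketch of double hashing: probe by a key-dependent stride.
    // The stride function and prime table size are illustrative assumptions.
    public class DoubleHashingDemo {
        private static final int M = 17;                 // prime table size
        private static String[] keys = new String[M];

        private static int hash(String key) {
            return (key.hashCode() & 0x7fffffff) % M;    // initial position
        }

        private static int stride(String key) {
            return 1 + (key.hashCode() & 0x7fffffff) % (M - 1);  // step in 1..M-1
        }

        public static void insert(String key) {
            int i = hash(key), k = stride(key);
            while (keys[i] != null) {
                if (keys[i].equals(key)) return;
                i = (i + k) % M;                         // skip by the stride, not by 1
            }
            keys[i] = key;
        }

        public static void main(String[] args) {
            for (String s : "S E A R C H X M P L".split(" "))
                insert(s);
            System.out.println(java.util.Arrays.toString(keys));
        }
    }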