
Algorithms and Data Structures: Analysis, Graph, Search, String, Sorting, Merge, Compression, Optimization, Quantum
December 7th, 2007, 09:55 PM  #1 
Joined: Dec 2007 Posts: 232  So convince me...
Why are hash tables considered to take O(1) time* to search? This doesn't seem right to me. Any given hash table can hold only so many values before there are collisions, and with enough collisions hash table performance degrades rapidly, to something like O(n) if implemented naively. This is true even for nested hash tables, though the degradation is slower. And in a recursive structure in which hash tables can be nested to unlimited depth, lookups take time proportional to the nesting depth, which is going to be something like O(log n). So why does every text and every paper I've seen say they search in constant time? Am I missing something?

* Well, not strictly O(1) time, but average O(1) or amortized O(1).
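[To make the worry concrete, here is a minimal sketch of the naive degradation described above; it is my own illustration, not from any of the texts in question, and all names are invented. The hash function deliberately sends every key to bucket 0, so a lookup scans all n entries:]

import java.util.ArrayList;
import java.util.List;

// A deliberately bad chained hash table: bucket() ignores the key, so every
// entry lands in bucket 0 and get() degenerates to a linear scan of all n
// entries -- the naive O(n) case described above.
class DegenerateTable<K, V> {
    private final List<List<Object[]>> buckets = new ArrayList<>();

    DegenerateTable(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) buckets.add(new ArrayList<>());
    }

    private int bucket(K key) {
        return 0; // worst-possible hash: everything collides
    }

    void put(K key, V value) {
        buckets.get(bucket(key)).add(new Object[] { key, value });
    }

    @SuppressWarnings("unchecked")
    V get(K key) {
        // Scans one chain -- but with bucket() above, that chain is the
        // whole table, so this loop runs up to n times.
        for (Object[] entry : buckets.get(bucket(key))) {
            if (entry[0].equals(key)) return (V) entry[1];
        }
        return null;
    }
}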
December 8th, 2007, 06:29 AM  #2 
Joined: Dec 2007 Posts: 24  Re: So convince me...
Well, I've programmed hash tables before, and I think the reasoning goes like this: in order to file a given amount of material (say n things) you need at least n slots in the hash table. Assuming that requirement is fulfilled and the hash algorithm is good, inserting n things takes n hashing actions, each of which is O(1), of course. However, there will probably be collisions, so the algorithm has to pick a new slot whenever a collision occurs. How many steps that takes depends on how many elements are already in the table, but we know we will have to do at most n steps to find a free slot. So I would say that hashing is O(1), but resolving collisions is O(n) or better.

However, if we take things to the typical infinite extreme that big-O notation uses, let's assume we have a hash table of infinite size. That would guarantee no collisions for any finite number of elements, so we would have an O(1) hash table. Thus, I would say that a finite number of elements in a huge hash table is O(1), though that could change if the table is smaller and there are collisions. On the other hand, we could use another hashing trick, an array of stacks, which eliminates the collision problem entirely, although more than one element might then sit in each hash slot.
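[To picture the "pick a new slot" step, here is a hedged sketch of open addressing with linear probing; the class and names are illustrative, not the poster's actual code. On a collision it walks forward one slot at a time, which in a nearly full table can be the up-to-n steps mentioned above:]

// Open addressing with linear probing: on a collision, step forward until
// a free slot (or the key itself) turns up.
class ProbingTable {
    private final String[] keys;
    private final int[] vals;

    ProbingTable(int capacity) {
        keys = new String[capacity];
        vals = new int[capacity];
    }

    // Caller must keep at least one slot free, or this loops forever.
    void put(String key, int val) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length; // collision: try the next slot
        }
        keys[i] = key;
        vals[i] = val;
    }

    Integer get(String key) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        while (keys[i] != null) {
            if (keys[i].equals(key)) return vals[i];
            i = (i + 1) % keys.length;
        }
        return null; // reached an empty slot: key is absent
    }
}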

December 9th, 2007, 06:33 AM  #6 
Joined: Dec 2007 Posts: 24  Re: So convince me...
All in all, I still prefer the 'array of stacks' method. That way, all you have to do is sort through a few elements when you want to retrieve something, and hashing is O(1) guaranteed.

December 9th, 2007, 07:54 AM  #7  
Joined: Dec 2007 Posts: 138  Re: So convince me...
But if it's actually an array of stacks, retrieving the thing at the bottom is rather difficult, no?
December 9th, 2007, 03:32 PM  #9 
Joined: Dec 2007 Posts: 24  Re: So convince me...
Well, you would make an array, say of 100 elements, indexed 0 to 99. Each of these indexes is the location of an object in the form of a stack. Upon hashing your information, there can never be a collision, because if two objects are sent to the same index by the hashing algorithm, they are simply pushed onto the same stack. Getting your data back might not be O(1), but it will be very close to that for the most part. Code (the original pseudocode, cleaned up into runnable Java):

import java.util.Stack;

public class StackHashTable {
    // Reduce the object's hash code into the bucket range 0-99.
    static int chooseHashIndex(Object information, int buckets) {
        return Math.floorMod(information.hashCode(), buckets);
    }

    public static void main(String[] args) {
        @SuppressWarnings("unchecked")
        Stack<Object>[] stackArray = new Stack[100];
        for (int n = 0; n < 100; n++) { // initialize the stacks
            stackArray[n] = new Stack<>();
        }

        Object information = new Object();
        stackArray[chooseHashIndex(information, 100)].push(information);

        information = new Object();
        stackArray[chooseHashIndex(information, 100)].push(information);
    }
}
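[One piece the post leaves out is retrieval. Here is a hedged sketch of what a lookup might look like, with invented names, added for completeness: hash to the bucket, then scan that bucket's stack. The scan is linear in the stack's height, which is exactly where the "might not be O(1)" cost lives:]

// Hypothetical lookup for the array-of-stacks scheme: O(1) to find the
// bucket, then a linear scan of that one stack.
static Object find(Stack<Object>[] stackArray, Object wanted) {
    int index = Math.floorMod(wanted.hashCode(), stackArray.length);
    for (Object candidate : stackArray[index]) { // java.util.Stack is iterable
        if (candidate.equals(wanted)) {
            return candidate;
        }
    }
    return null; // not in its bucket, so not in the table
}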
December 10th, 2007, 02:47 PM  #12  
Joined: Dec 2007 Posts: 232  Re: So convince me...
Of course hashing into a constant-sized hash table is O(1); doing anything constant-sized should be O(1). But checking whether an element is in a hash table appears to take more than constant time for large numbers of elements, which certainly contradicts conventional wisdom. A random Googling turns up Wikipedia and other sources all making the same constant-time claim.
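[One rough way to probe the claim empirically; this sketch is my own, using java.util.HashMap, and any growth it shows at large sizes will mostly reflect cache effects rather than the hashing itself, which is worth keeping in mind when reading the numbers:]

import java.util.HashMap;

// Time HashMap lookups at increasing sizes; if lookup were truly O(1),
// the ns/lookup figure would stay flat as n grows.
public class LookupTiming {
    public static void main(String[] args) {
        for (int n = 1 << 10; n <= 1 << 22; n <<= 2) {
            HashMap<Integer, Integer> map = new HashMap<>(2 * n);
            for (int i = 0; i < n; i++) map.put(i, i);

            long start = System.nanoTime();
            long sink = 0; // accumulate results so the loop isn't optimized away
            for (int i = 0; i < n; i++) sink += map.get(i);
            long perLookup = (System.nanoTime() - start) / n;

            System.out.println("n = " + n + ": ~" + perLookup
                    + " ns/lookup (checksum " + sink + ")");
        }
    }
}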
December 10th, 2007, 04:37 PM  #13 
Joined: Dec 2007 Posts: 24  Re: So convince me...
What about this: if we define the hash table size to be based on the number of elements therein, then we get an O(1) search result. If we hash, say, 5n elements into a hash table with n slots, then there will be 5n/n = 5 elements on average in each slot, and searching that constant number is O(1). There could, of course, be more or fewer than 5 elements in a given slot, but that is where the "on average" runtime qualification comes from. Only with a hash table defined as fixed in size, no matter how many elements are put in, could you ever have a search runtime of O(n). But why would anyone want to make such a hash table? You always attempt to size the hash table to the number of elements you think will go in it, right? So I would say that we have O(1) search time, on average, assuming we are rational programmers who try to make a reasonably-sized hash table for the job at hand. (If we were irrational fools, of course, we could conceivably make a hash table with a horrible runtime, but I'm not sure why we would want to do that apart from an obfuscation contest, or if we were saboteurs.)
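[The "on average" part is easy to check with a quick simulation; this is my own sketch with illustrative numbers, using a random-number generator as a stand-in for a good hash function:]

import java.util.Random;

// Throw 5n random keys at n buckets and report chain lengths. The average
// lands at the load factor 5n/n = 5; the longest chain grows only slowly,
// which is why the average-case O(1) claim holds up in practice.
public class LoadFactorDemo {
    public static void main(String[] args) {
        int buckets = 100000;
        int elements = 5 * buckets;
        int[] chainLength = new int[buckets];
        Random rng = new Random(42);

        for (int i = 0; i < elements; i++) {
            chainLength[rng.nextInt(buckets)]++; // simulate a good hash
        }

        int longest = 0;
        for (int len : chainLength) longest = Math.max(longest, len);
        System.out.println("average chain = " + (double) elements / buckets
                + ", longest chain = " + longest);
    }
}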

December 11th, 2007, 05:07 AM  #14 
Joined: Dec 2007 Posts: 232  Re: So convince me...
So say you increase the size of the hash table as you go along, amortizing the transition by moving just 2-3 elements at a time from the small, no-longer-sufficient hash table to increasingly large ones. How does this have constant-time lookup? When the table is 1 kB the hash function produces ~10 bits, when it is 1 MB it produces ~20 bits, and as n goes to infinity the size of the hash function's output also goes to infinity. Eventually you will have a hash function producing gigabytes of data. You can't produce that in constant time, because the hash function's output size is no longer constant. If at any point you want to make the hash function bounded in size, then beyond a certain point its performance degrades to O(n). If the size is unbounded, then calculating the hash function takes unbounded time; in particular, if you want to keep the table between 1% and 70% full, the hash function must take at least logarithmic time.
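[For reference, a minimal sketch of the incremental-growth scheme being questioned; it is my own illustration, and real implementations differ. The bucket array doubles when the load factor passes a threshold, and the total rehashing work over n inserts is O(n), hence amortized O(1) per insert. Note that this only addresses the cost of moving elements, not the point above about the hash function needing more output bits as the table grows:]

import java.util.LinkedList;

// Chained table that doubles its bucket array at load factor 0.75.
class GrowingTable {
    private LinkedList<Integer>[] buckets = newBuckets(8);
    private int size = 0;

    @SuppressWarnings("unchecked")
    private static LinkedList<Integer>[] newBuckets(int n) {
        LinkedList<Integer>[] b = new LinkedList[n];
        for (int i = 0; i < n; i++) b[i] = new LinkedList<>();
        return b;
    }

    void add(int key) {
        if (size * 4 >= buckets.length * 3) {          // load factor >= 0.75
            LinkedList<Integer>[] bigger = newBuckets(buckets.length * 2);
            for (LinkedList<Integer> chain : buckets)  // rehash everything:
                for (int k : chain)                    // indexes now use one more bit
                    bigger[Math.floorMod(k, bigger.length)].add(k);
            buckets = bigger;
        }
        buckets[Math.floorMod(key, buckets.length)].add(key);
        size++;
    }

    boolean contains(int key) {
        return buckets[Math.floorMod(key, buckets.length)].contains(key);
    }
}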
