December 7th, 2007, 10:55 PM   #1
 
Joined: Dec 2007
Posts: 232
So convince me...

Why are hash tables considered to take O(1) time* to search? This doesn't seem right to me. Any given hash table can hold only so many values before there are collisions, and with enough collisions hash table performance degrades rapidly, to something like O(n) if implemented naively. This is true even for nested hash tables, though the degradation is slower. And in a recursive data structure in which hash tables can be nested to unlimited depth, lookups take time proportional to the nesting depth -- which is going to be something like O(log n).

So why does every text and every paper I've seen say they search in constant time? Am I missing something?

* Well, not strictly O(1) time, but average O(1) or amortized O(1).
CRGreathouse is offline  
 


December 8th, 2007, 07:29 AM   #2
 
Joined: Dec 2007
Posts: 24
Re: So convince me...

Well, I've programmed hash tables before, and I think the reasoning goes like this: in order to file a given amount of material (say n things) you need at least n slots in the hash table. Assuming that requirement is fulfilled, filing the n things takes n hashing actions if the hash algorithm is good (each of which is O(1), of course). However, there will probably be collisions, so the algorithm has to pick a new slot in case of a collision. This will take a certain number of steps, depending on how many elements are already in the hash table; however, we know we will have to do at most n steps to find a new hash slot. So, I would say that hashing is O(1) but resolving collisions is o(n) or better. However, if we take things to the typical infinite extreme that big O notation uses, let's assume that we have a hash table of infinite size. That would mean we would be guaranteed to have no collisions for a finite number of elements, so we would know that we had an O(1) hash table. Thus, I would say that a finite number of elements in a huge hash table is O(1). However, it could change if the table is smaller and there are collisions. On the other hand, we could implement another hashing trick and use an array of stacks, which eliminates the collision problem entirely, although more than one element might be in each hash slot.
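To make that "pick a new slot" step concrete, here is a minimal Java sketch of one common way to do it (linear probing into a fixed-size table); the class and method names are just illustrative, and it ignores deletion and resizing entirely.

Code (Java):

public class ProbingTable {
    private final Object[] slots = new Object[101];   // fixed table size, illustrative

    public boolean insert(Object key) {
        int start = Math.floorMod(key.hashCode(), slots.length);
        for (int step = 0; step < slots.length; step++) {
            int i = (start + step) % slots.length;     // walk forward on a collision
            if (slots[i] == null) { slots[i] = key; return true; }
            if (slots[i].equals(key)) return true;     // already present
        }
        return false;  // table full: the probe visited every slot, i.e. O(n) work
    }
}

In the worst case the probe loop really does touch every slot, which is exactly the collision-resolution cost being debated below.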
Infinity is offline  
December 8th, 2007, 12:22 PM   #3
 
Joined: Dec 2007
Posts: 138
Re: So convince me...

Quote:
Originally Posted by Infinity
However, if we take things to the typical infinite extreme that big O notation uses, let's assume that we have a hash table of infinite size. That would mean we would be guaranteed to have no collisions for a finite number of elements, so we would know that we had an O(1) hash table. Thus, I would say that a finite number of elements in a huge hash table is O(1). However, it could change if the table is smaller and there are collisions. On the other hand, we could implement another hashing trick and use an array of stacks, which eliminates the collision problem entirely, although more than one element might be in each hash slot.
I'll add to this that a hash table degrades rather quickly to O(n) once it is more than 70-80% full... A professor recently suggested resizing (and re-hashing) a hash table at around 70% full. This takes time, but will save quite a bit of time in the future.
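As a rough illustration of that policy (my own sketch, not the professor's code), here is a minimal separate-chaining table in Java that doubles and re-hashes once the load factor passes 0.70; the threshold, growth factor, and names are just picked for the example.

Code (Java):

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedTable<K> {
    private List<LinkedList<K>> buckets = newBuckets(16);
    private int size = 0;

    private static <K> List<LinkedList<K>> newBuckets(int m) {
        List<LinkedList<K>> b = new ArrayList<>(m);
        for (int i = 0; i < m; i++) b.add(new LinkedList<>());
        return b;
    }

    void add(K key) {
        if ((double) size / buckets.size() > 0.70) rehash(buckets.size() * 2);  // resize at ~70% full
        buckets.get(Math.floorMod(key.hashCode(), buckets.size())).addFirst(key);
        size++;
    }

    // One O(n) pass over every chain; because the table doubles each time,
    // this happens rarely enough to be O(1) amortized per insertion.
    private void rehash(int newSize) {
        List<LinkedList<K>> old = buckets;
        buckets = newBuckets(newSize);
        for (LinkedList<K> chain : old)
            for (K key : chain)
                buckets.get(Math.floorMod(key.hashCode(), newSize)).addFirst(key);
    }
}

Each re-hash pass costs O(n), but spread over the insertions that triggered it the cost is constant amortized, which is the "save quite a bit of time in the future" part.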
cknapp is offline  
December 8th, 2007, 03:35 PM   #4
 
Joined: Dec 2007
Posts: 232
Re: So convince me...

Quote:
Originally Posted by Infinity
So, I would say that hashing is O(1) but resolving collisions is o(n) or better.
Sure, hashing with a fixed-size hash table is O(1). Resolving collisions should be O(n) with a fixed hash table size, though -- if there are k slots in the hash table, then you need to do a linear search on about n/k items, and since k is a constant, O(n/k + C) = O(n/k) = O(n). It's not o(n), in other words.

Quote:
Originally Posted by Infinity
However, if we take things to the typical infinite extreme that big O notation uses, let's assume that we have a hash table of infinite size.
But no, an infinite hash table takes an infinite amount of time to produce a hash for, so no item is ever found. The algorithm fails -- this is worse than O(blah) for any blah.
CRGreathouse is offline  
December 8th, 2007, 03:39 PM   #5
 
Joined: Dec 2007
Posts: 232
Re: So convince me...

Quote:
Originally Posted by cknapp
I'll add to this that a hash table degrades rather quickly to O(n) once it is more than 70-80% full... A professor recently suggested resizing (and re-hashing) a hash table at around 70% full. This takes time, but will save quite a bit of time in the future.
Yep. But the new table will be larger, so hashing into it will take slightly longer. If the table were resized automatically as items were added (even if the re-hashing were free), wouldn't performance degrade to O(log n) asymptotically? Of course the re-hashing can be made effectively free -- constant amortized time -- if a new hash table is built before it's needed and, every time a new element is added, one element is moved from the old hash table to the new. More can be moved at a time if desired.
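In rough Java, that side-by-side migration might look like this (an illustrative sketch only -- the class name, table sizes, and the "move 2 per insert" constant are arbitrary; it assumes chaining with stacks, in the spirit of the array-of-stacks idea mentioned above).

Code (Java):

import java.util.ArrayDeque;
import java.util.Deque;

class IncrementalRehash {
    private Deque<Object>[] oldBuckets;        // the small table being drained
    private final Deque<Object>[] newBuckets;  // the larger replacement table
    private int drainIndex = 0;                // next old bucket to empty

    @SuppressWarnings("unchecked")
    private static Deque<Object>[] makeBuckets(int m) {
        Deque<Object>[] b = new Deque[m];
        for (int i = 0; i < m; i++) b[i] = new ArrayDeque<>();
        return b;
    }

    IncrementalRehash(Deque<Object>[] oldTable, int newSize) {
        this.oldBuckets = oldTable;
        this.newBuckets = makeBuckets(newSize);
    }

    void add(Object key) {
        newBuckets[Math.floorMod(key.hashCode(), newBuckets.length)].push(key);
        migrate(2);   // move a couple of old entries per insert: O(1) amortized re-hashing
    }

    private void migrate(int moves) {
        while (oldBuckets != null && moves > 0) {
            if (drainIndex == oldBuckets.length) { oldBuckets = null; return; }  // done draining
            Deque<Object> chain = oldBuckets[drainIndex];
            if (chain.isEmpty()) { drainIndex++; continue; }
            Object key = chain.pop();
            newBuckets[Math.floorMod(key.hashCode(), newBuckets.length)].push(key);
            moves--;
        }
    }

    boolean contains(Object key) {   // until the old table is drained, a key may be in either one
        if (oldBuckets != null
                && oldBuckets[Math.floorMod(key.hashCode(), oldBuckets.length)].contains(key))
            return true;
        return newBuckets[Math.floorMod(key.hashCode(), newBuckets.length)].contains(key);
    }
}

Note that lookups have to consult both tables until the old one is fully drained.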
CRGreathouse is offline  
December 9th, 2007, 07:33 AM   #6
 
Joined: Dec 2007
Posts: 24
Re: So convince me...

All in all, I still prefer the 'array of stacks' method. That way, all you have to do is sort through a few elements when you want to retrieve something, and hashing is O(1) guaranteed.
Infinity is offline  
December 9th, 2007, 08:54 AM   #7
 
Joined: Dec 2007
Posts: 138
Re: So convince me...

Quote:
Originally Posted by Infinity
All in all, I still prefer the 'array of stacks' method. That way, all you have to do is sort through a few elements when you want to retrieve something, and hashing is O(1) guaranteed.
And you don't need to flag emptied cells.

But if it's actually an array of stacks, unhashing the thing at the bottom is rather difficult, no?
cknapp is offline  
December 9th, 2007, 01:11 PM   #8
 
Joined: Dec 2007
Posts: 232
Re: So convince me...

Quote:
Originally Posted by Infinity
All in all, I still prefer the 'array of stacks' method. That way, all you have to do is sort through a few elements when you want to retrieve something, and hashing is O(1) guaranteed.
So how long does it take to check if an item is in the collection? My essential claim is that hash tables are not truly O(1), and I'd be surprised if your method was O(1). I'm not quite sure how your array of stacks works -- could you explain precisely what you mean? What is the index to the array?
CRGreathouse is offline  
December 9th, 2007, 04:32 PM   #9
 
Joined: Dec 2007
Posts: 24
Re: So convince me...

Well, you would make an array of, say, 100 elements, indexed 0-99. Each index holds a stack. Upon hashing your information, there can never be a collision, because if two objects are sent to the same index by the hashing algorithm, they are simply pushed onto the same stack. Getting your data back might not be O(1), but it will be very close to that for the most part.

Code (Java):

import java.util.ArrayDeque;
import java.util.Deque;

public class StackHashTable {
    // Illustrative hash: map an object to a bucket index in 0-99.
    static int chooseHashIndex(Object information) {
        return Math.floorMod(information.hashCode(), 100);
    }

    public static void main(String[] args) {
        @SuppressWarnings("unchecked")
        Deque<Object>[] stackArray = new Deque[100];
        for (int n = 0; n < 100; n++) {          // initialize the stacks
            stackArray[n] = new ArrayDeque<>();
        }

        // Colliding objects simply share a stack.
        Object information = new Object();
        stackArray[chooseHashIndex(information)].push(information);

        information = new Object();
        stackArray[chooseHashIndex(information)].push(information);
    }
}
Infinity is offline  
December 9th, 2007, 07:11 PM   #10
 
Joined: Dec 2007
Posts: 232
Re: So convince me...

Quote:
Originally Posted by Infinity
Well, you would make an array of, say, 100 elements, indexed 0-99. Each index holds a stack. Upon hashing your information, there can never be a collision, because if two objects are sent to the same index by the hashing algorithm, they are simply pushed onto the same stack. Getting your data back might not be O(1), but it will be very close to that for the most part.
This is what I meant when I talked about hashtables. I assume that when two objects hash to the same value they are stored in a list that is linearly searched (i.e. new items are added at the bottom). When the number of objects is small compared to the number of hash values the expected time is close to the time it takes to calculate a hash; as the number of objects increases beyond that point performance degrades to O(n).
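Concretely, against the array-of-stacks code above, the lookup I have in mind is roughly this (a sketch; the helper name is made up):

Code (Java):

static boolean contains(Deque<Object>[] stackArray, Object key) {
    // Hash once to pick the bucket, then linearly scan that bucket's stack.
    Deque<Object> chain = stackArray[Math.floorMod(key.hashCode(), stackArray.length)];
    for (Object item : chain) {
        if (item.equals(key)) return true;
    }
    return false;   // the scan is what degrades toward O(n) as chains grow with n
}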
CRGreathouse is offline  
December 10th, 2007, 01:26 PM   #11
 
Joined: Dec 2007
Posts: 24
Re: So convince me...

Quote:
I assume that when two objects hash to the same value they are stored in a list that is linearly searched (i.e. new items are added at the bottom). When the number of objects is small compared to the number of hash values the expected time is close to the time it takes to calculate a hash; as the number of objects increases beyond that point performance degrades to O(n).
But that still means that hashing is O(1), doesn't it? Unhashing could be more like O(n), though, depending on the situation.
Infinity is offline  
December 10th, 2007, 03:47 PM   #12
 
Joined: Dec 2007
Posts: 232
Re: So convince me...

Of course hashing into a constant-sized hash table is O(1); doing anything to a constant-sized structure is O(1). But checking whether an element is in a hash table appears to take more than constant time for large numbers of elements, which certainly contradicts conventional wisdom. A random Googling provides:

Wikipedia:
Quote:
Like arrays, hash tables provide constant-time O(1) lookup on average, regardless of the number of items in the table.
SparkNotes:
Quote:
We now present a new data structure, called a hash table, that will increase our efficiency to O(1), or constant time.
Hashtable Library:
Quote:
Hash tables, sometimes also known as associative arrays, or scatter tables, are a data structure that offers fast lookup, insertion, and deletion of (key, value) pairs. In a well-designed implementation of hash tables, all of these operations have a time complexity of O(1), rather than the O(n) that linked lists, association lists, and vectors would require for some of them.
So there's surely a reason that people say they're O(1). Why?
CRGreathouse is offline  
December 10th, 2007, 05:37 PM   #13
 
Joined: Dec 2007
Posts: 24
Re: So convince me...

What about this: if we define the hash table size based on the number of elements in it, then we get an O(1) search. If we hash, say, 5n elements into a hash table with n slots, then there will be 5n/n = 5 elements on average in each slot, and searching that constant number is O(1). There could, of course, be more or fewer than 5 elements in a given slot, but that is where the "on average" qualification comes from. Only with a hash table whose size is fixed, no matter how many elements are put in, could you ever have a search runtime of O(n). But why would anyone want to make such a hash table? You always try to size the hash table to the number of elements you think will go into it, right? So I would say that we have O(1) search time, on average, assuming we are rational programmers who make a reasonably-sized hash table for the job at hand. (If we were irrational fools, of course, we could conceivably make a hash table with a horrible runtime, but I'm not sure why we would want to do that apart from an obfuscation contest, or if we were saboteurs.)
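In symbols (the standard average-case bound for chaining, sketched here under the usual assumption that the hash function spreads keys evenly across the slots):

\[
\alpha = \frac{n}{m}, \qquad \mathbb{E}[\text{elements per slot}] = \alpha, \qquad \text{expected search cost} = \Theta(1 + \alpha),
\]

so with m = n/5 slots the load factor is \alpha = 5 and the expected search is O(1), while a fixed m gives \alpha = n/m = \Theta(n).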
Infinity is offline  
December 11th, 2007, 06:07 AM   #14
 
Joined: Dec 2007
Posts: 232
Re: So convince me...

So say you increase the size of the hashtable as you go along, amortizing the transition by moving just 2-3 elements at a time from the small, no-longer-sufficient hash table to increasingly large ones. How does this have constant-time lookup? When the table is 1 kB the hash function produces ~10 bits; when it is 1 MB, ~20 bits; and as n goes to infinity the hash function's output size also goes to infinity. Eventually you will have a hash function producing gigabytes of data. You can't produce that in constant time, because the hash function's output size is no longer constant. If at any point you bound the hash function's size, then beyond a certain point performance degrades to O(n). If the size is unbounded, then calculating the hash function takes unbounded time; in particular, if you want to keep the table between 1% and 70% full, the hash function must take at least logarithmic time.
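Roughly, the bit-counting I mean (my notation, assuming the table is kept at most 70% full and that indexing m slots requires at least \lceil \log_2 m \rceil bits of hash output):

\[
\frac{n}{m} \le 0.7 \;\Longrightarrow\; m \ge \frac{n}{0.7} \;\Longrightarrow\; \text{hash output} \ge \lceil \log_2 m \rceil \ge \log_2 n - \log_2 0.7 = \Omega(\log n)\ \text{bits},
\]

so in a cost model that charges for every bit the hash function emits, even computing the index is \Omega(\log n) rather than O(1).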
CRGreathouse is offline  