Date: Thu, 16 Nov 2000 16:02:27 -0500
Reply-To: Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Subject: Re: hashing redux
But to generalize this check-sorting "program", you have to pre-allocate
space for 1000 piles (000, 001, 002, ... 999).
A better way is to have ten piles, and drop the checks according to the
*last* digit.
A real-life example: years ago, I ordered something from the Sears catalog
but arranged to pick it up at the catalog desk at a Sears store. The clerk
there asked me the last two digits of my home phone number. They had a rack
with a hundred folders, and the paperwork for my order was in folder number
20 since my phone number ends with "20". All the folders seemed to be
holding about the same number of orders.
On Thu, 16 Nov 2000 11:51:03 -0500, Fehd, Ronald J. <rjf2@CDC.GOV> wrote:
>> From: Frank Schiffel [mailto:SchifF@mail.health.state.mo.us]
>> ... for the life of me I still don't understand hashing.
>
>LOL well I'm not going to try to out-explain Master Dorfman
>
>but if you consider having a bundle of your checks come with your bank
>statement
>and you want to sort them into ascending check number order:
>
>what I used to do was put them in piles by the leading three digits of the
>check number:
>pile 123 contained check numbers 1230, 1231, 1232, 1233 ... 1239
>pile 124 contained check numbers 1240, 1241, 1242, 1243 ... 1249
>etc
>and then sorting ten, or less, within each pile was no big deal.
>
>in this instance the key is the check number
>each pile would be a bucket, or bin;
>it's name/address -- array index -- is a function of the key/check number:
>int(checkNmbr/10)
>and the size of the bin is 10
>this is the Computer Science terminology.
>
>hashing is an algorithm that determines the number, names/addresses and
>sizes of bins used for later searching. This concept is crucial to an
>understanding of binary searches.
>The advantage is that instead of having to do any kind of sequential search
>over large amounts of unknown keys
>you use your key to get to a place where the search has only a limited
>number of items to review: the 'size' -- number of items -- of the bin.
>
>novices start out learning the bubble sort which has to sequentially search
>it's list an interminable number of times,
>the terminology is big O, meaning 'on the Order of'
>bubble sort runs in O((N-1)^2) [from memory, corex welcomed]
>-- where is Donald Knuth's book on my library shelf when I need him?
>http://www-cs-faculty.stanford.edu/~knuth/
>
>anyway, using hashing algorithms can reduce run time to closer to O(N)
>
>Ron Fehd the macro maven CDC Atlanta GA USA RJF2@cdc.gov
>If you always try to be logical,
>you probably won't ever have much sorrow,
>or much fun.
>-- Ashleigh Brilliant pot-shot #4438
|