LISTSERV at the University of Georgia
Date:   Fri, 29 Dec 2000 13:09:45 -0800
Reply-To:   kmself@IX.NETCOM.COM
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   kmself@IX.NETCOM.COM
Subject:   Re: sorting efficiency
In-Reply-To:   <F201niFKULPEhGD1XyC0000adf6@hotmail.com>; from paul_dorfman@HOTMAIL.COM on Thu, Dec 28, 2000 at 04:55:13PM -0000
Content-Type:   multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature";

on Thu, Dec 28, 2000 at 04:55:13PM -0000, Paul Dorfman (paul_dorfman@HOTMAIL.COM) wrote:
> From: kmself@IX.NETCOM.COM
>
> >In the case of 5m x 52 variables, hashing methods are likely out of the
> >question, but if there's a smaller table or fewer fields, it should be
> >possible to construct a 5m x 2 element hash table in about 72 MB, if I'm
> >doing my math right. I was squeezing 40m elements into roughly 5 GB on a
> >Sun Ultra4 a couple years back. Amazing what a little memory can buy you
> >in process time.
>
> Karsten,
>
> You might recall that back circa 1998 we discussed the possibility of
> solving this problem by using a hash table comprising 2 parallel
> arrays to store just the keys and record pointers, so that later at
> the time of search (long) satellites could be retrieved by means of
> POINT= access. I tested the idea then and concluded that performance
> was unsatisfactory, perhaps due to the cross-pagination. Since, I have
> returned to the idea and re-tested it. I do not know what I got wrong
> the first time around in 1998, but after the recent subsequent tests I
> was amazed how efficiently it worked.
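
The two-parallel-array scheme described in the quoted text can be sketched roughly as follows. This is a minimal Python illustration, not the SAS code that was actually tested: the names `build_index` and `lookup`, the linear-probing collision strategy, the 0.5 load factor, and the restriction to non-negative integer keys are all assumptions made for the example. The record number it returns stands in for the value one would hand to POINT= to fetch the long satellite data.

```python
from array import array

EMPTY = -1  # sentinel for an unused slot; assumes keys are non-negative integers


def build_index(keys, load_factor=0.5):
    """Build two parallel arrays: slot -> key and slot -> record number.

    Only keys and record numbers are held in memory; the satellite
    variables stay on disk and are fetched later by record number.
    """
    size = int(len(keys) / load_factor) + 1  # always leaves at least one empty slot
    slot_keys = array("q", [EMPTY]) * size
    slot_recs = array("q", [0]) * size
    for recno, key in enumerate(keys, start=1):  # 1-based, like observation numbers
        i = key % size
        # Linear probing: step forward until a free slot or the same key is found.
        while slot_keys[i] != EMPTY and slot_keys[i] != key:
            i = (i + 1) % size
        slot_keys[i] = key
        slot_recs[i] = recno  # a duplicate key keeps the last record number seen
    return slot_keys, slot_recs


def lookup(slot_keys, slot_recs, key):
    """Return the record number stored for key, or None if it is absent."""
    size = len(slot_keys)
    i = key % size
    while slot_keys[i] != EMPTY:
        if slot_keys[i] == key:
            return slot_recs[i]
        i = (i + 1) % size
    return None


keys = [42, 7, 1000003, 15]
slot_keys, slot_recs = build_index(keys)
record_number = lookup(slot_keys, slot_recs, 1000003)  # position of that key in the input
```

The point of the design is that the in-memory table costs two integers per record regardless of how wide the satellite data is, so the memory footprint depends on the row count rather than on the 52 variables.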

Not sure what you mean by "cross-pagination", but IIRC the issue was the SAS read-ahead buffer, which was restricted to a page size on the order of 4-8K. In a large dataset with a large number of randomly accessed reads, that meant reading a large chunk of data only to discard it virtually immediately on reading another large chunk. Read-ahead is great when you're doing long, sequential scans. It sucks massively when you're doing hunt-and-peck, in which case you want to *minimize* it.
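
The read-ahead effect can be illustrated with a toy model. The sketch below is only that: it assumes a single-page buffer, an 8K page, and 416-byte records (52 variables at 8 bytes each, which is a guess at the record layout, not something stated in the thread). It is not a description of how SAS actually manages its buffers; `page_fetches` is a hypothetical helper that counts how often a new page has to be loaded.

```python
import random


def page_fetches(record_numbers, records_per_page):
    """Count page loads when only one page is buffered at a time."""
    fetches = 0
    current_page = None
    for rec in record_numbers:
        page = rec // records_per_page
        if page != current_page:  # buffer miss: the old page is discarded
            fetches += 1
            current_page = page
    return fetches


PAGE_BYTES = 8192      # assumed page size
RECORD_BYTES = 52 * 8  # assumed record length: 52 variables of 8 bytes
records_per_page = PAGE_BYTES // RECORD_BYTES

n_records = 100_000
sequential = list(range(n_records))
scattered = sequential[:]
random.Random(0).shuffle(scattered)  # fixed seed so the run is repeatable

seq_fetches = page_fetches(sequential, records_per_page)
rnd_fetches = page_fetches(scattered, records_per_page)
print(f"records per page: {records_per_page}")
print(f"sequential scan:  {seq_fetches} page loads")
print(f"random access:    {rnd_fetches} page loads")
```

In this model a sequential scan loads each page once and uses every record on it, while scattered access loads close to one full page per record and throws most of it away.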

Not sure what's changed. Possibilities include SI optimizing the DATA step (or a particular read buffer) for such access by identifying steps in which POINT= is the primary (or only) data access method, increased memory (hence: disk caching) on your system(s), or improved I/O drivers at the system or SI level. I've noticed that several of our criticisms of SAS performance in hash methods have more-or-less silently been addressed in v7/8 of SAS, including the ability to create arrays with elements sized in chunks other than 8-byte offsets.
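
Why element size matters for an in-memory table is simple arithmetic. The sketch below assumes the 5m x 2 table from the quoted text and compares 8-byte elements with 4-byte ones; the function name `table_bytes` and the choice of 4 bytes as the smaller size are illustrative assumptions, and any per-array overhead is ignored.

```python
def table_bytes(rows, columns, element_bytes):
    """Raw storage for a rows x columns table of fixed-size elements."""
    return rows * columns * element_bytes


ROWS = 5_000_000
COLUMNS = 2  # one column of keys, one of record pointers

for element_bytes in (8, 4):
    total = table_bytes(ROWS, COLUMNS, element_bytes)
    print(f"{element_bytes}-byte elements: {total / 1_000_000:.0f} MB "
          f"({total / 2**20:.1f} MiB)")
```

Halving the element size halves the table, which either frees memory for disk caching or lets a table twice as long fit in the same space.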

Haven't played with this at all in the past year or so, but would be interested to know what's changed.

-- 
Karsten M. Self <kmself@ix.netcom.com>     http://kmself.home.netcom.com/
 Evangelist, Zelerate, Inc.                      http://www.zelerate.org
  What part of "Gestalt" don't you understand?      There is no K5 cabal
   http://gestalt-system.sourceforge.net/        http://www.kuro5hin.org

