LISTSERV at the University of Georgia
Date:   Wed, 22 Oct 2003 14:53:04 -0700
Reply-To:   cassell.david@EPAMAIL.EPA.GOV
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject:   Re: estimated sorting time
Content-type:   text/plain; charset=US-ASCII

Zhonghe Li <zli@HSPH.HARVARD.EDU> wrote:
> I am sorting a 3.7 GB dataset on a computer with 16 GB of free hard drive
> space and 1 GB of RAM, so I ran it with TAGSORT. It has been running for
> 6 hours already.
>
> Can anyone tell me how long it may take?

Ooh. That's not good.

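Since Paul has tossed my name out there, I'll go ahead and stick my nose in. First off, TAGSORT may *not* be all that helpful. The TAGSORT option stores only the BY variables plus the observation numbers in its temporary files, sorts those 'tags', and then goes back to the input data set to fetch the full observations in sorted order. So it pays off when you have a really 'wide' data set (lots of variables and/or some really long character variables) and a fairly 'narrow' key or set of keys you're sorting on. If your keys take up most of the 'width' of the data set, the tags are nearly as big as the observations themselves, and TAGSORT may take a lot longer than an ordinary sort.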
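
To make the tag idea concrete, here is a small in-memory Python sketch of a tag sort: sort only (key, position) pairs, then fetch the full rows in that order. It is an illustration of the general technique under the assumption that the rows fit in memory and the keys are mutually comparable; it is not how SAS implements TAGSORT on disk, and the function name `tag_sort` is hypothetical.

```python
from typing import Callable, Sequence, TypeVar

Row = TypeVar("Row")


def tag_sort(rows: Sequence[Row], key: Callable[[Row], object]) -> list[Row]:
    """Sort wide rows by sorting narrow (key, position) tags, then fetching rows."""
    # Step 1: build the narrow tags: the sort key plus the row's original position.
    tags = [(key(row), pos) for pos, row in enumerate(rows)]
    # Step 2: sort only the tags; the position breaks ties, so equal keys keep input order.
    tags.sort()
    # Step 3: go back to the original data and fetch each full row in tag order.
    # On disk this is a random-access pass, which is the extra cost TAGSORT pays.
    return [rows[pos] for _, pos in tags]


rows = [("b", "x" * 100), ("a", "y" * 100), ("b", "z" * 100), ("a", "w" * 100)]
ordered = tag_sort(rows, key=lambda r: r[0])
```

If the key is nearly the whole row, step 1 copies almost everything and step 3 adds a second, scattered pass over the data, which is why an ordinary sort can win in that case.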

Second, your free disk space (16 GB) is more than four times the size of the data set (3.7 GB), so you should be able to do a straightforward sort without TAGSORT.
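
The arithmetic behind that comparison is simple enough to script as a quick pre-check. In this Python sketch, the 4x threshold `REQUIRED_MULTIPLE` is an assumption taken from the rough comparison above, not a documented SAS limit, and the function name is hypothetical.

```python
# Assumption: a rough "4x the data set size" comfort margin, as in the comparison above.
REQUIRED_MULTIPLE = 4.0


def free_space_multiple(free_gb: float, dataset_gb: float) -> float:
    """How many copies of the data set would fit in the free disk space."""
    if dataset_gb <= 0:
        raise ValueError("dataset_gb must be positive")
    return free_gb / dataset_gb


# Figures from the original question: 16 GB free, 3.7 GB data set.
multiple = free_space_multiple(16.0, 3.7)
print(f"free space = {multiple:.2f} x data set size")  # prints "free space = 4.32 x data set size"
print(multiple > REQUIRED_MULTIPLE)  # prints True
```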

Third, how long does it take just to read through the data set once? If that generates a lot of network traffic, the time needed to work with the data set may be enormous, no matter what you do. Network traffic and disk I/O are often painful bottlenecks when working with large data sets, so avoid both where you can.
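
One way to answer the single-read question before committing to a multi-hour sort is simply to time one sequential pass. In SAS that could be a DATA _NULL_ step that reads the data set and writes nothing; the Python sketch here does the same thing for an arbitrary file so the idea is concrete. The function name and the 1 MB chunk size are illustrative choices, not anything from SAS.

```python
import time


def time_sequential_read(path: str, chunk_size: int = 1 << 20) -> tuple[int, float]:
    """Read a file front to back once; return (bytes read, seconds elapsed)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # sequential read of the next block
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    return total, time.perf_counter() - start
```

For example, `time_sequential_read("big_file.dat")` (a hypothetical path) returns the byte count and the elapsed seconds. A repeat run can be much faster if the operating system has cached the file, so the first timing is the more honest one.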

Fourth, you should probably try to redesign your process so you don't *need* to sort (or you only need to sort once). Indexing can save you lots of work when you are going to be pulling out pieces of the data and/or you need many different orderings of the data. I once had a data set and a statistical algorithm that looked as though it would need 13 consecutive sort-step-and-DATA-step pairs. Intensive examination of the underlying process ultimately gave us a different approach, requiring one DATA step and then a single indexing step (using PROC DATASETS). It took plenty of time to work out the new algorithmic structure, but it was more than worth it in the long run.
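
The payoff of indexing over repeated sorting can be sketched in a few lines of Python: build each index once in a single pass, then pull out subsets or walk the rows in key order without ever reordering the data itself. This is an in-memory illustration under the assumption that positions fit in memory; SAS indexes (created, for example, with the INDEX CREATE statement of PROC DATASETS) are stored on disk and work differently, and the helper names here are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterator, Sequence


def build_index(rows: Sequence[tuple], key: Callable[[tuple], Hashable]) -> dict:
    """One pass over the data: map each key value to the row positions holding it."""
    index: dict = defaultdict(list)
    for pos, row in enumerate(rows):
        index[key(row)].append(pos)
    return dict(index)


def subset(rows: Sequence[tuple], index: dict, value: Hashable) -> list[tuple]:
    """Pull out only the rows with one key value, without sorting anything."""
    return [rows[pos] for pos in index.get(value, [])]


def in_key_order(rows: Sequence[tuple], index: dict) -> Iterator[tuple]:
    """Visit rows in key order by sorting only the distinct key values."""
    for value in sorted(index):
        for pos in index[value]:
            yield rows[pos]


rows = [("TX", 2001, 5.0), ("CA", 1999, 3.0), ("TX", 1999, 7.0), ("CA", 2001, 1.0)]
by_state = build_index(rows, key=lambda r: r[0])  # one index per ordering you need
by_year = build_index(rows, key=lambda r: r[1])
```

Here `subset(rows, by_state, "TX")` pulls out just the two TX rows, and `in_key_order(rows, by_year)` yields the rows in year order, while `rows` itself is never re-sorted.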

HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician

