Date: Wed, 10 Oct 2001 03:22:00 GMT
Reply-To: bruce@erlichman.com
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Bruce Erlichman <berlichman@NYC.RR.COM>
Organization: B. Erlichman, Inc.
Subject: Re: Sort a 8GB data set
Content-Type: text/plain; charset=us-ascii
Rich:
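Divide the dataset into smaller pieces: a b c ... Then sort
each of these individually. Then combine them with a SET statement
followed by a BY statement naming the sort variables:
set a b c ...; by var1 var2; With the BY statement, SET interleaves
the pieces in sorted order; without it, SET simply appends them.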
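
A minimal sketch of the split, sort, and interleave approach. The
library BIG, the data set BIG.MASTER, and the sort variables KEY1 and
KEY2 are made-up names for illustration, and three pieces is an
arbitrary choice. Note that the pieces and the final output each need
disk space of their own.

```sas
/* Hypothetical names: BIG.MASTER is the large data set,      */
/* KEY1 and KEY2 are the two sort variables.                  */

/* Split into three contiguous pieces by observation number.  */
data work.a work.b work.c;
   set big.master nobs=n;      /* n = total number of observations */
   if _n_ <= n / 3 then output work.a;
   else if _n_ <= 2 * n / 3 then output work.b;
   else output work.c;
run;

/* Sort each piece separately; each sort needs scratch space  */
/* only for a piece, not for the whole data set.              */
proc sort data=work.a; by key1 key2; run;
proc sort data=work.b; by key1 key2; run;
proc sort data=work.c; by key1 key2; run;

/* Interleave: the BY statement is what makes SET merge the   */
/* sorted pieces in order instead of concatenating them.      */
data big.sorted;
   set work.a work.b work.c;
   by key1 key2;
run;
```

If space is tight, more (and smaller) pieces can be used in the same
way.
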
Bruce
On 9 Oct 01 18:36:32 GMT, Cassell.David@EPAMAIL.EPA.GOV (David L.
Cassell) wrote:
>Rich wrote:
>> I am working with an 8GB dataset under SAS 8.0 on Windows NT with an
>> NTFS file system. I need to sort the data set by 2 variables. I
>> have 22GB free space on the hard drive. During the sorting, a
>> temporary system utility file with size 18GB is created. Then the
>> process stopped because of "OUT OF RESOURCE".
>>
>> What should I do with it? Thanks a lot.
>
>First of all, don't sort it unless you really have to.
>
>Second, for a standard PROC SORT, SAS needs several times the size of the
>original dataset for its scratch space. So you can see why you ran out of
>resources. You can avoid this problem in several ways.
>
>You may find that the TAGSORT option in PROC SORT will solve your problem
>for you, with minimal coding [just add the word TAGSORT in the PROC SORT
>statement]. This will take longer, but will use less space. It's a
>tradeoff. If you'll need to access the entire dataset multiple times
>afterward, this may be your best bet. Also, the fatter your dataset
>[moving more toward a very wide record with a smaller number of
>records], the better TAGSORT will look. A very tall, thin dataset will
>not benefit from TAGSORT as much.
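
For reference, the TAGSORT suggestion above might look like the
following sketch. BIG.MASTER, BIG.SORTED, KEY1 and KEY2 are assumed
names, not taken from Rich's post.

```sas
/* Hypothetical names throughout. TAGSORT sorts only the BY   */
/* variables plus observation numbers ("tags"), then uses the */
/* sorted tags to retrieve the full records.                  */
proc sort data=big.master out=big.sorted tagsort;
   by key1 key2;
run;
```
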
>
>Or you may find that using the indexing capability of PROC DATASETS is your
>best option. This is typically a lot faster than a full sort. If you will
>need several different sort orders, you can create all of them in the same
>PROC DATASETS call. Furthermore, if you will then need to access only
>specific subsets of the data, you may get a big win here. OTOH, if you
>will want to access the full dataset many times once you are done with
>this part of the process, you will find a serious slowdown. SAS is far
>faster at sequential access than at indexed access - it is a matter of
>the I/O off the hard drive.
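
A sketch of the indexing route described above, again with assumed
names: library BIG, data set MASTER, variables KEY1 and KEY2, and a
composite index called KEY12. The value 42 assumes KEY1 is numeric.

```sas
/* Hypothetical names throughout. Build two indexes in one    */
/* PROC DATASETS call: a composite index on (KEY1 KEY2) and   */
/* a simple index on KEY1.                                    */
proc datasets library=big nolist;
   modify master;
   index create key12 = (key1 key2);
   index create key1;
quit;

/* A WHERE subset can use the index instead of a full scan.   */
data work.subset;
   set big.master;
   where key1 = 42;
run;

/* BY-group processing on the unsorted data set: with an      */
/* index on the BY variables, no PROC SORT is needed, but     */
/* records are fetched through the index, not sequentially.   */
data work.first_per_group;
   set big.master;
   by key1 key2;
   if first.key2;
run;
```
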
>
>So, part of the answer is another question: Why are you sorting this data
>set, and what will you have to do with it afterward? The answers will
>dictate your best strategy.
>
>HTH,
>David