| Date: | Mon, 3 May 1999 20:15:17 -0400 |
| Reply-To: | "Paul M. Dorfman" <sashole@NETSCAPE.NET> |
| Sender: | "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU> |
| From: | "Paul M. Dorfman" <sashole@NETSCAPE.NET> |
| Organization: | EarthLink Network, Inc. |
| Subject: | Re: PROC SORT: Disk Size limitations |
| Content-Type: | text/plain; charset=us-ascii |
jgropp1 wrote:
>
> I have a data set with 7569504 observations and 7 variables and I'm trying
> to sort it by 3 variables and I'm having trouble with the work space
> available for sorting. The data is approx. 407 MB itself and there is about
> 1.5 GB of remaining space on the hard drive. I figured even if SAS rewrote
> the file 3 times during the sort procedure then that would still leave about
> 300MB of space, however I still run out of disk space. I know that one can
> play around with the Proc Sort options to correct my problem but I am leery
> about changing some parameters without some guidance. I'm using SAS 6.12
> for Windows on a system with WIN98 350mhz/64mb ram. Any ideas?
>
> Thanks,
> Jeff
Jeff,
I would approach the problem this way...
1) Using the OBS= option, determine the maximum number of observations in your
dataset that can be sorted safely. It does not have to be precise, just get in
the ballpark. Let us say you have come up with 1,000,000.
2) Using FIRSTOBS= and OBS=, break the dataset into as many datasets as needed,
say, A, B, C, D, E, F, G.
3) Sort each one separately.
4) If your ultimate purpose is some kind of DATA step BY-processing, for
instance, accumulating something for every BY-group, simply interleave the
partial files:
DATA BY_SMTH;
SET A B C D E F G;
BY X Y Z;
IF FIRST.Y THEN DO;
..............
END;
..............
IF LAST.Y THEN DO;
..............
END;
............
RUN;
If you need to feed the would-be sorted file into a procedure, create a view
specifying BY variables, and feed that view into the proc. Specify the BY-group
in the proc, too. Something like:
DATA SORTED/VIEW=SORTED;
SET A B C D E F G;
BY X Y Z;
RUN;
PROC SMTH DATA=SORTED;
BY X Y Z;
........
RUN;
This way, you will not have to actually create a sorted dataset, but only
the sorted partial datasets. Of course, if you do need that dataset in its
entirety, then
DATA SORTED;
SET A B C D E F G;
BY X Y Z;
RUN;
will do the final sorting. Before doing that (and only if the partial datasets
have been created OK), you may want to kill the original dataset to save space,
for instance,
PROC DELETE DATA=ORIGINAL; RUN;
or
PROC SQL; DROP TABLE ORIGINAL; QUIT;
or
PROC DATASETS; DELETE ORIGINAL; RUN; QUIT;
depending on your syntax preferences.
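For steps 1 to 3, a minimal sketch of the split and the partial sorts might look like the following. It assumes the dataset is called ORIGINAL (as in the deletion examples), that the BY variables are X, Y and Z, and that 1,000,000 is the chunk size arrived at in step 1; adjust all of these to your own case. Note that OBS= gives the number of the last observation to read, not a count, so it has to move along with FIRSTOBS=.

```sas
/* Chunk 1: observations 1 through 1,000,000 */
DATA A;
  SET ORIGINAL (FIRSTOBS=1 OBS=1000000);
RUN;

/* Chunk 2: OBS= is the LAST observation number, not a count */
DATA B;
  SET ORIGINAL (FIRSTOBS=1000001 OBS=2000000);
RUN;

/* ...repeat for the remaining chunks, shifting both options by the
   chunk size; OBS=MAX can be used on the last one to read to the end */

/* Sort each partial dataset in place by the same BY variables */
PROC SORT DATA=A; BY X Y Z; RUN;
PROC SORT DATA=B; BY X Y Z; RUN;
```

If space is very tight, the split and the sort could probably be combined by putting the same dataset options on the PROC SORT input and naming the chunk in OUT=, which avoids writing each unsorted chunk first.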
Kind regards,
===============
Paul M. Dorfman
Jax, FL
===============