Date: Tue, 4 May 1999 09:03:01 -0400
Reply-To: "Kagan, Jerry" <JKagan@US.IMSHEALTH.COM>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: "Kagan, Jerry" <JKagan@US.IMSHEALTH.COM>
Subject: Re: PROC SORT: Disk Size limitations
Content-Type: text/plain; charset="iso-8859-1"
Hi all,
I missed the start of this thread and I hope this wasn't suggested already,
but you could also stripe the file before sorting: split it into pieces,
sort each piece separately, and interleave the results, using the macro below.
-Jerry
*** Create test data;
data test(drop=test);
  do test = 1 to 10000;
    x = int(ranuni(1)*100);
    y = int(ranuni(1)*10);
    z = int(ranuni(1)*10);
    output;
  end;
run;
%*** The stripe macro splits a file into pieces before a sort or summary;
%macro stripe(file, vars, stripes);
  %local s nobs min max;

  %*** Add a line index and get the number of obs;
  data &file;
    set &file end=last;
    __n = _n_;  *** Use __ prefix to avoid name clashes;
    if last then call symput('nobs', compress(put(_n_,15.)));
  run;
  %put *** Note: Splitting &nobs obs into &stripes files;

  %*** Sort the file in pieces;
  %do s = 1 %to &stripes;
    %*** Set min and max obs for this stripe;
    data _null_;
      call symput('min',
        compress(put(floor((&nobs/&stripes)*&s-(&nobs/&stripes)+1),15.)));
      call symput('max', compress(put(floor((&nobs/&stripes)*&s),15.)));
    run;
    %put *** Note: Stripe=&s min=&min max=&max;

    %*** Perform the sort on the smaller file;
    proc sort data=&file(where=(__n GE &min AND __n LE &max)) out=_temp&s;
      by &vars;
    run;
  %end;

  %*** Rebuild the sorted file by interleaving the sorted stripes;
  data &file(drop=__n);
    set %do s = 1 %to &stripes; _temp&s %end; ;
    by &vars;
  run;

  %*** Clean up temp files;
  proc datasets;
    delete _temp1-_temp&stripes;
  quit;
%mend stripe;
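To see how the min/max arithmetic carves up a file whose size is not an
even multiple of the stripe count, the same formulas can be run on their own.
This is only a sketch: nobs=10 and stripes=3 are made-up values, and the
step is not part of the macro.

```sas
*** Hypothetical values, chosen only to illustrate the boundary formulas;
data _null_;
  nobs = 10;
  stripes = 3;
  do s = 1 to stripes;
    *** Same expressions the macro passes to call symput;
    min = floor((nobs/stripes)*s - (nobs/stripes) + 1);
    max = floor((nobs/stripes)*s);
    put s= min= max=;
  end;
run;
```

With these values the stripes should cover obs 1-3, 4-6 and 7-10, so every
observation lands in exactly one stripe and the last stripe absorbs the
remainder.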
%stripe(test, %str(x y z), 10);
proc print data=test (obs=100); run;
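If you want some reassurance that the interleave really left the file in
order, one cheap check (a sketch, not part of the macro) is to read the
result back with the same BY list. A DATA step with a BY statement stops
with a "BY variables are not properly sorted" error if it meets an
observation out of sequence.

```sas
*** Sanity check: errors out if test is not in x y z order;
data _null_;
  set test;
  by x y z;
run;
```

A clean log from this step suggests the striped sort produced the same
ordering a single PROC SORT would have.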
---
Jerry Kagan
IMS HEALTH
Plymouth Meeting, PA USA
phone:(610) 834-5290
mailto:JKagan@US.IMSHEALTH.COM
> -----Original Message-----
> From: Carsten Mueller [SMTP:ca.mueller@FHTW-BERLIN.DE]
> Sent: Saturday, May 01, 1999 8:36 AM
> To: SAS-L@VM.MARIST.EDU
> Subject: Re: PROC SORT: Disk Size limitations
>
> If disk space is a problem, don't run PROC SORT once with 3 variables;
> run it 3 times with one variable each (or 2 times, with 2 variables
> and then 1 variable).
>
> e.g. a,b,c are the variables and your initial SORT-statement is
>
> PROC SORT DATA=????;
> BY A B C;
> RUN;
>
> sort the data 3 times, starting with the last variable and working up
> to the first
>
> proc sort data=????;
> by c;
>
> proc sort data=????;
> by b;
>
> proc sort data=????;
> by a;
> run;
>
> OR
>
> proc sort data=????;
> by b c;
>
> proc sort data=????;
> by a;
> run;
>
> It needs more time but less disk space :-)
>
> Jeff Gropp wrote:
> >
> > I was able to free up some more space and run the desired PROC SORT.
> > I had forgotten about the size impact of PROC SORT, where an
> > approximation for the needed disk space is:
> > (Size of original data set) * (# of BY variables) + (Size of original data set)
> > which in my case was:
> > 400 MB * 3 + 400 MB = 1600 MB = 1.6 GB
> >
> > In either case, if anyone has some help on the options for future
> > reference it would be appreciated.
> >
> > Jeff
> >
> > jgropp1 <jgropp1@msn.com> wrote in message
> > news:O6#ugktj#GA.228@cpmsnbbsa03...
> > > I have a data set with 7569504 observations and 7 variables and I'm
> > > trying to sort it by 3 variables, and I'm having trouble with the
> > > work space available for sorting. The data is approx. 407 MB itself
> > > and there is about 1.5 GB of remaining space on the hard drive. I
> > > figured even if SAS rewrote the file 3 times during the sort
> > > procedure, that would still leave about 300 MB of space; however, I
> > > still run out of disk space. I know that one can play around with
> > > the PROC SORT options to correct my problem, but I am leery about
> > > changing some parameters without some guidance. I'm using SAS 6.12
> > > for Windows on a system with Win98, 350 MHz, 64 MB RAM. Any ideas?
> > >
> > > Thanks,
> > > Jeff
> > >
> > >
>
> Carsten