Date: Wed, 22 Dec 1999 08:38:22 -0800
Reply-To: "Berryhill, Tim" <TWB2@PGE.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Berryhill, Tim" <TWB2@PGE.COM>
Subject: Re: Question about improving efficiency in database management
Content-Type: text/plain
PROC MEANS will run significantly faster for large numbers of groups if you
sort the file and use a BY statement instead of a CLASS statement. The
CLASS statement requires MEANS to build a table of all combinations of class
variables. The BY statement allows MEANS to process a single group, then
write it to the output dataset and use the same storage to process the next
group. The BY statement requires a sorted file (or a grouped file and the
NOTSORTED keyword); the CLASS statement can handle the observations in any
order.
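
As a rough sketch of the sort-then-BY approach, applied to the quoted
PROC MEANS step: the dataset names SERVICES and SORTED are my own
placeholders (the original post does not name its input file), and I have
left out the ID statement because its variable list was not given. Treat
this as an illustration, not a tuned solution.

```sas
/* SERVICES and SORTED are hypothetical dataset names. */
/* Sort once so that each doctor/patient/date group is contiguous. */
proc sort data=services out=sorted;
  by doctor patient date;
run;

/* BY processing: MEANS summarizes one group, writes it to X, */
/* then reuses the same storage for the next group.           */
proc means data=sorted noprint;
  by doctor patient date;
  var i;
  /* An ID statement could be added here, as in the original step. */
  output out=x sum=junk;
run;
```

If the file is already grouped by these variables but not in sort order,
the PROC SORT step can be skipped and NOTSORTED added to the BY statement.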
Tim Berryhill - Contract Programmer and General Wizard
TWB2@PGE.COM or http://www.aartwolf.com/twb.html
Frequently at Pacific Gas & Electric Co., San Francisco
The correlation coefficient between their views and
my postings is slightly less than 0
> ----------
> From: machellew@MY-DEJA.COM[SMTP:machellew@MY-DEJA.COM]
<SNIP>
> I have a 2 million observation 30 variable administrative dataset which
> contains patient and physician variables and a line for each service of
> utilization. This means that there may be more than one line for each
> patient/physician encounter for a given date.
>
> If I want to reduce this data so that only one observation exists per
> date (that is, I am not interested in the actual service(s) used)
>
> I know of several options:
>
> 1)
> proc means sum noprint;
> class doctor patient date;
> var i;
> id {list of variables I don't want to lose or have to re-merge in later}
> output out=x sum=junk noprint; run;
<SNIP>
> ... there has to be a better way. The proc means method ran for over 26
> hours when I finally had to halt execution. Is there a more efficient
> method using PROC SQL?