Date: Fri, 5 May 2000 21:40:18 GMT
Reply-To: sashole@mediaone.net
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Paul Dorfman <paul_dorfman@HOTMAIL.COM>
Subject: Re: transpose?
Content-Type: text/plain; format=flowed
Steve,
Great job, thanks. I tend to agree with your conclusions, taking a small
exception to the difficulty of putting a bunch of arrays together. It seems
to be an easy prey for call execute(). Below, it assembles a number of
implicit arrays which are then looped through. If there are character
variables in the dataset, the _character_ array and a small macro %dolc()
analogous to %doln() can be added.
proc sql noprint;
select max(c) into : m
from (select count(*) as c
from a group by id);
run;
%let x = call execute;
data _null_;
set a;
array nn _numeric_;
length v $8;
%macro doln (ln);
do _i_=2 to dim(nn);
call vname(nn,v); &x (&ln); end;
%mend doln;
&x ("data b;");
%doln ("array nn" || left(put(_i_-1,best.)) ||
trim(v) || "1-" || trim(v) || left("&m;"));
&x ("array nn (_n_) nn1-nn" || left(put(_i_-2,best.))||";");
&x ("array v ");
%doln (v);
&x (";");
&x ("do _i_=1 by 1 until(last.id); set a; by id;");
&x ("do over nn; nn = v; end; end; run;");
stop;
run;
Working against the original input
data a;
input id age ctgy type score ;
cards;
1 36 1 2 1
1 42 2 1 2
1 57 3 1 3
2 16 4 3 4
2 21 5 3 5
;
run;
the step assembles and submits the following code:
data b;
array nn1 AGE1-AGE3;
array nn2 CTGY1-CTGY3;
array nn3 TYPE1-TYPE3;
array nn4 SCORE1-SCORE3;
array nn (_n_) nn1-nn4;
array v AGE CTGY TYPE SCORE;
do _i_=1 by 1 until(last.id);
set a; by id;
do over nn; nn = v; end;
end;
run;
I hope its idea is clear. Of course, the same can be done your way, i.e. by
writing into a temp file and %including it. The point is, neither does take
a lot of time to concoct, so much better performance does not come at too
steep a price.
Kind regards,
=======================
Paul M. Dorfman
Jacksonville, FL
=======================
>From: Steve_Giblin@DOM.COM
>Reply-To: Steve_Giblin@DOM.COM
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Re: transpose?
>Date: Thu, 4 May 2000 12:09:16 -0400
>
>In continuance of this thread, I wish
>to add a few other notes:
>
>Max's post indicated that his set contain 37
>variables, not just the age and type.
>
>The two methods that came to mind are RETAIN
>and TRANSPOSE.
>
>Because of the 37 variables, this also could
>involve macro / do loops.
>
>I took the time to enlarge the test set,
>add more variables, and do some efficiency tests -
>comparing TRANSPOSE with RETAIN.
>
>I had initially constructed the RETAIN method that
>uses a counter, etc. and then I saw Paul Dorfman's
>very clear and elegant post.
>
>It is still worthwile to compare the efficiency
>of the TRANSPOSE with the now 2 types of RETAIN
>methods - all with the 37 variables.
>
>I have assumed that all 37 are numeric variables.
>
>First, here are the results in run time for each type.
>(Used with DATA _NULL_ and output the time to a file).
>
>
>(Results with PC SAS - 6.12 - Windows )
>
>begin transpose method 10:24:36
>end transpose method 10:28:30 234 seconds
>--------------------------------
>(mine with count)
>begin retain method 10:28:30
>end retain method 10:28:46 16 seconds
>--------------------------------
>(Paul Dorfman's)
>begin alt retain method 10:28:46
>end retain method 10:29:01 15 seconds
>
>
>
>Thus, the RETAIN method is 15 times faster,
>let alone the extra disk space usage,
>because you do not have to create 37 separate
>data sets as with TRANSPOSE.
>
>Attached is the code for 2 cases:
>( I have only attached Paul's method
> for the RETAIN case
> because of email size constraints )
>
>A separate macro was created for each method:
>---------------------------------------------
>%transpos - TRANSPOSE method
>%retains - my RETAIN method with counter,etc
>%altretn - Paul Dorfman's RETAIN method.
>
>
>
>Final note:
>
> The TRANSPOSE method took 1/10th the time
> to code than the RETAIN methods, but you
> can see where there is no need to have to
> type out all 37 variables with their
> associated renames, computations, and drops.
>
> The TRANSPOSE method was cleaner coding for
> all 37 variables - %DO i=1 %TO 37 versus
> the tricky way that stright DO's are used
> in the DATA _NULL_ step to generate %INCLUDEable
> SAS code.
>
> It depends if the code is for a "production"
> type of environment or is a one-time ad-hoc.
>
>Steve
>
>
>
>
>* This is the test case Max provided ;
>* with a few extra cases and then ;
>* a blow-up to be 9,000 * 4 = 36,000 ids;
>
>
>data temp;
>input x v1 v2;
>array v (37) v1-v37;
>do i=3 to 37; v(i)=v2; end;
>drop i;
>cards;
>.1 36 2
>.1 42 1
>.1 57 1
>.2 16 3
>.2 21 3
>.3 8 2
>.3 12 2
>.3 32 3
>.3 35 1
>.3 36 1
>.4 18 2
>.4 40 1
>;
>
>
>data temp;
> set temp;
> do ref=1 to 9000; id=ref+x; output; end;
> drop x ref;
> proc sort; by id;
> run;
>
> data _null_;
> file 'd:\check.sas';
> x=time(); format x time16.;
> put @1 'begin transpose method' @25 x ;
> run;
>
>%macro transpos;
>
> %do i=1 %to 37;
>
> proc transpose data=temp
> out=temp&i (drop=_name_)
> prefix=v&i.r;
> by id;
> var v&i;
> %end;
>
> * now let sas create the merge statements;
>
> data _null_;
> file 'd:\temp.sas';
> put 'data result;';
> put ' merge ';
> do i=1 to 37;
> ref=left(put(i,2.));
> put @5 'temp' ref;
> end;
> put ';';
> put ' by id;';
> put ' run;';
> run;
>
> %include 'd:\temp.sas';
>
>%mend transpos;
>%transpos;
>run;
>
> data _null_;
> file 'd:\check.sas' mod;
> x=time(); format x time16.;
> put @1 'end transpose method' @25 x ;
> run;
>
> data _null_;
> file 'd:\check.sas' mod;
> x=time(); format x time16.;
> put @1 'begin alt retain method' @25 x ;
> run;
>
>
>%macro altretn;
>
>* this establishes the max records per id;
>
>data _null_;
> set temp end=lastrec;
> by id;
> retain cnt maxcnt;
> if first.id=1 then cnt=.;
> cnt=sum(cnt,1);
> if last.id=1 then if cnt > maxcnt then maxcnt=cnt;
> if lastrec=1 then do;
> call symput('maxid',maxcnt);
> end;
> run;
>
>
>
> * Now you sas to create sas code to be %included;
>
>data _null_;
> file 'd:\temp.sas';
> length ref $100;
>
> put 'data resultc;';
> do i=1 to 37;
> ref = 'array v'||trim(left(put(i,2.)))||'r';
> put @5 ref '(&maxid);';
> end;
> put ' ';
>
>
> put ' do _n_=1 by 1 until (last.id); ';
> put ' set temp (rename=( ';
> do i=1 to 37;
> ref=left(put(i,2.));
> put @15 'v' ref ' = ' 'w' ref;
> end;
> put ' ));';
> put ' by id;';
> put ' ';
>
> do i=1 to 37;
> ref = 'v'||trim(left(put(i,2.)))||
> 'r(_n_) = w'||
> trim(left(put(i,2.)))||';';
> put @9 ref;
> end;
> put 'end;';
> put ' ';
>
> put 'drop w1-w37 ;';
> put 'run;';
> run;
>
>
> %include 'd:\temp.sas';
> run;
>
>%mend altretn;
>%altretn;
>
>
> data _null_;
> file 'd:\check.sas' mod;
> x=time(); format x time16.;
> put @1 'end retain method' @25 x ;
> run;
________________________________________________________________________
Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com
|