Date: Sun, 31 May 1998 17:39:04 +0100
Reply-To: John Whittington <medisci@POWERNET.COM>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: John Whittington <medisci@POWERNET.COM>
Subject: Re: Sample, Thanks and Others
Content-Type: text/plain; charset="us-ascii"
At 00:17 31/05/98 -0400, Li Quan wrote:
>Jingren's new program works great. The estimates for each model are
>stacked in the results dataset, which makes the second stage analysis much
>easier.
>John's suggestion sounds very plausible (first, generate the datasets,
>then estimate the models with a by statement). But I haven't got a chance
>to test it. It would be very interesting to compare the performance of
>these two in terms of the CPU time, etc.
Quan, as I expected, there is a dramatic difference in performance, although
with small datasets it wouldn't be enough to fuss about. 'My' method (code
below), which also stacks all the results in a single dataset (with just one
run of PROC REG), can deal with 1000 observations in about half the time that
Jingren's takes to do 60 (all times in seconds):
Method        Obs     STARTED      FINISH   ELAPSED
'My'           60    61479.03    61481.12      2.09
'My'          200    62415.68    62420.68      5.00
'My'         1000    61657.43    61678.85     21.42
Jingren's      60    61764.87    61802.93     38.06
Jingren's     200    61867.30    62022.80    155.50
Jingren's    1000    ** programme crashes after about 238 iterations
                        because of inability to create more output
                        dataset 'handles'
This also illustrates another problem with the 'macro %do loop' approach
when dealing with many iterations - if one creates a separate output dataset
for each iteration, and then combines them all at the end, one can run into
problems - as you can see, my SAS installation got upset after about 238
iterations. One can avoid that problem by using a PROC APPEND within the
macro %do loop, thereby just building up a single results dataset one
observation at a time - but that would make the %do loop considerably slower
still.
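For a rough sense of scale, the short calculation below (written in Python purely as an illustration; it is not part of either SAS program) divides each elapsed time reported above by the number of 12-observation regressions, which is n - 11 for n observations. The timing values are copied from this message; nothing new is measured, and the helper names are made up for the example.

```python
# Elapsed times (seconds) copied from the runs reported in this message.
timings = {
    ("My", 60): 2.09,
    ("My", 200): 5.00,
    ("My", 1000): 21.42,
    ("Jingren", 60): 38.06,
    ("Jingren", 200): 155.5,
}

WINDOW = 12  # consecutive observations per regression


def n_regressions(n_obs: int, window: int = WINDOW) -> int:
    """Number of overlapping windows of `window` consecutive rows."""
    return n_obs - window + 1


def per_regression(method: str, n_obs: int) -> float:
    """Average elapsed seconds per fitted window (hypothetical helper)."""
    return timings[(method, n_obs)] / n_regressions(n_obs)


for (method, n_obs), secs in sorted(timings.items()):
    print(f"{method:8s} n={n_obs:5d} regressions={n_regressions(n_obs):4d} "
          f"total={secs:7.2f}s per-regression={per_regression(method, n_obs):.4f}s")

# Same-size comparison of the two methods.
for n_obs in (60, 200):
    ratio = timings[("Jingren", n_obs)] / timings[("My", n_obs)]
    print(f"n={n_obs}: the macro loop took {ratio:.1f}x as long")
```

On these figures the macro loop spends around 0.8 s per regression, whereas the single PROC REG run falls to about 0.02 s per window at 1000 observations, presumably because its fixed start-up cost is spread over many more BY groups.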
CREATE TEST DATASET:
data test ;
do month = 1 to 1000 ;
a = ranuni (459274) ;
b = ranuni (134563) ;
output ;
end ;
run ;
MY METHOD .....
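data _null_ ; t = time() ; call symput('start',t) ; run ; /* for timing */
/* Load TEST into a temporary array; after the last observation is read, */
/* write every overlapping window of 12 consecutive rows, numbered RUN,  */
/* so that all the regressions can be done in a single PROC REG step.    */
data samples (drop = i) ;
  row = _n_ ;
  set test nobs = num end = eof ;
  array ar(10000, 3) _temporary_ ;   /* room for up to 10000 input rows */
  ar(row, 1) = month ; ar(row, 2) = a ; ar(row, 3) = b ;
  if eof then do ;
    do run = 1 to num - 11 ;         /* one window per starting row */
      do i = run to (run + 11) ;     /* the 12 rows of this window */
        month = ar(i, 1) ; a = ar(i, 2) ; b = ar(i, 3) ;
        output ;
      end ;
    end ;
  end ;
run ;
/* SAMPLES is already in RUN order, so BY RUN needs no PROC SORT; */
/* OUTEST= stacks one row of estimates per window in RESULTS.     */
proc reg data = samples noprint outest = results ;
  model month = a b ;
  by run ;
run ;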
data _null_ ; /* for timing */
started = &start ;
finish = time() ;
elapsed = finish - started ;
put started= finish= elapsed= ;
run ;
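For readers more at home outside SAS, here is a rough Python/NumPy analogue of the two steps above: build every overlapping 12-row window in one pass (like SAMPLES), then fit an ordinary least-squares model of month on a and b, with intercept, for each window (like PROC REG with BY RUN and OUTEST=). It is a sketch under simplifying assumptions: the function names are hypothetical, NumPy's random generator does not reproduce the RANUNI streams, and only the parameter estimates are kept, not the other columns (such as _RMSE_) that OUTEST= writes.

```python
import numpy as np

WINDOW = 12  # consecutive observations per regression, as in the SAS code


def stack_windows(month, a, b, window=WINDOW):
    """One pass over the data: emit every overlapping window, tagged with a
    1-based run number (a rough analogue of the SAMPLES dataset)."""
    n = len(month)
    rows = []
    for run in range(1, n - window + 2):            # runs 1 .. n - window + 1
        for i in range(run - 1, run - 1 + window):  # 0-based row indices
            rows.append((run, month[i], a[i], b[i]))
    return np.array(rows)


def fit_by_run(samples):
    """OLS of month on a and b (with intercept) within each run; returns one
    row of (run, intercept, coef_a, coef_b) per window."""
    results = []
    for run in np.unique(samples[:, 0]):
        grp = samples[samples[:, 0] == run]
        X = np.column_stack([np.ones(len(grp)), grp[:, 2], grp[:, 3]])
        coef, *_ = np.linalg.lstsq(X, grp[:, 1], rcond=None)
        results.append((run, *coef))
    return np.array(results)


# Usage with 60 observations, mirroring the small test dataset.
rng = np.random.default_rng(459274)
n = 60
month = np.arange(1, n + 1, dtype=float)
a = rng.uniform(size=n)
b = rng.uniform(size=n)
samples = stack_windows(month, a, b)
results = fit_by_run(samples)
print(samples.shape, results.shape)   # (588, 4) (49, 4)
```

As in the SAS version, the expensive setup (reading the data and building the windows) happens once, and the per-window work is just a small least-squares solve.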
JINGREN'S METHOD ....
data test ;
do month = 1 to 60 ;
a = ranuni (459274) ;
b = ranuni (134563) ;
output ;
end ;
run ;
options nosymbolgen;
%let n=60; /* this is the total number of obs */
%macro doreg;
  %let datasets=;  /* list of per-window OUTEST datasets */
  %let i=1;        /* reset here so that the macro can safely be rerun */
  %do %until(&i>&n-12+1);   /* window start i runs from 1 to n-11 */
    proc reg noprint data=test(where=(&i<=month<=&i+12-1)) outest=res&i;
      model month = a b ;
    quit;
    %let datasets= &datasets res&i;
    %let i=%eval(&i+1);
  %end;
  /* combine all the one-row OUTEST datasets */
  data results;
    set &datasets;
  run;
%mend;
data _null_ ; t = time() ; call symput('start',t) ; run ; /* for timing */
%doreg;
data _null_ ; /* for timing */
started = &start ;
finish = time() ;
elapsed = finish - started ;
put started= finish= elapsed= ;
run ;
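One contributing factor can be seen without running SAS at all. The Python sketch below (hypothetical function names, illustration only) extracts the same 12-row windows two ways: once by reading the data a single time and slicing it in memory, as the temporary array does, and once by rescanning the whole table for every window, as a WHERE= subset does when there is no index on MONTH. Both give identical windows, but the second examines n * (n - 11) rows. It only counts rows; in SAS the fixed cost of starting each PROC REG step and creating each RES dataset probably matters at least as much, and is not modelled here.

```python
WINDOW = 12


def windows_one_pass(rows, window=WINDOW):
    """Read the data once, then slice each window from memory.
    Returns (windows, rows_examined)."""
    data = list(rows)
    rows_examined = len(data)
    wins = [data[s:s + window] for s in range(len(data) - window + 1)]
    return wins, rows_examined


def windows_by_filter(rows, window=WINDOW):
    """For each window start i, rescan the whole table and keep rows whose
    month lies in [i, i + window - 1], like an unindexed WHERE= subset.
    Returns (windows, rows_examined)."""
    data = list(rows)
    n = len(data)
    wins, rows_examined = [], 0
    for i in range(1, n - window + 2):
        rows_examined += n  # every pass examines every row
        wins.append([r for r in data if i <= r[0] <= i + window - 1])
    return wins, rows_examined


# Dummy (month, a, b) rows with month = 1..60.
table = [(m, m * 0.5, m * 0.25) for m in range(1, 61)]
w1, reads1 = windows_one_pass(table)
w2, reads2 = windows_by_filter(table)
print(len(w1), reads1, reads2)   # 49 60 2940
```

For 1000 observations the filtering approach examines 989,000 rows, against 1000 for the single pass.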
Regards,
John
----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: medisci@powernet.com
Buckingham MK18 4EL, UK mediscience@compuserve.com
----------------------------------------------------------------