I am looking into the comparative efficiency of different methods for
counting the observations in a SAS data set and storing the count in a
macro variable. I've compared the following three methods:
(1) DATA step
data _null_;
/* NOBS= is filled in at compile time, so it is already set here */
call symput('n_obs',trim(left(put(nobs,7.))));
set myfile nobs=nobs;
run;
(2) PROC SQL
proc sql noprint;
/* COUNT(*) counts every row; INTO stores the result in &numobs */
select count(*) into :numobs
from myfile;
quit;
(3) the ATTRN function
%let idnum = %sysfunc(open(myfile));
/* NLOBS = logical observations (excludes those marked for deletion) */
%let obs_cnt = %sysfunc(attrn(&idnum,nlobs));
%let cl = %sysfunc(close(&idnum));
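One note on (1) as posted: it has no STOP statement, so the SET statement reads every observation in myfile (and CALL SYMPUT runs on every iteration), even though NOBS= already holds the count at compile time. That full pass plausibly accounts for much of its CPU time. A common variant that never reads any data is sketched below; it assumes myfile is an ordinary SAS data set (not a view) and uses CALL SYMPUTX, which requires SAS 9 or later (on earlier releases the CALL SYMPUT/TRIM/LEFT/PUT form from (1) does the same job).

```sas
/* Sketch: IF 0 is never true, so SET never executes at run time,   */
/* but NOBS= is still assigned at compile time from the header.     */
/* STOP ends the step before the implicit loop tries to read data.  */
data _null_;
  if 0 then set myfile nobs=nobs;
  call symputx('n_obs', nobs);  /* SYMPUTX trims the value itself */
  stop;
run;

%put NOTE: myfile has &n_obs observations.;
```

Because the count comes from the data set descriptor rather than from reading rows, this step's run time should not grow with the number of observations.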
The SAS data set 'myfile' has close to 3.5 million observations and 31
variables. Averaged over 10 runs, CPU time for the DATA step was 52
seconds; for PROC SQL, 18 seconds. That is a significant difference (if
it is reliable), especially since the macro I'm writing will need to do
this many times.
(By the way, including or omitting the PUT function in (1) makes no
difference to CPU time.)
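Since the macro will need the count repeatedly, method (3) can be wrapped in a function-style macro that expands to just the number, so it can be used inline. A minimal sketch; the macro name %nobs and the -1 value returned when OPEN fails are hypothetical choices, not anything SAS defines:

```sas
/* Hypothetical helper: %nobs(ds) expands to the NLOBS count of ds. */
/* Only macro statements run inside it, so no DATA or PROC step is  */
/* created; the single line "&n" is the text it returns.            */
%macro nobs(ds);
  %local dsid n rc;
  %let dsid = %sysfunc(open(&ds));
  %if &dsid > 0 %then %do;
    %let n  = %sysfunc(attrn(&dsid,nlobs));
    %let rc = %sysfunc(close(&dsid));
  %end;
  %else %do;
    %* OPEN failed (data set missing or locked): hypothetical sentinel;
    %let n = -1;
  %end;
&n
%mend nobs;

%let obs_cnt = %nobs(myfile);
%put NOTE: myfile has &obs_cnt observations.;
```

Declaring the variables %LOCAL keeps the helper from overwriting macro variables of the same name in the calling macro.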
Two questions:
(i) Have others found this difference between the DATA step and PROC SQL?
(I ran this on a UNIX system.)
(ii) The log shows no CPU time for the open/attrn/close sequence in (3).
This method APPEARS to be faster than PROC SQL, but that conclusion rests
on the questionable approach of taking total CPU time and subtracting all
the other logged CPU times. Any ideas on how to measure the CPU time used
by the ATTRN approach?
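One possible way to get a logged CPU figure for (3) is to call the DATA step versions of OPEN, ATTRN and CLOSE in a loop, so that the step's NOTE in the log reports CPU time for them. This is a sketch under two assumptions: that these functions inside a DATA step cost roughly the same as the %SYSFUNC calls, and that subtracting an empty-loop baseline removes most of the step's own overhead. The repetition count &reps is arbitrary.

```sas
options fullstimer;  /* more detailed resource statistics in the log */

%let reps = 1000;    /* arbitrary number of repetitions */

/* Timed step: OPEN/ATTRN/CLOSE repeated &reps times */
data _null_;
  do i = 1 to &reps;
    dsid = open('myfile');
    n    = attrn(dsid, 'NLOBS');
    rc   = close(dsid);
  end;
  put n=;            /* confirm the count in the log */
run;

/* Baseline: the same loop with no calls, to estimate step overhead */
data _null_;
  do i = 1 to &reps;
  end;
run;
```

The difference between the two steps' logged CPU times, divided by &reps, gives a rough per-call cost to compare with the PROC SQL figure.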
Thanks in advance,
Paul Gorrell
Gorrell_P@bls.gov