Date: Tue, 28 Jun 2005 14:11:06 -0700
Reply-To: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject: Re: Optimization question
Hi Michael -
Haven't read through the rest of the replies, but this may be useful.
Paul D. posted this about a year ago - rather than an IN comparison try the
Index function...it gave a 93% decrease in time in his example.
<Paul D. - "is it one of many strings">
data _null_ ;
retain v1 '123' v2 '567' v3 '890' ;
do j = 1 to 1e6 ;
if v1 in ('123' '234' '345' '456' '567' '678' '789' '890') then ;
if v2 in ('123' '234' '345' '456' '567' '678' '789' '890') then ;
if v3 in ('123' '234' '345' '456' '567' '678' '789' '890') then ;
end ;
run ;
/*NOTE: DATA statement used (Total process time):
real time 2.25 seconds
cpu time 2.21 seconds*/
data _null_ ;
retain v1 '123' v2 '567' v3 '890' ;
do j = 1 to 1e6 ;
if index (v1, '123 234 345 456 567 678 789 890') then ;
if index (v2, '123 234 345 456 567 678 789 890') then ;
if index (v3, '123 234 345 456 567 678 789 890') then ;
end ;
run ;
/*NOTE: DATA statement used (Total process time):
real time 0.15 seconds
cpu time 0.15 seconds*/
</Paul D. - "is it one of many strings">
http://xrl.us/gkce
regards -
Paul Choate
DDS Data Extraction
(916) 654-2160
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Michael
Murff
Sent: Monday, June 27, 2005 11:25 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Optimization question
Hi SAS-L,
I'm accessing a very large dataset (6 gigs) with the following code:
data subset;
set huge(keep=date id var1-var5);
where "01Jan1970"d <= date <= "31DEC2003"d;
by id date;
year=year(date);
if permno in(&list);
if first.date then
do;
var1_l = lag(var1);
var2_l = lag(var2);
end;
if first.id then
do;
var1_l = .;
var2_l = .;
end;
run;
&list contains a list of 2000 ids (sorted) that I care about. Each id will
have a daily entry between the given dates. Huge dataset is already sorted
by ID and DATE. I need a more efficient way to run this datastep as it takes
several hours on our server. I have access to 8.2 and 9.1.3 SAS versions in
Unix environments.
I tried putting &list in a compound where statement but I reach the 8.2
where byte limit discussed recently on the -l (haven't tried this on 9.1.3
yet). Does the by statement slow this down? And what about the subsetting if
statement. The final dataset "subset" should a few hundred MBs. I can write
a gig with our SCSI drives in about 15 minutes? so it seems like this little
dstep could be written to go faster.
Thanks,
Michael Murff
Provo, UT