LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (June 2005, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 28 Jun 2005 14:11:06 -0700
Reply-To:     "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject:      Re: Optimization question
Comments: To: Michael Murff <mjm33@MSM1.BYU.EDU>

Hi Michael -

Haven't read through the rest of the replies, but this may be useful.

Paul D. posted this about a year ago - rather than an IN comparison try the Index function...it gave a 93% decrease in time in his example.

<Paul D. - "is it one of many strings"> data _null_ ; retain v1 '123' v2 '567' v3 '890' ; do j = 1 to 1e6 ; if v1 in ('123' '234' '345' '456' '567' '678' '789' '890') then ; if v2 in ('123' '234' '345' '456' '567' '678' '789' '890') then ; if v3 in ('123' '234' '345' '456' '567' '678' '789' '890') then ; end ; run ; /*NOTE: DATA statement used (Total process time): real time 2.25 seconds cpu time 2.21 seconds*/

data _null_ ; retain v1 '123' v2 '567' v3 '890' ; do j = 1 to 1e6 ; if index (v1, '123 234 345 456 567 678 789 890') then ; if index (v2, '123 234 345 456 567 678 789 890') then ; if index (v3, '123 234 345 456 567 678 789 890') then ; end ; run ; /*NOTE: DATA statement used (Total process time): real time 0.15 seconds cpu time 0.15 seconds*/ </Paul D. - "is it one of many strings">

http://xrl.us/gkce

regards -

Paul Choate DDS Data Extraction (916) 654-2160

-----Original Message----- From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Michael Murff Sent: Monday, June 27, 2005 11:25 AM To: SAS-L@LISTSERV.UGA.EDU Subject: Optimization question

Hi SAS-L,

I'm accessing a very large dataset (6 gigs) with the following code:

data subset;

set huge(keep=date id var1-var5);

where "01Jan1970"d <= date <= "31DEC2003"d;

by id date;

year=year(date);

if permno in(&list);

if first.date then

do;

var1_l = lag(var1);

var2_l = lag(var2);

end;

if first.id then

do;

var1_l = .;

var2_l = .;

end;

run;

&list contains a list of 2000 ids (sorted) that I care about. Each id will have a daily entry between the given dates. Huge dataset is already sorted by ID and DATE. I need a more efficient way to run this datastep as it takes several hours on our server. I have access to 8.2 and 9.1.3 SAS versions in Unix environments.

I tried putting &list in a compound where statement but I reach the 8.2 where byte limit discussed recently on the -l (haven't tried this on 9.1.3 yet). Does the by statement slow this down? And what about the subsetting if statement. The final dataset "subset" should a few hundred MBs. I can write a gig with our SCSI drives in about 15 minutes? so it seems like this little dstep could be written to go faster.

Thanks,

Michael Murff

Provo, UT


Back to: Top of message | Previous page | Main SAS-L page