Date: Mon, 27 Jun 2005 16:53:55 -0400
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject: Re: Optimization question
1. I don't see anything to initialize PERMNO. Did you mean
if id in(&list);
? If not, the interaction of the subsetting IF and the "FIRST." tests may
be problematic.
2. "Each id will have a [note singular] daily entry" suggests that
FIRST.DATE will be true for every observation, in which case you don't have
to test it and you don't even need DATE in the BY statement.
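A minimal sketch of the simplified step that points 1 and 2 suggest. It assumes that ID is the variable meant in the subsetting IF and that each ID has exactly one row per date. The tiny HUGE data set and the value of &LIST are hypothetical stand-ins so the step can run on its own. LAG is called on every kept row instead of inside the FIRST.DATE block, because a LAG that is executed conditionally only updates its queue when the condition is true. FIRST.ID still resets the lags correctly here because the IN test keeps or drops whole ID groups.

```sas
/* Hypothetical stand-in for HUGE: sorted by ID and DATE, one row per ID per date. */
data huge;
  input id date :date9. var1 var2 var3 var4 var5;
  format date date9.;
  datalines;
1 02JAN1970 10 20 0 0 0
1 05JAN1970 11 21 0 0 0
2 02JAN1970 30 40 0 0 0
2 05JAN1970 31 41 0 0 0
3 02JAN1970 50 60 0 0 0
3 05JAN1970 51 61 0 0 0
;
run;

%let list = 1,3;   /* hypothetical: the IDs of interest */

data subset;
  set huge(keep=date id var1-var5);
  where "01JAN1970"d <= date <= "31DEC2003"d;
  by id;                  /* one row per date, so DATE is not needed in BY */
  if id in (&list);       /* keeps or drops whole ID groups */
  year = year(date);
  var1_l = lag(var1);     /* LAG runs on every kept row, never conditionally */
  var2_l = lag(var2);
  if first.id then do;    /* first row of an ID has no prior value */
    var1_l = .;
    var2_l = .;
  end;
run;
```

With this stand-in data, SUBSET holds four rows (IDs 1 and 3), and VAR1_L is missing on the first row of each ID.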
3. About how many different ID values are there in the data set?
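If the number of IDs is large, one possible alternative under SAS 9.1.3 (the hash object is not available in 8.2) is to keep the wanted IDs in a data set and test membership with a DATA step hash object, so that 2000 values never have to be expanded into the program text. This is a sketch under assumptions: the lookup data set IDLIST and the stand-in HUGE are hypothetical, and the approach does not avoid reading the full 6 GB file, since every row in the date range still passes through the step. Whether it runs faster than a sorted IN list is not established here.

```sas
/* Hypothetical stand-in for HUGE (same shape as described in the post). */
data huge;
  input id date :date9. var1 var2 var3 var4 var5;
  format date date9.;
  datalines;
1 02JAN1970 10 20 0 0 0
1 05JAN1970 11 21 0 0 0
2 02JAN1970 30 40 0 0 0
2 05JAN1970 31 41 0 0 0
3 02JAN1970 50 60 0 0 0
3 05JAN1970 51 61 0 0 0
;
run;

/* Hypothetical lookup table: one row per wanted ID, replacing &LIST. */
data idlist;
  input id;
  datalines;
1
3
;
run;

data subset;
  if _n_ = 1 then do;
    declare hash want(dataset: "work.idlist");  /* loaded once, SAS 9 only */
    want.defineKey("id");
    want.defineDone();
  end;
  set huge(keep=date id var1-var5);
  where "01JAN1970"d <= date <= "31DEC2003"d;
  by id;
  if want.check() = 0;    /* keep the row only if its ID is in IDLIST */
  year = year(date);
  var1_l = lag(var1);
  var2_l = lag(var2);
  if first.id then do;
    var1_l = .;
    var2_l = .;
  end;
run;
```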
On Mon, 27 Jun 2005 12:24:34 -0600, Michael Murff <mjm33@MSM1.BYU.EDU> wrote:
>Hi SAS-L,
>
>I'm accessing a very large dataset (6 gigs) with the following code:
>
>data subset;
>  set huge(keep=date id var1-var5);
>  where "01Jan1970"d <= date <= "31DEC2003"d;
>  by id date;
>  year=year(date);
>  if permno in(&list);
>  if first.date then do;
>    var1_l = lag(var1);
>    var2_l = lag(var2);
>  end;
>  if first.id then do;
>    var1_l = .;
>    var2_l = .;
>  end;
>run;
>
>&list contains a list of 2000 ids (sorted) that I care about. Each id will
>have a daily entry between the given dates. Huge dataset is already sorted
>by ID and DATE. I need a more efficient way to run this data step, as it
>takes several hours on our server. I have access to SAS 8.2 and 9.1.3 in
>Unix environments.
>
>I tried putting &list in a compound WHERE statement, but I hit the 8.2
>WHERE byte limit discussed recently on SAS-L (haven't tried this on 9.1.3
>yet). Does the BY statement slow this down? And what about the subsetting
>IF statement? The final data set "subset" should be a few hundred MB. I can
>write a gig with our SCSI drives in about 15 minutes, so it seems like this
>little data step could be written to go faster.
>
>Thanks,
>
>Michael Murff
>Provo, UT