| Date: | Mon, 27 Jun 2005 21:57:24 -0400 |
| Reply-To: | diskin@alum.rpi.edu |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Dennis Diskin <ddiskin@GMAIL.COM> |
| Subject: | Re: Optimization question |
|
| In-Reply-To: | <200506272247.j5RMl1xg024859@listserv.cc.uga.edu> |
| Content-Type: | text/plain; charset=ISO-8859-1 |
Mike,
I retract my suggestion of a format because I did not know that ID and
PERMNO were the same. In that case, the suggestion you have already received
of putting the selection list in a data set and merging on it makes more
sense.
Regards,
Dennis Diskin
On 6/27/05, Michael Murff <mjm33@msm1.byu.edu> wrote:
>
> Several persons of great wisdom have suggested that a format might be
> profitably used for table lookup in huge. Would someone be so kind as to
> demonstrate this technique for this situation? I think I could create a
> custom format on my list of 2k ids, but it's very unclear to me how this
> can
> be used when the format in Huge is different. Wouldn't the format have to
> be
> applied after the preprocessing where statement in the dstep?
>
> Thanks all,
>
> MM
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> Dennis
> Diskin
> Sent: Monday, June 27, 2005 12:52 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: Optimization question
>
> Roy,
> I would personnaly prefer to put a long list of selections into a format.
> Formats are pretty well optimized and can be used in where statements.
> The BY statement and associated first.s will cetainly add some processing
> time and at least in the code you've shown, accomplishes nothing since
> var1_i and var2_i are not kept in the dataset.
> HTH,
> Dennis Diskin
>
>
> On 6/27/05, Pardee, Roy <pardee.r@ghc.org> wrote:
> >
> > I eagerly await the responses to this question--I've got a very similar
> > set of sql queries I'd like to optimize.
> >
> > In the meantime--is your dset indexed & if not, is creating one
> > feasible? If so, what do you have indexed?
> >
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> > Michael Murff
> > Sent: Monday, June 27, 2005 11:25 AM
> > To: SAS-L@LISTSERV.UGA.EDU
> > Subject: Optimization question
> >
> >
> > Hi SAS-L,
> >
> >
> >
> > I'm accessing a very large dataset (6 gigs) with the following code:
> >
> >
> >
> > data subset;
> >
> > set huge(keep=date id var1-var5);
> >
> > where "01Jan1970"d <= date <= "31DEC2003"d;
> >
> > by id date;
> >
> > year=year(date);
> >
> > if permno in(&list);
> >
> > if first.date then
> >
> > do;
> >
> > var1_l = lag(var1);
> >
> > var2_l = lag(var2);
> >
> > end;
> >
> > if first.id <http://first.id> <http://first.id> then
> >
> > do;
> >
> > var1_l = .;
> >
> > var2_l = .;
> >
> > end;
> >
> > run;
> >
> >
> >
> > &list contains a list of 2000 ids (sorted) that I care about. Each id
> > will have a daily entry between the given dates. Huge dataset is already
> > sorted by ID and DATE. I need a more efficient way to run this datastep
> > as it takes several hours on our server. I have access to 8.2 and 9.1.3
> > SAS versions in Unix environments.
> >
> >
> >
> > I tried putting &list in a compound where statement but I reach the 8.2
> > where byte limit discussed recently on the -l (haven't tried this on
> > 9.1.3 yet). Does the by statement slow this down? And what about the
> > subsetting if statement. The final dataset "subset" should a few hundred
> > MBs. I can write a gig with our SCSI drives in about 15 minutes? so it
> > seems like this little dstep could be written to go faster.
> >
> >
> >
> > Thanks,
> >
> >
> >
> > Michael Murff
> >
> > Provo, UT
> >
>
|