LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (June 2004, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Mon, 7 Jun 2004 20:47:11 +0200
Reply-To:     Christian Lensbjerg <lensbjerg@HOTMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Christian Lensbjerg <lensbjerg@HOTMAIL.COM>
Organization: Cybercity
Subject:      Re: WHERE: how and why?


Without posting the proper documentation, every experiment I've conducted, have shown that the best (or at least as good as the alternatives) performing way to subset, when you a) use the data step & b) don't use indexes (I mostly subset in queries on day-to-day updated data, hence the cost of indexing exceeds the gain)

is this:

data output; set db.input(keep=relevant variables var1-var4); where var1='value'; if var2='value' then if var3='value' then if var4='value' then output; *and so on; run;

Needless to say, the criterion should be sorted in descending order, by observation-subsetability.

If You're going to sort the data afterwards, and don't intend to mess further with the data during the datastep - I often find it worth doing it all in one proc sort, with one long composed where-statement.

Best regards Christian Lensbjerg

"Roger Lustig" <> skrev i en meddelelse news:BQowc.6559$ > Greetings to all and sundry (and Al and Sundhri)! > > I'm trying to subset data as efficiently as possible. Any comments, > experience, etc. will be welcome. > > Suppose I have a complicated subsetting expression for a DATA step, say: > > WHERE type in (3,4,5) > AND > ( > bankname in:('CHASE','PROVID','RIGGS') > or > <a whole bunch of other criteria> > ); > > Furthermore, suppose that the first criterion (the type code) will get > rid of, say, 80% of the records, and some proportion of the remaining > 20% will be filtered out by the other criteria. Those other criteria > are likely to require a good deal of processing time per observation. > Is there a way to make further evaluation contingent on passing previous > filters? > > Some possibilities: > --This already happens, because WHERE-expression evaluation includes > short-circuiting. > > --Multiple WHERE clauses are applied in order, not all at once (I doubt > this): > > WHERE type in (3,4,5); > WHERE SAME and <all the other stuff>; > > --A view: > > DATA mytype/VIEW=mytype; > set master.enormous; > WHERE type in (3,4,5); > run; > > DATA mysubset; > set mytype; > WHERE <all the other stuff>; > > I've used the view approach, and certainly haven't been slowed down by > it, but other resource constraints prohibit my re-running those jobs for > benchmarking. > > [On a related topic: SAS documentation, both V8 (CD) and V9.1 (online), > is curiously mum about WHERE ALSO, which is a synonym for WHERE SAME > AND. It shows up in a few places, such as PROC REPORT and VIEWTABLE, > but it's valid in DATA and PROC steps, and can come in very handy if > one's a) writing a macro and b) lazy. WHERE SAME AND is touted as being > useful for PROCs with internal RUN boundaries, such as REG and GPLOT.] > > Any ideas? > > Roger > >

Back to: Top of message | Previous page | Main SAS-L page