LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (June 2004, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Sat, 5 Jun 2004 19:06:09 GMT
Reply-To:   julierog@ix.netcom.com
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Roger Lustig <trovato@VERIZON.NET>
Subject:   WHERE: how and why?
Content-Type:   text/plain; charset=ISO-8859-1; format=flowed

Greetings to all and sundry (and Al and Sundhri)!

I'm trying to subset data as efficiently as possible. Any comments, experience, etc. will be welcome.

Suppose I have a complicated subsetting expression for a DATA step, say:

WHERE type in (3,4,5) AND ( bankname in:('CHASE','PROVID','RIGGS') or <a whole bunch of other criteria> );

Furthermore, suppose that the first criterion (the type code) will get rid of, say, 80% of the records, and some proportion of the remaining 20% will be filtered out by the other criteria. Those other criteria are likely to require a good deal of processing time per observation. Is there a way to make further evaluation contingent on passing previous filters?

Some possibilities: --This already happens, because WHERE-expression evaluation includes short-circuiting.

--Multiple WHERE clauses are applied in order, not all at once (I doubt this):

WHERE type in (3,4,5); WHERE SAME and <all the other stuff>;

--A view:

DATA mytype/VIEW=mytype; set master.enormous; WHERE type in (3,4,5); run;

DATA mysubset; set mytype; WHERE <all the other stuff>;

I've used the view approach, and certainly haven't been slowed down by it, but other resource constraints prohibit my re-running those jobs for benchmarking.

[On a related topic: SAS documentation, both V8 (CD) and V9.1 (online), is curiously mum about WHERE ALSO, which is a synonym for WHERE SAME AND. It shows up in a few places, such as PROC REPORT and VIEWTABLE, but it's valid in DATA and PROC steps, and can come in very handy if one's a) writing a macro and b) lazy. WHERE SAME AND is touted as being useful for PROCs with internal RUN boundaries, such as REG and GPLOT.]

Any ideas?

Roger


Back to: Top of message | Previous page | Main SAS-L page