Date: Wed, 2 Apr 1997 21:42:33 +0000
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Howard Schreier <hschreier@IGC.APC.ORG>
Subject: Re: Define a format vs Merge
Douglas Dame <dougdame@HPE.UFL.EDU> wrote:
> I dynamically write formats for this kind of look-up fairly often,
> using a data _null_ statement to write out a proc format to an
> external file, which I then immediately %include to load the format
> into memory. This works well, and is a convenient hands-off
> "data-driven" way of handling these problems.
I used to do much the same thing. I had a macro, adapted from one
presented at our local SUG. But now the CNTLIN option on PROC FORMAT
offers a more straightforward way of accomplishing this.
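
A minimal sketch of the CNTLIN approach, for illustration only. The data
set names (LOOKUP, CNTL), the variables CODE and DESCR, and the format
name $STATE are made up for the example. The control data set variables
FMTNAME, TYPE, START, and LABEL are the ones PROC FORMAT reads.

```sas
/* Hypothetical lookup table; in practice this would already exist. */
data lookup;
   input code $ 1-2 descr $ 4-20;
   datalines;
FL Florida
GA Georgia
;
run;

/* Reshape the lookup table into the control data set layout:      */
/* START = value to match, LABEL = formatted result,               */
/* FMTNAME = name of the format, TYPE 'C' = character format.      */
data cntl;
   set lookup (rename=(code=start descr=label));
   retain fmtname '$state' type 'C';
   keep fmtname type start label;
run;

/* Build the format directly from the data set; no generated code  */
/* and no %include are needed.                                      */
proc format cntlin=cntl;
run;

/* Use the format as a table lookup. */
data _null_;
   length code $ 2;
   code = 'GA';
   descr = put(code, $state.);
   put code= descr=;
run;
```

A numeric lookup would use TYPE 'N' and a format name without the dollar
sign. Ranges can be expressed by adding an END variable.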
> However, for reasons now lost in the fogs of antiquity, I use
> 10,000 obs (expected to be in the look-up table) as the cut-off to
> decide whether I'm going to dynamically spin out the format or
> write code that will do a merge. This 10,000 number threshold
> probably needs to be revisited ... my PC has more memory than my
> mainframe did xx years ago .... things change, sometimes even for
> the better. <g> So maybe the practical limit on the number of
> values in a format is larger now.
I recall hearing the same rule of thumb (though it was seldom an
issue in my work). The explanation I remember is that PROC FORMAT,
in checking ranges in a VALUE statement for overlap, used a brute
force approach (second range was checked against first, #3 against #1
and #2, #4 against #1 and #2 and #3, etc.). This makes the number of
checks for overlap a quadratic function of the number of ranges; the
algebra is left as an exercise for the reader :-). The early developers
probably assumed VALUE statements would be written only for short
recodes like 1='Male' 2='Female', and did not anticipate the
creativity of users in the absence of an explicit table lookup
capability.
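
For anyone who wants the exercise worked: assuming the check really is
pairwise as described (an assumption based on the recollection above,
not on anything documented), range k is compared against the k-1 ranges
before it, so for n ranges the total number of comparisons C(n) is

```latex
C(n) = \sum_{k=2}^{n} (k-1) = 1 + 2 + \cdots + (n-1) = \frac{n(n-1)}{2}
% Example at the 10,000-range rule of thumb:
% C(10000) = \frac{10000 \cdot 9999}{2} = 49{,}995{,}000
```

Doubling the number of ranges roughly quadruples the work, which would
be consistent with a rule of thumb that caps the size of the table.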
::: (signed) Howard Schreier, HSchreier@igc.org :::