Date: Wed, 29 Oct 1997 07:23:08 -0800
Reply-To: TERJESON Mark <TERJEMW@DSHS.WA.GOV>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: TERJESON Mark <TERJEMW@DSHS.WA.GOV>
Subject: Re: Any Good Book on Common Mistakes to avoid in SAS Code
Content-Type: text/plain; charset="iso-8859-1"
Kamal R Desai wrote:
Keith Brown added a few:
Mark Terjeson added a few:
Mike Davenport added a few:
Wren Nessle Buck added a few:
> On Tue, 28 Oct 1997 21:01:49 GMT, krd@world.std.com (Kamal R Desai)
> wrote:
>
> >I am looking for a book/publication which outlines common
> >mistakes / errors in SAS code (so that the syntax is correct but the
> >output is not what was desired).
> >---snip
>
> I have never seen such a book, but it sounds like a GREAT idea!
> Everyone most likely has a hard-earned personal list of "gotchas" to
> share. Why don't we start a list right here in this thread, then we
> can all save it for reference after it becomes enormous. I'll
> contribute a few of my own personal favorites:
>
> ==========================================
>
> READING PAST THE END OF AN INPUT RECORD:
> This usually bites me when I am reading files of variable length
> records and I get some short ones. If an INPUT statement implicitly
> or explicitly moves the column pointer beyond the current input record
> length, the default behaviour is to spill into the next input record.
> Unless you are parsing, this is nearly always the wrong thing to do.
>
> SOLUTION: Always code the MISSOVER option on the INFILE statement.
> Then any variables that you try to read from the nonexistent part of a
> record are set to missing, which is exactly what they are. (IMHO this
> should have been the default).
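
A minimal sketch of the difference, using list input and in-stream data so
it can be run as-is (the dataset and variable names are made up for
illustration):

```sas
data short_records;
   infile datalines missover;   /* remove MISSOVER to get the default FLOWOVER */
   input a b c;
   datalines;
1 2
3 4 5
;
run;

proc print data=short_records;
run;
```

With MISSOVER I'd expect two observations, the first with c missing. Take
the option off and the default FLOWOVER should pull c from the second line,
leaving a single observation and a "went to a new line" note in the log.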
>
> =======================================================
>
> OVERLAID VARIABLE VALUES IN A MERGE:
> Ideally the only variable names in common among a list of MERGEd
> datasets should be the ones on the BY statement. When non-key
> variables of the same name exist in multiple datasets, the variable's
> value in the last matched dataset wins and the others are overlaid.
> That is how it is supposed to work, but it is often not the desired
> result. Even missing values can overlay nonmissing values (unless you
> use UPDATE rather than MERGE). This mistake can waste a lot of
> debugging time.
>
> SOLUTION: Keep tight control of which variables you want to take from
> which datasets. Use KEEP lists, either in the creating data step or
> as a dataset option on the MERGE statement.
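
Something like the following shows the overlay and one way to avoid it. It
is only a sketch: datasets A and B and the variables STATUS and AMOUNT are
hypothetical, and both datasets are already in ID order.

```sas
data a;
   input id status $;
   datalines;
1 open
2 closed
;
run;

data b;
   input id status $ amount;
   datalines;
1 . 100
2 open 250
;
run;

/* Risky: STATUS from B overlays STATUS from A, even where B is missing */
data overlaid;
   merge a b;
   by id;
run;

/* Safer: take only the key and the variables you actually want from B */
data controlled;
   merge a b(keep=id amount);
   by id;
run;
```

If you need both copies of STATUS, a RENAME= dataset option on B is the
other usual way out.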
>
> ==========================================================
>
> EQUALITY COMPARISONS WORK INCONSISTENTLY:
> Comparing fractional values for equality is hazardous, and not just
> because of roundoff errors. For example, 0.1 cannot be exactly
> expressed as a binary floating point value because in binary it is a
> repeating fraction. Rounding to a certain number of decimal places
> (other than zero) won't help because the result is still a binary
> floating point number on every platform that I have worked on.
>
> SOLUTION: Use a "fuzz factor" if you must test equality of
> nonintegers. Rather than "IF X=Y" use "IF ABS(X-Y) < 1E-8". This
> lets you control "how equal is equal?" within the limits of precision
> of your particular platform. It's ugly, but it's just a fact of life
> when you do floating point arithmetic.
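
A quick way to see this for yourself is a DATA _NULL_ step like the one
below. The values and the 1E-8 tolerance are just for illustration; pick a
tolerance that suits the scale of your own data.

```sas
data _null_;
   x = 0.1 + 0.2;
   y = 0.3;

   /* Exact comparison: at the mercy of the binary representation */
   if x = y then put 'exact test: equal';
   else put 'exact test: NOT equal';

   /* Fuzzed comparison: equal within a tolerance you choose */
   if abs(x - y) < 1e-8 then put 'fuzz test: equal';
   else put 'fuzz test: NOT equal';
run;
```

On IEEE platforms I'd expect the exact test to report "NOT equal" and the
fuzz test to report "equal"; other floating point representations may
behave differently on the exact test.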
>
> ======================================================
>
> I'll add a couple more easy ones to the list you folks started:
> =======================================================
> MERGE NOT WORKING CORRECTLY, BUT CODE/LOGIC LOOKS OKAY:
>
> SOLUTION: Don't forget you need a BY statement with the MERGE.
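
For anyone who hasn't been bitten yet, here is a small sketch with made-up
datasets LEFT and RIGHT. Without BY, the observations are simply paired by
position.

```sas
data left;
   input id x;
   datalines;
1 10
2 20
3 30
;
run;

data right;
   input id y;
   datalines;
2 200
3 300
;
run;

/* No BY: first row pairs with first row, second with second, and so on */
data wrong;
   merge left right;
run;

/* With BY: rows are matched on ID (both inputs must be sorted by ID) */
data matched;
   merge left right;
   by id;
run;
```

In WRONG, x=10 (which belongs to ID 1) lands next to y=200 (which belongs
to ID 2), and the ID from RIGHT overlays the one from LEFT. No error, no
warning.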
> =======================================================
> ALL KINDS OF STRANGE SYMPTOMS:
>
> SOLUTION: Check for a missing semicolon.
>
> SOLUTION: Check for misplaced comment symbols.
> ======================================================
> THE DATASET YOU ARE READING GETS BLOWN AWAY:
> (I have never done this but was warned the first day I was taught SAS)
>
> This happens to those who place the SET statement right after the DATA
> statement and forget the semicolon on the DATA statement line. You
> end up with at least three output datasets: one with your intended
> output name, one named "set", and your input dataset, now written
> over. The statement is read as three dataset names listed after DATA
> for output.
>
> SOLUTION: Check the semicolon on every DATA statement.
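
To make the typo concrete, here it is on scratch data in WORK. The names
DETAIL and SUMMARY are hypothetical; don't try this on anything you care
about.

```sas
/* Scratch input */
data work.detail;
   input id amount;
   datalines;
1 10
2 20
;
run;

/* Intended */
data work.summary;
   set work.detail;
run;

/* Typo: no semicolon after the output name, so this is read as     */
/*    data work.summary set work.detail;                            */
/* which names three OUTPUT datasets and reads no input at all      */
data work.summary
   set work.detail;
run;
```

Depending on your release, there may be a DATASTMTCHK system option that
rejects SET as a one-level dataset name and turns the typo into an error
instead. Treat that as something to check for your installation rather
than rely on.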
> =======================================================
Here's a couple more
=========================================================
Beware embedded comment symbols
/* comment
.
.
data lines
.
.
/* comment */
more lines
*/
This crashes every time, because /* */ comments do not nest: the first */
ends the comment.
======================================================
Be careful with the RETAIN statement: you can often get results that look
correct and are not, especially if you don't sort the data correctly.
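
As an illustration, a per-ID running total needs both the sort and a reset
at the start of each BY group. The dataset VISITS and its variables are
invented for this sketch.

```sas
data visits;
   input id amount;
   datalines;
2 5
1 10
2 7
1 3
;
run;

/* BY-group processing needs the data in ID order */
proc sort data=visits;
   by id;
run;

data totals;
   set visits;
   by id;
   retain total;
   if first.id then total = 0;   /* reset the carried value for each new ID */
   total = sum(total, amount);
   if last.id;                   /* keep one row per ID */
run;
```

This should give a total of 13 for ID 1 and 12 for ID 2. Drop the
FIRST.ID reset and ID 2 comes out as 25, because the retained value
carries over from ID 1. It looks plausible and is wrong.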
======================================================
I've run into trouble with "unclosed" if then else if ... statements
nested within an if-then-else structure.
i.e.
if x then do;
if y then ...;
else if z then ...;
end;
else if q then do;
etc;
It looks like the second else if is connected to the first if-then-do,
and as long as the do/end pair is there, it is. Leave out the do/end,
though, and it's not. (SAS follows C/C++ in this regard.) It's connected
to the nearest unmatched "if" statement - the "else if z" condition. It's
really tough to figure logic problems like this out late at night, so if
one person gets to bed a little earlier as a result of this post, I'll be
satisfied.
Solution:
if x then do;
if y then ...;
else if z then ...;
else;
end;
else if q then do;
(Re: the thread about good coding practices: this is also the source of
my concern over relying too much on indenting code. The structure looks
right, but the code is fundamentally flawed.)
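
If you want to check which IF an ELSE pairs with on your own system, a
throwaway step like this one should do it. The variable values are
arbitrary, and the indentation in the first form is deliberately
misleading.

```sas
data _null_;
   x = 0; y = 1; q = 1;

   /* No DO/END: the ELSE pairs with the nearest IF ("if y"),      */
   /* so the whole chain sits under "if x" despite the indentation */
   if x then
      if y then put 'bare: y branch';
   else if q then put 'bare: q branch';

   /* With DO/END: the ELSE pairs with "if x then do" */
   if x then do;
      if y then put 'grouped: y branch';
   end;
   else if q then put 'grouped: q branch';
run;
```

With x false I'd expect only "grouped: q branch" in the log. The bare
form prints nothing, since its ELSE belongs to the inner IF and never
gets a chance to run.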
International Trade Resources
Wren Nessle Buck
Consultant