Date: Fri, 14 May 2004 07:31:18 EDT
Reply-To: PGall9898@AOL.COM
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Philip Gallagher <PGall9898@AOL.COM>
Subject: Re: Scientific notation
David Cassell's reply (appended to this msg) to Kevin Christensen
was so "right on" that it nearly took out all the steam that I had
been generating on this topic. But not quite. So here goes:
COMMENT OF LESSER IMPORTANCE
Kevin, when David wrote
"[3] In some environments, an 'implicit character to numeric conversion'
note is considered a sign of a problem with the program. People
who have written log-checking programs have often included that
'NOTE' as something to be concerned about. You will know
better than any of us whether someone may be looking over your
shoulder because of that log message."
I know in my heart he was biting his tongue to avoid writing
something that would hurt your feelings. God bless him, but he
really ought to have been stronger. The [multiply by one]
technique is just plain bad practice which is great for a neophyte
to have thought of and perhaps even to have used if the medium
(SAS) utilized did not have an appreciably better mechanism
(the INPUT function) specifically designed for creating a numeric
variable from numeric information stored as characters. Besides,
the NOTE feature (which you may be thinking of as a nice but
not very necessary luxury) works so much more informatively when
you use the INPUT fcn - you only get a NOTE if the information
stored as character cannot be input as a number. This directs
your attention to only those observations in which the "conversion"
could not be done, thus letting you focus only on the trouble spots.
Much nicer - you might find some hideous surprises of which you
had been unaware in the character fields. I have, so many times
that I shudder to remember them. And, after having found them
all, you can write a formal (and correct) paragraph for your report,
detailing what kinds of conversion problems and how many of each
you encountered. Which is exactly what any hard-nosed prof or
boss or journal editor or FDA reviewer wants to see.
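
To make the point concrete, here is a minimal sketch of the INPUT
approach. The data set names (HAVE, WANT), the variable names
(POP_CHAR, POP_NUM), and the sample values are all made up for
illustration; they are not from Kevin's data.

```sas
/* Hypothetical data: numbers stored as text, plus one bad value. */
data have;
   length pop_char $12;
   input pop_char $;
   datalines;
1.5E6
2300000
N/A
;
run;

data want;
   set have;
   /* Explicit conversion into a NEW numeric variable. The w.      */
   /* informat reads ordinary and scientific notation alike.       */
   /* No "converted to numeric" NOTE is written; only the value    */
   /* that cannot be read (N/A) should draw an invalid-argument    */
   /* NOTE in the log, and POP_NUM is set to missing for that row. */
   pop_num = input(pop_char, 12.);
run;
```

The log should then point only at the observation holding N/A, which
is the "focus on the trouble spots" behavior described above.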
COMMENT OF GREATER IMPORTANCE
Kevin, I have not yet seen answers directed at the concept of
comparing the created numeric variable with the original character
variable. This is clearly not because the posters didn't understand
and know the concept all too well; they merely wanted to keep
their posts brief. As you see, I have not yet learned that virtue.
Here's the idea. Systematically compare the output from whatever
technique you decide to use against the input; make sure that
the transformation you got was the transformation you intended.
I would be willing to bet that even the cleverest participants on
SAS-L (like Paul Dorfman and Ian Whitlock and David Cassell and
...) have all, at least once in their programming lives, written some
transformation code that didn't work very well at all when presented
with some unusual, unanticipated input data. The reason they are
great programmers to whom we all would go for advice is that they
will, as a matter of good practice, "Check their work". Thus, the
solutions they present to you and me have the bugs worked out
and you and I never see the sweat that went into what we do see.
And they appear as brilliant as they are. So, even though earlier
postings omitted the checking idea, please do it. Suggestions:
1. If there are not so many distinct values as to make the output
unreadable, use PROC FREQ to do a crosstabulation. Any item
which has more than one antecedent needs to be examined.
2. Again, if there are not so many observations as to make you
crazy to look at them, a simple PROC PRINT of the two variables
can give you enormous confidence in your work. If you
SORT first you can make reading the output easier and quicker.
3. If there are an enormous number of distinct values you might
choose to PRINT a few and then examine exhaustively in great
detail all observations for which the created numeric variable is
missing. This homes in directly on all values the INPUT fcn could
not handle, telling you what the trouble conditions are and,
possibly, suggesting ways to get some or all of the offending
values translated in the best way for your study.
4. Other ideas I'm too dense to think of.
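
Suggestions 1 to 3 might look something like the sketch below. It
assumes a hypothetical data set WANT holding the original character
variable POP_CHAR and the numeric variable POP_NUM created from it;
the names and values are invented for illustration.

```sas
/* Hypothetical example data: original text and converted number. */
data want;
   length pop_char $12;
   input pop_char $;
   pop_num = input(pop_char, 12.);
   datalines;
1.5E6
1500000
2300000
N/A
;
run;

/* 1. Crosstab of original against converted values. LIST prints  */
/*    one row per combination; MISSING keeps the failed rows in.  */
proc freq data=want;
   tables pop_num*pop_char / list missing;
run;

/* 2. Sort first, then print the two variables side by side. */
proc sort data=want out=want_sorted;
   by pop_num pop_char;
run;

proc print data=want_sorted;
   var pop_char pop_num;
run;

/* 3. Look only at the observations that did not convert. */
proc print data=want;
   where missing(pop_num);
   var pop_char pop_num;
run;
```

In the PROC FREQ listing, any POP_NUM value that appears on more than
one row has more than one antecedent (here 1.5E6 and 1500000) and is
worth a look.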
Please don't be insulted by the foregoing; I know we were all told in
the second grade to "Check your work", but it is so easy to forget to
do this in computer applications. And it is a fundamental part of
good programming. I got really burned a few years ago when frantic
clients snatched output from my subdirectory before I could check it,
and, God help me, I had left out a BY-stmnt! Of course the results
were wrong, and I leave it to you to guess who was blamed. By the
time I got it sorted out my reputation with that set of clients was
ruined, surprise, surprise. Checking really helps.
Phil Gallagher
Nantucket
David Cassell's post is appended here without change.
Date: Thu, 13 May 2004 15:08:20 -0700
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: Scientific notation
"CHRISTENSEN,KEVIN W" <chriske2@UFL.EDU> replied:
> I ended up figuring a way around it...just multiply everything by
> 1. It only works if the characters can be interpreted as numbers.
> And it will likely convert any missing values to zeros
> (thankfully I didn't have any missing values). I'll probably try
> it your way too so I can learn a few things. The code I used is
> below, for posterity's sake.
>
> data wbdata.pop4;
> set wbdata.pop3;
> array pop (43) _1960--_2002;
> do i = 1 to 43;
> pop(i) = pop(i)*1;
> end;
> run;
This will cause an implicit character-to-numeric conversion and back,
since SAS will not change your variables from character to numeric.
You have to convert and assign that value to a numeric variable. I
think I should also point out that:
[1] This will NOT convert missing values to zeroes. Just try it.
It can convert missing values listed as a blank into character strings
that look like a period. Which isn't what you wanted above.
[2] If some of your _nnnn variables are character (due to the way you
read them in) and some are numeric, then your array statement will bomb
out. Your array needs to be all numeric or all character.
[3] In some environments, an 'implicit character to numeric conversion'
note is considered a sign of a problem with the program. People who
have written log-checking programs have often included that 'NOTE' as
something to be concerned about. You will know better than any of us
whether someone may be looking over your shoulder because of that log
message.
[4] The previous suggestions have all been really good ideas. You
really ought to read in your flat file and make sure your numbers are
properly processed, rather than having to patch things up after the
fact. And I do think that the use of INPUT() is better than the
multiply-by-one technique above, for several reasons, one of which is
clarity of the code.
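
As a rough sketch of the INPUT() alternative for the array case: the
character variables are read through one array and the results are
assigned to new numeric variables through a second array. The data
set names (POP3, POP4), the new variable names (Y1960-Y1962), and the
sample values are assumptions made for illustration, and only three
years are shown rather than the 43 in the code above.

```sas
/* Hypothetical input: yearly values stored as character. */
data pop3;
   length _1960 _1961 _1962 $12;
   input _1960 $ _1961 $ _1962 $;
   datalines;
1.2E6 1250000 1.3E6
8.0E5 n/a 8.4E5
;
run;

data pop4;
   set pop3;
   array popc (3) $ _1960--_1962;   /* existing character variables */
   array popn (3) y1960-y1962;      /* new numeric variables        */
   do i = 1 to 3;
      /* Explicit conversion; an unreadable value such as n/a      */
      /* becomes a numeric missing and is flagged in the log.      */
      popn(i) = input(popc(i), 12.);
   end;
   drop i;
run;
```

The original character variables are kept alongside the new numeric
ones, so the two sets can be compared afterwards.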
HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician