LISTSERV at the University of Georgia
Date:         Fri, 14 May 2004 07:31:18 EDT
Reply-To:     PGall9898@AOL.COM
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Philip Gallagher <PGall9898@AOL.COM>
Subject:      Re: Scientific notation
Content-Type: text/plain; charset="US-ASCII"

David Cassell's reply to Kevin Christensen (appended to this msg) was so "right on" that it nearly took out all the steam that I had been generating on this topic. But not quite. So here goes:

COMMENT OF LESSER IMPORTANCE Kevin, when David wrote "[3] In some environments, an 'implicit character to numeric conversion' note is considered a sign of a problem with the program. People who have written log-checking programs have often included that 'NOTE' as something to be concerned about. You will know better than any of us whether someone may be looking over your shoulder because of that log message." I know in my heart he was biting his tongue to avoid writing something that would hurt your feelings. God bless him, but he really ought to have been stronger. The [multiply by one] technique is just plain bad practice. It is great for a neophyte to have thought of, and perhaps even to have used, if the medium utilized (SAS) did not have an appreciably better mechanism (the INPUT function) specifically designed for creating a numeric variable from numeric information stored as characters. Besides, the NOTE feature (which you may be thinking of as a nice but not very necessary luxury) works so much more informatively when you use the INPUT fcn: you only get a NOTE if the information stored as character cannot be input as a number. This directs your attention to only those observations in which the "conversion" could not be done, thus letting you focus only on the trouble spots. Much nicer. You might find some hideous surprises of which you had been unaware in the character fields; I have, so many times that I shudder to remember them. And, after having found them all, you can write a formal (and correct) paragraph for your report, detailing what kinds of conversion problems you encountered and how many of each. Which is exactly what any hard-nosed prof or boss or journal editor or FDA reviewer wants to see.
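A minimal sketch of the INPUT approach, not Kevin's actual data or code. The dataset names (have, want), the variable names (pop_char, pop_num), the test values, and the choice of the standard numeric informat 32. are all assumptions made for illustration:

```sas
/* Hypothetical test data: numbers stored as character.          */
/* "n/a" cannot be converted; "." is read as a blank (missing).  */
data have;
   length pop_char $12;
   input pop_char $;
   datalines;
1.2E6
350000
n/a
.
;
run;

data want;
   set have;
   /* Explicit conversion into a NEW numeric variable.           */
   /* A character variable cannot be turned numeric in place.    */
   pop_num = input(pop_char, 32.);
run;
```

With explicit INPUT there is no blanket "implicit conversion" NOTE. The log should complain only about the values that could not be read (here the "n/a" row), and pop_num is set to missing for those observations.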

COMMENT OF GREATER IMPORTANCE Kevin, I have not yet seen answers directed at the concept of comparing the created numeric variable with the original character variable. This is clearly not because the posters didn't understand the concept; they know it all too well and merely wanted to keep their posts brief. As you see, I have not yet learned that virtue. Here's the idea. Systematically compare the output from whatever technique you decide to use against the input; make sure that the transformation you got was the transformation you intended. I would be willing to bet that even the cleverest participants on SAS-L (like Paul Dorfman and Ian Whitlock and David Cassell and ...) have all, at least once in their programming lives, written some transformation code that didn't work very well at all when presented with some unusual, unanticipated input data. The reason they are great programmers to whom we all would go for advice is that they will, as a matter of good practice, "Check their work". Thus, the solutions they present to you and me have the bugs worked out, and you and I never see the sweat that went into what we do see. And they appear as brilliant as they are. So, even though earlier postings omitted the checking idea, please do it. Suggestions:

1. If there are not so many distinct values as to make the output unreadable, use PROC FREQ to do a crosstabulation. Any item which has more than one antecedent needs to be examined.

2. Again, if there are not so many observations as to make you crazy to look at them, a simple PROC PRINT of the two variables can give you enormous confidence in your work. If you SORT first you can make reading the output easier and quicker.

3. If there are an enormous number of distinct values you might choose to PRINT a few and then examine exhaustively in great detail all observations for which the created numeric variable is missing. This homes in directly on all values the INPUT fcn could not handle, telling you what the trouble conditions are and, possibly, suggesting ways to get some or all of the offending values translated in the best way for your study.

4. Other ideas I'm too dense to think of.
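Suggestions 1 through 3 might look something like the sketch below. The dataset name (check), the variable names (pop_char, pop_num), and the test values are hypothetical, made up only so the steps have something to run against:

```sas
/* Hypothetical data: original character value plus converted value */
data check;
   length pop_char $12;
   input pop_char $;
   pop_num = input(pop_char, 32.);
   datalines;
1.2E6
1200000
350000
n/a
;
run;

/* 1. Crosstabulation in list form: one row per combination of     */
/*    original and converted value. MISSING keeps the failed       */
/*    conversions in the table instead of dropping them.           */
proc freq data=check;
   tables pop_char*pop_num / list missing;
run;

/* 2. Sort first, then print the two variables side by side */
proc sort data=check out=check_sorted;
   by pop_num pop_char;
run;

proc print data=check_sorted;
   var pop_char pop_num;
run;

/* 3. Only the observations where the conversion came back missing */
proc print data=check;
   where missing(pop_num);
   var pop_char pop_num;
run;
```

In the PROC FREQ listing, "1.2E6" and "1200000" should both land on the same numeric value, which is exactly the "more than one antecedent" case worth a second look.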

Please don't be insulted by the foregoing; I know we were all told in the second grade to "Check your work", but it is so easy to forget to do this in computer applications. And it is a fundamental part of good programming. I got really burned a few years ago when frantic clients snatched output from my subdirectory before I could check it, and, God help me, I had left out a BY-stmnt! Of course the results were wrong, and I leave it to you to guess who was blamed. By the time I got it sorted out my reputation with that set of clients was ruined, surprise, surprise. Checking really helps.

Phil Gallagher Nantucket

David Cassell's post is appended here without change.

Date: Thu, 13 May 2004 15:08:20 -0700
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: Scientific notation

"CHRISTENSEN,KEVIN W" <chriske2@UFL.EDU> replied:
> I ended up figuring a way around it...just multiply everything by
> 1. It only works if the characters can be interpreted as numbers.
> And it will likely convert any missing values to zeros
> (thankfully I didn't have any missing values). I'll probably try
> it your way too so I can learn a few things. The code I used is
> below, for posterity's sake.
>
> data wbdata.pop4;
> set wbdata.pop3;
> array pop (43) _1960--_2002;
> do i = 1 to 43;
> pop(i) = pop(i)*1;
> end;
> run;

This will cause an implicit character-to-numeric conversion and back, since SAS will not change your variables from character to numeric. You have to convert and assign that value to a numeric variable. I think I should also point out that:

[1] This will NOT convert missing values to zeroes. Just try it. It can convert missing values listed as a blank into character strings that look like a period. Which isn't what you wanted above.

[2] If some of your _nnnn variables are character (due to the way you read them in) and some are numeric, then your array statement will bomb out. Your array needs to be all numeric or all character.

[3] In some environments, an 'implicit character to numeric conversion' note is considered a sign of a problem with the program. People who have written log-checking programs have often included that 'NOTE' as something to be concerned about. You will know better than any of us whether someone may be looking over your shoulder because of that log message.

[4] The previous suggestions have all been really good ideas. You really ought to read in your flat file and make sure your numbers are properly processed, rather than having to patch things up after the fact. And I do think that the use of INPUT() is better than the multiply-by-one technique above, for several reasons, one of which is clarity of the code.

HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician

