LISTSERV at the University of Georgia
Date:         Mon, 1 Dec 2008 17:36:29 -0500
Reply-To:     Paul Dorfman <sashole@BELLSOUTH.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Paul Dorfman <sashole@BELLSOUTH.NET>
Subject:      Re: How to clean character data with space and dotts
Comments: To: jn_mao@YAHOO.COM
Content-Type: text/plain; charset=ISO-8859-1

Jane,

It is fairly easy to delete the "bad" data.

First, however, you have to define "bad". "20 .3" looks like pretty good data to me, except that whoever was keying it in accidentally hit the space bar. Likewise, it is quite apparent what 12...4 was intended to be. Do you really want to throw these data away?

If what you consider good data is SAS standard numeric data, i.e. data that the standard numeric informat considers good to consume, then the sole instruction

if missing (input (weight, ?? 32.)) then delete ;

in which INPUT returns a missing value whenever a datum does not conform to the standard numeric rules, could accomplish the goal. Note that a datum in scientific notation, such as 1E7 (10 million), is also considered standard.
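To make this concrete, here is a minimal sketch of that instruction inside a complete DATA step. The data set names HAVE and WANT and the sample values are made up for illustration; only WEIGHT and the IF statement come from the discussion above.

```sas
/* Hypothetical sample data: WEIGHT is stored as character */
data have ;
  infile datalines truncover ;
  input weight $32. ;
datalines ;
20.3
20 .3
12...4
1E7
abc
;
run ;

data want ;
  set have ;
  /* ?? suppresses the invalid-data log messages and keeps _ERROR_ from being set; */
  /* any value the standard numeric informat cannot read comes back as missing     */
  if missing (input (weight, ?? 32.)) then delete ;
run ;
```

A value that is blank, or that consists of a single period, also reads as missing and so is deleted along with the malformed entries.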

If you regard "123,456.789" as valid input, the standard numeric informat is not what you need, and you will be better off using

if missing (input (weight, ?? comma32.)) then delete ;

instead; or, if "123.456,789" is also good (in the case of a European data entry clerk), then this would be apter:

if missing (input (weight, ?? comma32.)) and missing (input (weight, ?? commax32.)) then delete ;
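Before settling on either informat, it may help to look at what each one makes of the same strings. The following is only a sketch: the data set names and the variables AS_COMMA and AS_COMMAX are hypothetical, and the sample values are invented.

```sas
/* Hypothetical sample data */
data have ;
  infile datalines truncover ;
  input weight $32. ;
datalines ;
123,456.789
123.456,789
1,234
;
run ;

data check ;
  set have ;
  /* COMMAw.d treats the comma as a digit separator and the period as the decimal point */
  as_comma  = input (weight, ?? comma32.) ;
  /* COMMAXw.d reverses the roles of the comma and the period */
  as_commax = input (weight, ?? commax32.) ;
run ;

proc print data = check ;
  format as_comma as_commax best32. ;
run ;
```

Keep in mind that a string such as "1,234" is readable both ways with different results (1234 under COMMAw.d, 1.234 under COMMAXw.d), so a test that accepts either informat tells you the value is readable, not which convention the clerk had in mind.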

On the other hand, if you would like to preserve 12...4 as valid 12.4, you would need to kill consecutive periods, replacing them with a single one. The expression

translate (compbl (translate (compress (weight), "", ".")), ".", "")

will do that, after which it can be plugged into

if missing (input (weight, ?? 32.)) then delete ;

instead of weight.
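Put together, the substitution might look like the sketch below. The data set names and the variable WEIGHT_NUM are assumptions added for illustration; the cleaning expression itself is the one given above.

```sas
/* Hypothetical sample data */
data have ;
  infile datalines truncover ;
  input weight $32. ;
datalines ;
20.3
20 .3
12...4
abc
;
run ;

data want ;
  set have ;
  /* COMPRESS squeezes out all blanks; the inner TRANSLATE turns periods into blanks, */
  /* COMPBL collapses each run of blanks to one, and the outer TRANSLATE turns the    */
  /* remaining blanks back into periods                                               */
  weight_num = input (translate (compbl (translate (compress (weight), "", ".")), ".", ""), ?? 32.) ;
  if missing (weight_num) then delete ;
run ;
```

Because COMPRESS removes every blank first, this variant also turns "20 .3" into 20.3 rather than discarding it, so use it only if that is the repair you actually want.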

Methinks the first step should be a cursory analysis of the "dirt" you have in your input data, as a fair replacement for eyeballing. It will give you a better guide for deciding whether you are really ready to go for a data kill or not.
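One possible way to do such a survey, offered as a sketch rather than the only approach: reduce every value to a pattern by mapping each digit to 9 and each embedded blank to an underscore, then count the patterns. The data set names and the variable PATTERN are hypothetical.

```sas
/* Hypothetical sample data */
data have ;
  infile datalines truncover ;
  input weight $32. ;
datalines ;
20.3
20 .3
12...4
45.1
abc
;
run ;

data patterns ;
  set have ;
  length pattern $ 32 ;
  /* TRIM drops trailing blanks first, so only embedded blanks become underscores */
  pattern = translate (trim (weight), "9999999999_", "0123456789 ") ;
run ;

/* Most frequent patterns first; anything other than shapes like 99.9 deserves a look */
proc freq data = patterns order = freq ;
  tables pattern / missing ;
run ;
```

A short frequency table of patterns such as 99.9, 99_.9 and 99...9 shows at a glance how much of the data is merely mistyped and how much is truly unusable.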

Kind regards
------------
Paul Dorfman
Jax, FL
------------


On Mon, 1 Dec 2008 14:05:01 -0800, jn mao <jn_mao@YAHOO.COM> wrote:

>Hello SAS-Ls,
>
>I have a large datasets including weight variable. The weight variable was set as Character variable. I need to delete all invalid weight data, then SAS can convert it to numerica data.
>
>Because some weight data were entered with space or more dotts, like 20 .3, 12...4, they were not entered with right format. Can someone help me delete all those unqulified data?
>
>I did compress, but still alot errors and can't convert well. I need to delete all those bad data. Thanks much.
>
>Jane

