Date: Mon, 1 Dec 2008 17:36:29 -0500
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Paul Dorfman <sashole@BELLSOUTH.NET>
Subject: Re: How to clean character data with spaces and dots
Jane,
It is fairly easy to delete the "bad" data.
First, however, you have to define "bad". "20 .3" looks like pretty good
data to me, except that whoever was keying it in accidentally hit the space
bar. Likewise, it is quite apparent what 12...4 was intended to be. Do you
really want to throw these data away?
If what you consider good data is SAS standard numeric data, i.e. data that
the standard numeric informat is willing to consume, then the single
statement
if missing (input (weight, ?? 32.)) then delete ;
could accomplish the goal: with the ?? modifier, INPUT returns a missing
value (without writing invalid-data notes to the log) whenever a datum does
not conform to the standard numeric rules. Note that a datum in scientific
notation, like 1E7 (10 million), is also considered standard.
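For illustration, here is a minimal sketch of that statement inside a complete DATA step. The data set names HAVE and WANT, the variable WEIGHT_NUM, and the sample values are hypothetical. Converting once into WEIGHT_NUM and testing that avoids calling INPUT twice.

```sas
/* Hypothetical sample data: WEIGHT is character, as in the question */
data have;
  infile datalines truncover;
  input weight $12.;
  datalines;
20.3
20 .3
12...4
1E7
abc
;
run;

data want;
  set have;
  /* ?? suppresses invalid-data notes; unreadable values become missing */
  weight_num = input(weight, ?? 32.);
  /* keep only the values the standard numeric informat can read */
  if missing(weight_num) then delete;
run;
```

With these sample values, 12...4 and abc are dropped and 1E7 is kept as 10000000; 20 .3 is expected to be dropped as well, since the standard informat does not accept an embedded blank.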
If you regard "123,456.789" as valid input, the standard numeric informat
is not what you need, and you will be better off using
if missing (input (weight, ?? comma32.)) then delete ;
instead; or, if "123.456,789" is also good (in the case of a European data
entry clerk), then this would be more apt:
if missing (input (weight, ?? comma32.))
and missing (input (weight, ?? commax32.)) then delete ;
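A similar sketch for the COMMA./COMMAX. rule, again with hypothetical data set names, variable names, and sample values. Reading each value with both informats into separate variables also lets you see which informat accepted it.

```sas
/* Hypothetical sample data: US-style, European-style, and garbage */
data have;
  infile datalines truncover;
  input weight $20.;
  datalines;
123,456.789
123.456,789
abc
;
run;

data want;
  set have;
  /* COMMA.: "," separates thousands, "." is the decimal point */
  w_comma  = input(weight, ?? comma32.);
  /* COMMAX.: the roles are swapped, "." separates thousands, "," is the decimal point */
  w_commax = input(weight, ?? commax32.);
  /* delete only when neither informat can read the value */
  if missing(w_comma) and missing(w_commax) then delete;
run;
```

Here abc is dropped, while 123,456.789 (via COMMA.) and 123.456,789 (via COMMAX.) both read as 123456.789 and are kept.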
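On the other hand, if you would like to preserve 12...4 as a valid 12.4, you
would need to collapse each run of consecutive periods into a single one. The
expression
translate (compbl (translate (compress (weight), "", ".")), ".", "")
will do that: COMPRESS first removes all blanks (so "20 .3" becomes "20.3"
as a bonus), the inner TRANSLATE turns periods into blanks, COMPBL squeezes
each run of blanks into one, and the outer TRANSLATE turns the remaining
blanks back into periods. The expression can then be plugged into
if missing (input (weight, ?? 32.)) then delete ;
in place of weight.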
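As a sketch, the same expression can be unrolled into separate steps so each stage can be inspected. Two liberties are taken here: an explicit blank " " is written instead of "" (SAS treats "" as a single blank in this context), and TRIM is added before the last TRANSLATE as a precaution, so that the trailing blanks SAS pads a character variable with are not turned into periods. The names HAVE, WANT, CLEAN and WEIGHT_NUM and the sample values are hypothetical.

```sas
/* Hypothetical sample data, same shape as in the question */
data have;
  infile datalines truncover;
  input weight $12.;
  datalines;
20 .3
12...4
abc
1.2.3
;
run;

data want;
  set have;
  length clean $32;
  /* 1. remove every blank: "20 .3" -> "20.3" */
  clean = compress(weight);
  /* 2. periods -> blanks: "12...4" -> "12   4" */
  clean = translate(clean, " ", ".");
  /* 3. squeeze each run of blanks into one: "12   4" -> "12 4" */
  clean = compbl(clean);
  /* 4. blanks -> periods; TRIM first so the padding blanks are left alone */
  clean = translate(trim(clean), ".", " ");
  weight_num = input(clean, ?? 32.);
  /* "abc" and "1.2.3" still fail the standard informat and are dropped */
  if missing(weight_num) then delete;
run;
```

The output keeps two rows, with WEIGHT_NUM equal to 20.3 and 12.4.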
Methinks the first step should be a cursory analysis of the "dirt" you have
in your input data, as a fair replacement for eyeballing. It will give you a
better basis for deciding whether you are really ready to go for a data kill
or not.
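One way to do such an analysis, sketched with hypothetical data set names and sample values, is to keep only the values the standard informat rejects and tabulate them by frequency, so the most common kinds of dirt show up first.

```sas
/* Hypothetical sample data */
data have;
  infile datalines truncover;
  input weight $12.;
  datalines;
20.3
12...4
abc
12...4
15
;
run;

/* keep only the values the standard numeric informat cannot read */
data dirt;
  set have;
  if missing(input(weight, ?? 32.));
run;

/* list the distinct bad values, most frequent first */
proc freq data = dirt order = freq;
  tables weight / nocum;
run;
```

With this sample, DIRT holds three rows (12...4 twice and abc once), and PROC FREQ lists 12...4 first.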
Kind regards
------------
Paul Dorfman
Jax, FL
------------
On Mon, 1 Dec 2008 14:05:01 -0800, jn mao <jn_mao@YAHOO.COM> wrote:
>Hello SAS-Ls,
>
>I have a large dataset including a weight variable. The weight
>variable was set as a character variable. I need to delete all invalid
>weight data so that SAS can convert it to numeric data.
>
>Because some weight data were entered with spaces or extra dots, like
>20 .3 and 12...4, they were not entered in the right format. Can someone
>help me delete all those unqualified data?
>
>I did COMPRESS, but there are still a lot of errors and it can't convert
>well. I need to delete all those bad data. Thanks much.
>
>Jane