Date: Wed, 10 Nov 2010 02:49:48 -0500
Reply-To: Søren Lassen <s.lassen@POST.TELE.DK>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Søren Lassen <s.lassen@POST.TELE.DK>
Subject: Re: Any clever way to find if a char field contains anything that
isn't a number?
Content-Type: text/plain; charset=ISO-8859-1
Sorry, my original post did not quite do the job.
In order to use a format to check for all numeric values,
you have to use a numeric informat, e.g.:
proc format;
invalue type
low - high=1
'y','Y','n','N',' '=0
other=-1
;
value $class
'1'='Numeric'
'0'='Boolean'
other='Character'
;
run;
data test;
infile cards missover;
input a $;
cards;
4ghhgh
3425
456677
3432
y
n
;run;
data temp;
set test;
select(input(a,type.));
when(1) a='1';
when(0) a='0';
otherwise;
end;
run;
proc freq data=temp;
tables a/ out=count noprint;
run;
proc summary data=count nway missing;
class a;
format a $class.;
var count;
output out=stats sum=NumObs n=NumLevels;
run;
Regards,
Søren
On Tue, 9 Nov 2010 13:19:10 -0500, Andrea Zimmerman
<sassywench74va@GMAIL.COM> wrote:
>I'm looking to automate a very manual process. I've got data coming in
that
>I need to run through proc univ and I need basically all data that is
>numeric to be stored as numeric so I can get statistics. But many times
>fields that are numeric are stored as char data and I have to do a proc
freq
>to determine which ones I can convert to numeric and which ones I can't.
>
>I'd like to automate this process. I know how to use the dictionary tables
>to generate a list of data stored as char, but where I could use some help
>is a clever way to determine what they actually contain. If they only
>contain digits 0 to 9, I'd like to convert to numeric data. If they only
>contain Y/N/null I'd like to convert to 1/0/null. Otherwise I'd like a
>report of how many unique values there are. (If there are a small list I
>can do some distributions, however if there are millions of unique values
in
>my dataset then it is pointless to my analysis.)
>
>Any thoughts? I know of the NOTDIGIT function to find those vars that are
>just 0 to 9. I'm sure proc freq with an out option can give me some
>datasets to look at for how many unique values I've got, or help me find my
>Y/N fields. Any other tricks I'm not thinking of?
>
>And I need to automate this process since the input data is not always
>formatted the same, so a field that is char this month might be num the
>next, and vice versa (why they would do that I have no idea, but they seem
>to like to complicate my life) I've been doing it manually, and every
month
>I have to alter the code since some were char and now num and some that
were
>num are now char.
>
>TIA
>
>--
>Andrea W-Z
|