|
Andrea,
Clever? Probably not! Way to automate? I think so. Generalizable?
Absolutely! Can it be optimized? As can all of the code I have ever
proposed on SAS-L! Can it handle the amount of data you have? Unknown
(but I'd like to know if it can)!
The code is fairly basic and very similar to the other ideas that have
been suggested:
data have ;
input @1 var1 $1.
@3 var2 $1.
@5 var3 $1.
@7 var4 $1.
@9 var5 $1.
@11 var6 $1.;
datalines;
1 2 3 Y 5 Y
2 3 A N 6 N
3 4 5 7
8 9 0 1
;
run;
proc format;
invalue testnum
'0','1','2','3','4','5','6','7','8','9'=1
other=0;
invalue testchr
' ','y','Y','n','N'=1
other=0;
run;
proc sql noprint;
select "n_"||name||"="||"input("||name||",testnum.);",
"c_"||name||"="||"input("||name||",testchr.);",
"int(mean(c_"||name||"))+int(mean(n_"||name||")) as "||name
into :recode1 separated by ' ',
:recode2 separated by ' ',
:final separated by ','
from dictionary.columns
where libname='WORK' and
memname='HAVE' and
xtype='char';
quit;
data test(keep=n_: c_:);
set have end=eof;
&recode1.
&recode2.
run;
proc transpose data=test out=transpose_test_part1;
run;
proc transpose data=transpose_test_part1 out=transpose_test_part2;
by _name_ notsorted;
run;
proc sql noprint;
select distinct _name_
into :vars_to_format separated by ' '
from transpose_test_part2
group by _name_
having min(col1)= max(col1) =1
;
quit;
%put &vars_to_format.;
I don't know how you would like to incorporate the resulting macro
variable, &vars_to_format., but obtaining the number of variables included
and applying the appropriate formats should be rather trivial.
HTH,
Art
--------
On Tue, 9 Nov 2010 13:19:10 -0500, Andrea Zimmerman
<sassywench74va@GMAIL.COM> wrote:
>I'm looking to automate a very manual process. I've got data coming in
that
>I need to run through proc univ and I need basically all data that is
>numeric to be stored as numeric so I can get statistics. But many times
>fields that are numeric are stored as char data and I have to do a proc
freq
>to determine which ones I can convert to numeric and which ones I can't.
>
>I'd like to automate this process. I know how to use the dictionary
tables
>to generate a list of data stored as char, but where I could use some help
>is a clever way to determine what they actually contain. If they only
>contain digits 0 to 9, I'd like to convert to numeric data. If they only
>contain Y/N/null I'd like to convert to 1/0/null. Otherwise I'd like a
>report of how many unique values there are. (If there are a small list I
>can do some distributions, however if there are millions of unique values
in
>my dataset then it is pointless to my analysis.)
>
>Any thoughts? I know of the NOTDIGIT function to find those vars that are
>just 0 to 9. I'm sure proc freq with an out option can give me some
>datasets to look at for how many unique values I've got, or help me find
my
>Y/N fields. Any other tricks I'm not thinking of?
>
>And I need to automate this process since the input data is not always
>formatted the same, so a field that is char this month might be num the
>next, and vice versa (why they would do that I have no idea, but they seem
>to like to complicate my life) I've been doing it manually, and every
month
>I have to alter the code since some were char and now num and some that
were
>num are now char.
>
>TIA
>
>--
>Andrea W-Z
|