```Date: Thu, 11 Nov 2010 21:56:39 -0500 Reply-To: bbser2009 Sender: "SAS(r) Discussion" From: bbser2009 Subject: Re: Any clever way to find if a char field contains anything that isn't a number? Comments: To: Arthur Tabachneck Content-Type: text/plain; charset="us-ascii" So basically speaking, given a character variable x, we would like to know if it contains characters other than 0, 1, ..., 9? I am not sure if I misunderstand this question. If not, can we just use an input function like this? y1=input(x, dollar8.); y2=input(x, comma8.); ... if all these y1 and y2 have missing values, then x contains characters other than 0,1,...,9. -----Original Message----- From: Arthur Tabachneck [mailto:art297@ROGERS.COM] Sent: November-11-10 6:39 PM Subject: Re: Any clever way to find if a char field contains anything that isn't a number? Andrea, Clever? Probably not! Way to automate? I think so. Generalizable? Absolutely! Can it be optimized? As can all of the code I have ever proposed on SAS-L! Can it handle the amount of data you have? Unknown (but I'd like to know if it can)! The code is fairly basic and very similar to the other ideas that have been suggested: data have ; input @1 var1 \$1. @3 var2 \$1. @5 var3 \$1. @7 var4 \$1. @9 var5 \$1. @11 var6 \$1.; datalines; 1 2 3 Y 5 Y 2 3 A N 6 N 3 4 5 7 8 9 0 1 ; run; proc format; invalue testnum '0','1','2','3','4','5','6','7','8','9'=1 other=0; invalue testchr ' ','y','Y','n','N'=1 other=0; run; proc sql noprint; select "n_"||name||"="||"input("||name||",testnum.);", "c_"||name||"="||"input("||name||",testchr.);", "int(mean(c_"||name||"))+int(mean(n_"||name||")) as "||name into :recode1 separated by ' ', :recode2 separated by ' ', :final separated by ',' from dictionary.columns where libname='WORK' and memname='HAVE' and xtype='char'; quit; data test(keep=n_: c_:); set have end=eof; &recode1. &recode2. run; proc transpose data=test out=transpose_test_part1; run; proc transpose data=transpose_test_part1 out=transpose_test_part2; by _name_ notsorted; run; proc sql noprint; select distinct _name_ into :vars_to_format separated by ' ' from transpose_test_part2 group by _name_ having min(col1)= max(col1) =1 ; quit; %put &vars_to_format.; I don't know how you would like to incorporate the resulting macro variable, &vars_to_format., but obtaining the number of variables included and applying the appropriate formats should be rather trivial. HTH, Art -------- On Tue, 9 Nov 2010 13:19:10 -0500, Andrea Zimmerman wrote: >I'm looking to automate a very manual process. I've got data coming in that >I need to run through proc univ and I need basically all data that is >numeric to be stored as numeric so I can get statistics. But many times >fields that are numeric are stored as char data and I have to do a proc freq >to determine which ones I can convert to numeric and which ones I can't. > >I'd like to automate this process. I know how to use the dictionary tables >to generate a list of data stored as char, but where I could use some help >is a clever way to determine what they actually contain. If they only >contain digits 0 to 9, I'd like to convert to numeric data. If they only >contain Y/N/null I'd like to convert to 1/0/null. Otherwise I'd like a >report of how many unique values there are. (If there are a small list I >can do some distributions, however if there are millions of unique values in >my dataset then it is pointless to my analysis.) > >Any thoughts? I know of the NOTDIGIT function to find those vars that are >just 0 to 9. I'm sure proc freq with an out option can give me some >datasets to look at for how many unique values I've got, or help me find my >Y/N fields. Any other tricks I'm not thinking of? > >And I need to automate this process since the input data is not always >formatted the same, so a field that is char this month might be num the >next, and vice versa (why they would do that I have no idea, but they seem >to like to complicate my life) I've been doing it manually, and every month >I have to alter the code since some were char and now num and some that were >num are now char. > >TIA > >-- >Andrea W-Z ```

Back to: Top of message | Previous page | Main SAS-L page