LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous (more recent) messageNext (less recent) messagePrevious (more recent) in topicNext (less recent) in topicPrevious (more recent) by same authorNext (less recent) by same authorPrevious page (November 2010, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Thu, 11 Nov 2010 23:17:46 -0500
Reply-To:   Arthur Tabachneck <art297@ROGERS.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Arthur Tabachneck <art297@ROGERS.COM>
Subject:   Re: Any clever way to find if a char field contains anything that isn't a number?
Comments:   To: Bbser 2009 <bbser2009@GMAIL.COM>

Max,

I interpreted the question as follows;

Given a file with N variables, some of those were entered as character variables.

Of those, some contained just numbers which the OP wanted to identify and convert to numeric variables, some contained just the letters Y, N and blanks which the OP wanted to convert to a numeric 1,0 variable, and any the OP wanted to ignore any other character variables.

Further, the OP wanted to automate the process of identifying those variables, re-formatting them, and summarizing how many there were.

Art --------- On Thu, 11 Nov 2010 21:56:39 -0500, bbser2009 <bbser2009@GMAIL.COM> wrote:

>So basically speaking, given a character variable x, we would like to know >if it contains characters other than 0, 1, ..., 9? >I am not sure if I misunderstand this question. If not, can we just use an >input function like this? > >y1=input(x, dollar8.); >y2=input(x, comma8.); >... > >if all these y1 and y2 have missing values, then x contains characters other >than 0,1,...,9. > > >-----Original Message----- >From: Arthur Tabachneck [mailto:art297@ROGERS.COM] >Sent: November-11-10 6:39 PM >Subject: Re: Any clever way to find if a char field contains anything that >isn't a number? > >Andrea, > >Clever? Probably not! Way to automate? I think so. Generalizable? >Absolutely! Can it be optimized? As can all of the code I have ever >proposed on SAS-L! Can it handle the amount of data you have? Unknown >(but I'd like to know if it can)! > >The code is fairly basic and very similar to the other ideas that have >been suggested: > >data have ; > input @1 var1 $1. > @3 var2 $1. > @5 var3 $1. > @7 var4 $1. > @9 var5 $1. > @11 var6 $1.; > datalines; >1 2 3 Y 5 Y >2 3 A N 6 N >3 4 5 7 >8 9 0 1 >; >run; > >proc format; > invalue testnum > '0','1','2','3','4','5','6','7','8','9'=1 > other=0; > invalue testchr > ' ','y','Y','n','N'=1 > other=0; >run; > >proc sql noprint; > select "n_"||name||"="||"input("||name||",testnum.);", > "c_"||name||"="||"input("||name||",testchr.);", > "int(mean(c_"||name||"))+int(mean(n_"||name||")) as "||name > > into :recode1 separated by ' ', > :recode2 separated by ' ', > :final separated by ',' > from dictionary.columns > where libname='WORK' and > memname='HAVE' and > xtype='char'; >quit; > >data test(keep=n_: c_:); > set have end=eof; > &recode1. > &recode2. >run; > >proc transpose data=test out=transpose_test_part1; >run; > >proc transpose data=transpose_test_part1 out=transpose_test_part2; > by _name_ notsorted; >run; > >proc sql noprint; > select distinct _name_ > into :vars_to_format separated by ' ' > from transpose_test_part2 > group by _name_ > having min(col1)= max(col1) =1 >; >quit; > >%put &vars_to_format.; > >I don't know how you would like to incorporate the resulting macro >variable, &vars_to_format., but obtaining the number of variables included >and applying the appropriate formats should be rather trivial. > >HTH, >Art >-------- >On Tue, 9 Nov 2010 13:19:10 -0500, Andrea Zimmerman ><sassywench74va@GMAIL.COM> wrote: > >>I'm looking to automate a very manual process. I've got data coming in >that >>I need to run through proc univ and I need basically all data that is >>numeric to be stored as numeric so I can get statistics. But many times >>fields that are numeric are stored as char data and I have to do a proc >freq >>to determine which ones I can convert to numeric and which ones I can't. >> >>I'd like to automate this process. I know how to use the dictionary >tables >>to generate a list of data stored as char, but where I could use some help >>is a clever way to determine what they actually contain. If they only >>contain digits 0 to 9, I'd like to convert to numeric data. If they only >>contain Y/N/null I'd like to convert to 1/0/null. Otherwise I'd like a >>report of how many unique values there are. (If there are a small list I >>can do some distributions, however if there are millions of unique values >in >>my dataset then it is pointless to my analysis.) >> >>Any thoughts? I know of the NOTDIGIT function to find those vars that are >>just 0 to 9. I'm sure proc freq with an out option can give me some >>datasets to look at for how many unique values I've got, or help me find >my >>Y/N fields. Any other tricks I'm not thinking of? >> >>And I need to automate this process since the input data is not always >>formatted the same, so a field that is char this month might be num the >>next, and vice versa (why they would do that I have no idea, but they seem >>to like to complicate my life) I've been doing it manually, and every >month >>I have to alter the code since some were char and now num and some that >were >>num are now char. >> >>TIA >> >>-- >>Andrea W-Z


Back to: Top of message | Previous page | Main SAS-L page