Date: Thu, 31 May 2001 11:14:51 -0400
Reply-To: Ian Whitlock <WHITLOI1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ian Whitlock <WHITLOI1@WESTAT.COM>
Subject: Re: Changing character to number format
Content-Type: text/plain; charset="iso-8859-1"
Subject: Changing character to number format
Summary: Simple 2 step process illustrating list processing
Respondent: Ian Whitlock <whitloi1@westat.com>
Marc Gameroff [gameroff@EROLS.COM] wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
I have a dataset with 740 variables, all in character format, and all
of length 346. (Don't ask.) About 95% of the variables are really
numeric, i.e., I need to convert them to numeric format in order to
analyze them. Also, I want to reduce the length of these variables to
4 (which will accomodate the largest numeric value, all of which are
integers). I know I can't change the format of a variable directly and
would have to use some type of interim variable method, but the
intricacies are beyond me at this point.
For the 5% of the variables containing "true" character data (they are
text responses), I want to retain the character format but reduce the
storage length of the variables if this is easy enough to do. Some of
these variables have text that never exceeds 15 characters, whereas
some contain up to--surprise!--346 characters. Any help would be
greatly appreciated!
<<<<<<<<<<<<<<<<<<<<<<<<<<
Sometime shortly before SUGI Ya Huang gave an illustration of a
conversion problem showing that SQL made for the simplest code to handle
rename problems in the conversion.
A DATA step could look at the file and decide which variables are
capable of conversion to numeric and what the maximum length is for
each variable that is not. At the end of file one could summariaze
this information into two lists and convert it to two macro variables
that could then be used in the SQL code to do the conversion.
Here is the program together with some test data.
/* make test data */
data w ;
length x1 - x5 $ 346 ;
input ( x1 - x5 ) ( $ ) ;
cards ;
123 123 xxxxxxxxxxxxxxxxxxxxxxxxxxx 45 6
2 a 456 7 8
;
/* need total number of character variables on file */
%let nv = 5 ;
/* pass 1 - get list of numeric vars,
character vars and lengths
*/
data _null_ ;
if eof then link final ;
set w end = eof ;
array __chr (&nv) _character_ ;
array __len (&nv) _temporary_ ;
do __i = 1 to dim ( __chr ) ;
__temp = input ( left ( __chr(__i) ) , ?? best12. ) ;
if __temp = . then
do ;
__temp = length ( __chr ( __i ) ) ;
if __temp > __len ( __i ) then __len ( __i ) = __temp ;
end ;
end ;
return ;
final:
length __chrlist __numlist $32000 __vname $ 32 ;
do __i = 1 to dim ( __chr ) ;
__vname = vname ( __chr ( __i ) ) ;
if __len ( __i ) > 0 then
do ;
/* 7 because -999999 can be stored in 4 bytes */
__clen = max ( __len ( __i ) , 7 ) ;
__chrlist = trim ( __chrlist ) || ", "
|| trim(__vname)
|| " length=" || put ( __clen , 4. ) ;
end ;
else
__numlist = trim ( __numlist ) || ", "
|| " input ( left (" || trim(__vname)
|| ") , best12. ) as " || trim(__vname)
|| " length=4"
;
end ;
call symput ( "chrlist" , trim(substr(__chrlist,3)) ) ;
call symput ( "numlist" , trim(substr(__numlist,3)) ) ;
return ;
run ;
/* pass 2 - do the conversion
%put numlis=&numlist ;
%put chrlist=&chrlist ;
*/
proc sql ;
create table w2 as
select &numlist , &chrlist
from w ;
quit ;
The technique illustrates the power of SAS to make lists and how to
put this power into program. The above code is simpler in version 8
because long character variables are available to hold lists. Since
Marc has data of length 346 I assume he must be using a version beyond
6.12. If one wants to do the problem in version 6.12 only the
subroutine, FINAL, should need changing. Now one would have to use
CALL VNAME to get the variable names and CALL EXECUTE to do the
concatenation to fom the macro lists.
Ian Whitlock