LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (March 1998, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Mon, 16 Mar 1998 13:44:05 -0500
Reply-To:   WHITLOI1 <WHITLOI1@WESTAT.COM>
Sender:   "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:   WHITLOI1 <WHITLOI1@WESTAT.COM>
Subject:   Re: Accessing a Variable's Data Type in a Program
Comments:   To: "Robert C. Saunders" <saunders@UANSV5.VANDERBILT.EDU>
Content-Type:   text/plain; charset=US-ASCII

Subject: Accessing a Variable's Data Type in a Program Summary: A solution using an SQL dictionary file is carefully considered. Respondent: Ian Whitlock <whitloi1@westat.com>

Robert C. Saunders <saunders@UANSV5.VANDERBILT.EDU> asked for help automating a solution to:

> I have 10 SAS datasets that I want to concatenate. However, in some > files a variable is stored as character and in another it's stored as > numeric. Concatenate won't procede unless the common variables are all > of the same type. I need to convert the variables from numeric to > character or character to numeric as appropriate for the concatenated > data set.

His complete text is appended at the bottom.

First let me point out that the existence of the problem indicates a severe breakdown in project management control; hence I personally would not want to use an automated solution to this problem. There are probably too many other little details that should be carefully considered. With that said the problem is interesting and worth considering for the technique involved.

Robert did not make entirely clear how one is to determine which variables are converted in which direction. I will assume a data set STD exists with 0 or more obs which has the correct type and length for all variables; hece any differences from the standard indicate the need for a conversion.

For test data let's use

data std ; stop ; x = 0 ; y = '00' ; z = 'abc' ; run ;

data w1 ; x = 1 ; y = '11' ; z = 'abc' ; run ;

data w2 ; x = '2.0000' ; y = 22 ; z = 'xyz' ; run ;

data w3 ; x = 3 ; y = 33 ; z = 'xyz' ; run ;

data w4 ; x = '4.4' ; y = '44' ; z = 'xyz' ; run ;

data w5 ; x = 5 ; y = '55' ; z = 'abc' ; run ;

Now let's look at what we have to accomplish. The final data step should look something like:

/* combine the data sets */ data all ( drop = __t: ) ; set std ( obs = 0 ) w1 ( in = mem1 rename = ()) w2 ( in = mem2 rename = (X = __t2 Y = __t3)) w3 ( in = mem4 rename = (Y = __t4)) w4 ( in = mem5 rename = (X = __t5)) w5 ( in = mem6 rename = ()) ; if mem2 then do ; /* w2 */ X = input ( __t2 ,6. ) ; Y = put ( __t3 ,2. ) ; end; if mem4 then do ; /* w3 */ Y = put ( __t4 ,2. ) ; end; if mem5 then do ; /* w4 */ X = input ( __t5 ,3. ) ; end; run ;

What about the empty RENAME lists? Fortunately version 6.12 is friendly to empty lists. However with each new version of SAS the SI programmers tend to forget this user friendly principle so one can expect glitches from time to time. Of course the empty RENAME clauses could be eliminated in the same way that the empty IF-DO blocks were eliminated, but I am stubborn and tend to think it their problem instead of mine.

As presented above we see that there are two separate problems

1) How to get the correct rename options on the data sets in the SET statement.

2) How to coordinate the renames with the conversions after the SET statement.

We could get a list of names from each data set where the corresponding types differ. For example, for data set W3:

proc sql ; select b.memname , a.name , a.type , case when a.type = 'char' then a.length else b.length end as length from dictionary.columns as a , dictionary.columns as b where a.libname = b.libname = "WORK" and a.memname = "STD" and b.memname = "W3" and a.name = b.name and a.type ^= b.type ; quit ;

Note that we always need the length of the character variable in the conversion, either because we want to INPUT the variable to a numeric one or because we want to PUT it to a character one.

Armed with such a data set it is easy to generate a rename for each record and also to write the conversion corresponding to each record. Since there will be ten such SELECT clauses all unioned together let's make it a macro. There is one technical detail. What about data sets that don't need any conversions? Well let's make a list of all the data sets and then do a left join to insure that there is a record for every data set. If NAME is missing then this will indicate that none of the special code is needed.

data mems ; input memname $ ; cards ; w1 w2 w3 w4 w5 ;

%macro getspecs ( mem = ) ; b.memname , a.name , a.type , case when a.type = 'char' then a.length else b.length end as length from dictionary.columns as a , dictionary.columns as b where a.libname = b.libname = "WORK" and a.memname = "STD" and b.memname = "%upcase(&mem)" and a.name = b.name and a.type ^= b.type %mend getspecs ;

proc sql ; create table specs as select %getspecs ( mem = w1 ) union corr select %getspecs ( mem = w2 ) union corr select %getspecs ( mem = w3 ) union corr select %getspecs ( mem = w4 ) union corr select %getspecs ( mem = w5 ) ; create table specs as select m.memname , s.name , s.type , s.length from mems as m left join specs as s on s.memname = upcase ( m.memname ) ; ******select * from specs ; quit ;

Now to generate the code lets use two separate DATA steps - one for the RENAMES and one for the conversion code. We will write to a catalog to take care of the clean up problem.

filename code catalog 'work.temp' ;

/* write set statement */ data _null_ ; file code ( setstmt.source ) ; if eof then put ';' ; if _n_ = 1 then put 'set std ( obs = 0 ) ' ; set specs end = eof ; by memname notsorted ; __x + 1 ; if first.memname then put memname '( in = mem' __x 5.-l ' rename = (' ; if name ^= ' ' then put name '= __t' __x 5.-l ; if last.memname then put '))' ; run ;

/* write conversion code */ data _null_ ; file code ( convert.source ) ; length fmt $ 5 ; set specs end = eof ; by memname notsorted ; __x + 1 ; if first.memname and name ^= ' ' then put 'if mem' __x 5.-l ' then do ; /* ' memname '*/' ;

fmt = trim ( left ( put ( length , 3. ) ) ) || '.' ; if type = 'char' and name ^= ' ' then put name '= put ( __t' __x 5.-l ',' fmt ') ;' ; else if name ^= ' ' then put name '= input ( __t' __x 5.-l ',' fmt ') ;' ;

if last.memname and name ^= ' ' then put 'end; ' ; run ;

Finally, it is time to put it altogether with

/* combine the data sets */ data all ( drop = __t: ) ; %inc code ( setstmt.source ) / source2 ; %inc code ( convert.source ) / source2 ; run ;

I have intentionally left the code in a crude form to be modified to meet a particular circumstance. Of course one could make a managing macro to take as parameters the standard set and the list of sets to concatenate. The only changes needed to the above code would be a %DO-loop to generate the SQL code and a %DO-loop to generate the data set MEMS. The rest can be used as is.

What about the other obvious approach - use SCL functions in DATA steps to retrieve tne needed information? In this case the SQL code disappears and the the two writing DATA steps can be merged into one glorious mess of loops. Is it really any simpler? I would be interested in seeing someone carry out the plan in detail for comparison. Could it be faster? Possibly, but remember the savings is the onetime cost of generating code so I do not see it as significant.

Ian Whitlock

______________________________ Reply Separator _________________________________ Subject: Accessing a Variable's Data Type in a Program Author: "Robert C. Saunders" <saunders@UANSV5.VANDERBILT.EDU> at Internet-E-Mail Date: 3/15/98 9:42 AM

Hi,

I have a question about accessing a variable's data type as part of a program. Of course, I've spent enough time thinking about how to automate this, that I should just do the very slow process, but it's a problem that has come up enough times before that my time figure this out has been an investment. Healthy rationalization.

PREFACE: I use SAS612, Windows NT40.

PROBLEM: I have 10 SAS datasets that I want to concatenate. However, in some files a variable is stored as character and in another it's stored as numeric. Concatenate won't procede unless the common variables are all of the same type. I need to convert the variables from numeric to character or character to numeric as appropriate for the concatenated data set.

WHAT I WOULD LIKE TO DO: I want the program to open the data set, check the variable's data type. There are two types of switches (c to n and n to c). If the variable is c and I want it to be c, the program should skip to the next variable; if the variable is c and I want it to be n, then I want the program to do an INPUT() step to convert the data. I would do this for the checking and conversion of numeric variables.

UNDESIRABLE SOLUTION: Go through each data set's contents, open each data set, use INPUT() to convert the variables, close the data set, then concatenate them. This is slow and I am reluctant to believe this isn't automatable.

Concatenate the datasets with a RENAME= on each input data set and process it in "wide" form, tolerating the type mismatches.

WHAT I'M CONSIDERING: Use PROC CONTENTS DATA=FILE1 OUT=CFILE1 on each data set; Write a data step using FILE1 that tells it to look up each variable's TYPE in its corresponding OUT=dataset; Repeat for data files 2 through 10. I think I will need to make a data set that contains the "desired" data type for each variable for the program to lookup each variable's correct state before making the conversion.

ASSISTANCE: If anyone has alternate suggestions or knows of some PROC/MACRO that already exists to handle this, I would greatly appreciate your sharing it with me.

Thank you.

Robert

################################

Robert C. Saunders, M.P.P. Research Associate CMHP/VIPPS/Vanderbilt University 1207 18th Avenue South Nashville, Tennessee 37212

T: 615.322.8284 F: 615.322.8081


Back to: Top of message | Previous page | Main SAS-L page