LISTSERV at the University of Georgia
Date:         Thu, 11 Mar 2010 09:54:07 -0800
Reply-To:     "Richard A. DeVenezia" <rdevenezia@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Richard A. DeVenezia" <rdevenezia@GMAIL.COM>
Organization: http://groups.google.com
Subject:      Re: Using character variales as continuous variables
Comments: To: sas-l@uga.edu
Content-Type: text/plain; charset=windows-1252

On Mar 10, 6:53 pm, Lance Smith <medicaltr...@gmail.com> wrote:
> Dear all,
>
> I have a database of 50 SNP variables. Each SNP variable has 3 levels
> let's say AA, AG, GG. The levels vary with different SNPs, so another
> one may be CC CT and TT and still another may be AA AC and CC.
>
> I also have levels of four markers that are on a continuous scale.
> I need to do univariate linear regression to predict the level of
> biomarkers using each SNP separately.
> Thus I need to do 50*4 = 200 univariate linear regressions.
> The SNPs need to be recoded to 0,1,2 for the regression as we want to
> treat them as a continuous variable with the heterozygotes (AG or CT
> or AC) coded as 1.
>
> Is there a way to efficiently do the recoding to 0,1,2 in SAS without
> having to recode all the 50 SNPs separately? Or is there a way to tell
> SAS to treat them as continuous variables even though they are coded
> as character variables?
>
> Thank you

Yes, there is a way.

Q: How many rows are in the database? You might want to transpose the entire kaboodle in order to be able to use BY or CLASS statements.

If the allowed levels of each SNP variable are specified in a separate table, you can use that table to create a view to map the textual level value to a numeric value.
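A minimal sketch of that idea follows. The table names (snp_levels, study_long, study_long_num) and all column names are made up for illustration, and the study data is assumed to be already transposed to one row per sample and SNP.

```sas
/* Hypothetical lookup table: allowed levels per SNP and their numeric codes */
data snp_levels;
  length snp_name $32 level_value $2 level_num 8;
  input snp_name $ level_value $ level_num;
  datalines;
SNP1 AA 0
SNP1 AG 1
SNP1 GG 2
SNP2 CC 0
SNP2 CT 1
SNP2 TT 2
;
run;

/* Hypothetical study data in long form: one row per sample and SNP */
data study_long;
  length sampleid 8 snp_name $32 level_value $2;
  input sampleid snp_name $ level_value $;
  datalines;
1 SNP1 AG
1 SNP2 TT
2 SNP1 GG
2 SNP2 CC
;
run;

/* View that maps each textual level to its numeric code.          */
/* Levels that are not in the lookup table get a missing level_num. */
proc sql;
  create view study_long_num as
  select s.sampleid, s.snp_name, s.level_value, l.level_num
  from study_long as s
       left join snp_levels as l
       on  s.snp_name    = l.snp_name
       and s.level_value = l.level_value;
quit;
```

Because the lookup table states the code for each level explicitly, the heterozygote is coded 1 by design rather than by sort order.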

If the allowed levels are not known a priori, a pass through the collected data _can_ extract the observed level values and map based on that. However, if some SNP variables have fewer than 3 different level values, the regression might be misleading or require closer examination.

There is an unfortunate side effect of mapping to 0,1,2: you can't use a single format to map a 0, 1 or 2 back to its original level value, because each SNP variable has a different set of levels.

This sample code will pass over a study's collected data to determine the level values and compute an appropriate recode value. The recode data is used to create a custom informat that is applied to each SNP variable to create an SNPX variable. The regressions would use SNPX.

Note: A hash table approach could also perform the same type of recoding.
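For comparison, here is a rough, self-contained sketch of what a hash-based recode might look like. The recode table level_map, the tiny study table, and the column names snp_name, level_value and level_num are all hypothetical; in practice the recode table would come from a specification or from a pass over the observed data.

```sas
/* Hypothetical recode table: one row per SNP variable and level */
data level_map;
  length snp_name $32 level_value $2 level_num 8;
  input snp_name $ level_value $ level_num;
  datalines;
SNP1 AA 0
SNP1 AB 1
SNP1 BB 2
SNP2 BB 0
SNP2 BC 1
SNP2 CC 2
;
run;

/* Hypothetical study data with two SNP variables */
data study;
  length sampleid 8 snp1 snp2 $2;
  input sampleid snp1 $ snp2 $;
  datalines;
1 AB CC
2 BB BB
3 AA BC
;
run;

data study_snpx;
  set study;
  array snp  snp1-snp2;
  array snpx snpx1-snpx2;

  /* host variables for the hash keys and data */
  length snp_name $32 level_value $2 level_num 8;

  /* load the recode table into a hash once, keyed by variable name and level */
  if _n_ = 1 then do;
    declare hash lvl (dataset: 'level_map');
    lvl.defineKey ('snp_name', 'level_value');
    lvl.defineData ('level_num');
    lvl.defineDone ();
    call missing (snp_name, level_value, level_num);
  end;

  do i = 1 to dim(snp);
    snp_name    = upcase(vname(snp(i)));
    level_value = upcase(snp(i));
    /* find() returns 0 when the key exists and fills level_num */
    if lvl.find() = 0 then snpx(i) = level_num;
    else snpx(i) = .;
  end;

  drop i snp_name level_value level_num;
run;
```

The lookup key plays the same role as the name-plus-level string used with the custom informat; levels that are not in the recode table come out as missing.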

--------------------
* fake snp level values are as such
* AA, AB, BB
* BB, BC, CC
* aa, ab, bb
*;

/* fake study: 100 samples, one biomarker, 50 character SNP variables */
data fake_study;
  length sampleid biomarker 4;

  array snp $2 snp1-snp50 ;

  do sampleid = 1 to 100;
    biomarker = ceil(10*ranuni(1234));
    do _n_ = 1 to dim(snp);
      x = floor(3*ranuni(1234));   /* random level 0, 1 or 2 */
      if _n_ < 26
        then code = rank('A') + _n_ - 1 ;
        else code = rank('a') + _n_ - 26;

      snp(_n_) = byte(code + x/2) || byte(code + (x+1)/2);
    end;
    output;
  end;
  drop code x;
run;

/* one row per sample and SNP variable */
proc transpose data=fake_study out=level_values(rename=col1=level_value);
  by sampleid;
  var snp:;
run;

/* keep one row per SNP variable and observed level value */
proc sort data=level_values nodupkey;
  by _name_ level_value;
run;

/* number the levels of each SNP variable 0,1,2 in sorted order */
/* and shape the result as control data for a numeric informat  */
data level_informat_data;
  set level_values;
  by _name_;
  if first._name_ then label=0; else label+1;

  start = catx ('_', upcase(_name_), upcase(level_value));

  fmtname = 'SNP_LEVEL_NUM';
  type = 'I';

  keep start label fmtname type;
run;

proc format cntlin = level_informat_data;
run;

/* view that adds the recoded variables snpx1-snpx50 */
data fake_study_snpX / view = fake_study_snpX;
  set fake_study;
  array snp snp1-snp50;
  array snpx snpx1-snpx50;
  format snpx: 1.;

  do _n_ = 1 to dim(snp);
    name_cat_level = catx ( '_' , upcase(VNAME(snp(_n_))) , upcase(snp(_n_)) );

    snpx(_n_) = input (name_cat_level, SNP_LEVEL_NUM.);
  end;

  drop name_cat_level;
run;
--------------------
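One possible way to run the regressions on the recoded values, tying in the earlier remark about transposing so that BY can be used, is sketched below. It assumes the fake_study_snpX view from the sample code and its single biomarker; the names snpx_long, snpx_value and snp_estimates are made up here.

```sas
/* one row per sample and recoded SNP variable */
proc transpose data=fake_study_snpX
               out=snpx_long(rename=(col1=snpx_value));
  by sampleid biomarker;
  var snpx:;
run;

proc sort data=snpx_long;
  by _name_;
run;

/* one simple linear regression per SNP variable; */
/* the estimates are collected in snp_estimates   */
ods output ParameterEstimates=snp_estimates;
proc reg data=snpx_long;
  by _name_;
  model biomarker = snpx_value;
run;
quit;
```

With several biomarkers, they could be carried along in the BY statement of the transpose and listed together on the left side of the MODEL statement.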

Richard A. DeVenezia http://www.devenezia.com

