LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (September 2008, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Wed, 10 Sep 2008 14:18:49 -0400
Reply-To:   Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Subject:   Re: Question on Transpose with two columns needed
In-Reply-To:   <01f701c91370$2d417820$832fa8c0@HP82083701405>
Content-Type:   text/plain; charset=ISO-8859-1

On Wed, Sep 10, 2008 at 2:08 PM, Mary <mlhoward@avalon.net> wrote:

> No, as I stated I have 3000 SNP's (a numeric and a character variable that > match), for 6000 variables total on each record, so this approach requires > too much coding, it seems. Thanks for trying. > > -Mary > ----- Original Message ----- > From: Steve Denham > To: Mary ; SAS-L@LISTSERV.UGA.EDU > Sent: Wednesday, September 10, 2008 12:22 PM > Subject: Re: Question on Transpose with two columns needed > > > Here's what I came up with. It's kind of brute force, but it did give what > you asked. > > data one; > input ID $ rs001 rs001_allele $ rs002 rs002_allele $; > datalines; > 001 0 AA 1 CT > 002 1 AB 0 CC > 003 2 BB 2 TT > 004 1 AB 1 CT > ; > > data twoa;set one; > rs=rs001; > rs_allele=rs001_allele; > if index(rs_allele,'A')>0 or index(rs_allele,'B')>0 then snp='rs001'; > if index(rs_allele,'C')>0 or index(rs_allele,'T')>0 then snp='rs002'; > output; > rs=rs002; > rs_allele=rs002_allele; > if index(rs_allele,'A')>0 or index(rs_allele,'B')>0 then snp='rs001'; > if index(rs_allele,'C')>0 or index(rs_allele,'T')>0 then snp='rs002'; > output; > keep ID snp rs rs_allele; > run; > > > You would need code for each of the haplotypes to put into the correct > snp, but if you only have the two snps per record, this should move through > it pretty quickly. > Steve Denham > Associate Director, Biostatistics > MPI Research, Inc. > Remove spamblock from header, and replace with stevedrd to reply to me. > > > > ----- Original Message ---- > From: Mary <mlhoward@AVALON.NET> > To: SAS-L@LISTSERV.UGA.EDU > Sent: Wednesday, September 10, 2008 12:47:41 PM > Subject: Question on Transpose with two columns needed > > Hi, > > I have data like this: > > ID rs001 rs001_allele rs002 rs002_allele > 001 0 AA 1 CT > 002 1 AB 0 CC > 003 2 BB 2 TT > 004 1 AB 1 CT > > > Where I have 6000 total variables, 3000 for the coded values (0, 1, 2), > and 3000 for the letter equivalents to those coded values (AA, AB, BB). The > variable names for the coded values and their letter equivalants are > identical, EXEMPT that the letter equivalent ends with _allele. > > I need to store this in Oracle, which has a limit of 1000 variables, so I > would have to break up the file in any case. Thus, it seems that transposing > the data from wide to long would be a reasonable approach prior to storage > in Oracle; particularly now that Howard S. has shown this morning that > transpose can be very effective in solving some problems :-) > > However, the numeric and letter variables go together, and so I would like > these on the same row on the transposed long file. I would like to have the > long file look like this: > > ID SNP Numeric_Value Char_value > 001 rs001 0 AA > 001 rs002 1 CT > 002 rs001 1 AB > 002 rs002 0 CC > 003 rs001 2 BB > 003 rs002 2 TT > 004 rs001 1 AB > 005 rs002 1 CT > > > Could someone propose a possible solution to this tranpose problem? > > -Mary >

Mary,

Here is an array approach. I am renaming your _allele variable to fit my array name for flexibilty. If you like you may run Proc dataset to reverse the name. You have just to replace the statements

array rs rs001-rs002; array rsa allele_rs001 - allele_rs002;

with

array rs rs001-rs3000; array rsa allele_rs001 - allele_rs3000;

to meet your needs.

data one; input ID $ rs001 rs001_allele $ rs002 rs002_allele $; datalines; 001 0 AA 1 CT 002 1 AB 0 CC 003 2 BB 2 TT 004 1 AB 1 CT ; run;

proc datasets library = work; modify one; rename rs001_allele = allele_rs001 rs002_allele = allele_rs002; quit; run;

data need(keep = id snp num str); set one; by id; array rs rs001-rs002; array rsa allele_rs001 - allele_rs002; do over rs; snp = vname(rs); num = rs; str = rsa; output; end; run;

Regards,

Muthia Kachirayan


Back to: Top of message | Previous page | Main SAS-L page