Date: Wed, 10 Sep 2008 13:08:03 -0500
Reply-To: Mary <mlhoward@avalon.net>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Mary <mlhoward@AVALON.NET>
Subject: Re: Question on Transpose with two columns needed
Content-Type: text/plain; charset="iso-8859-1"
No, as I stated I have 3000 SNP's (a numeric and a character variable that match), for 6000 variables total on each record, so this approach requires too much coding, it seems. Thanks for trying.
-Mary
----- Original Message -----
From: Steve Denham
To: Mary ; SAS-L@LISTSERV.UGA.EDU
Sent: Wednesday, September 10, 2008 12:22 PM
Subject: Re: Question on Transpose with two columns needed
Here's what I came up with. It's kind of brute force, but it did give what you asked.
data one;
input ID $ rs001 rs001_allele $ rs002 rs002_allele $;
datalines;
001 0 AA 1 CT
002 1 AB 0 CC
003 2 BB 2 TT
004 1 AB 1 CT
;
data twoa;set one;
rs=rs001;
rs_allele=rs001_allele;
if index(rs_allele,'A')>0 or index(rs_allele,'B')>0 then snp='rs001';
if index(rs_allele,'C')>0 or index(rs_allele,'T')>0 then snp='rs002';
output;
rs=rs002;
rs_allele=rs002_allele;
if index(rs_allele,'A')>0 or index(rs_allele,'B')>0 then snp='rs001';
if index(rs_allele,'C')>0 or index(rs_allele,'T')>0 then snp='rs002';
output;
keep ID snp rs rs_allele;
run;
You would need code for each of the haplotypes to put into the correct snp, but if you only have the two snps per record, this should move through it pretty quickly.
Steve Denham
Associate Director, Biostatistics
MPI Research, Inc.
Remove spamblock from header, and replace with stevedrd to reply to me.
----- Original Message ----
From: Mary <mlhoward@AVALON.NET>
To: SAS-L@LISTSERV.UGA.EDU
Sent: Wednesday, September 10, 2008 12:47:41 PM
Subject: Question on Transpose with two columns needed
Hi,
I have data like this:
ID rs001 rs001_allele rs002 rs002_allele
001 0 AA 1 CT
002 1 AB 0 CC
003 2 BB 2 TT
004 1 AB 1 CT
Where I have 6000 total variables, 3000 for the coded values (0, 1, 2), and 3000 for the letter equivalents to those coded values (AA, AB, BB). The variable names for the coded values and their letter equivalants are identical, EXEMPT that the letter equivalent ends with _allele.
I need to store this in Oracle, which has a limit of 1000 variables, so I would have to break up the file in any case. Thus, it seems that transposing the data from wide to long would be a reasonable approach prior to storage in Oracle; particularly now that Howard S. has shown this morning that transpose can be very effective in solving some problems :-)
However, the numeric and letter variables go together, and so I would like these on the same row on the transposed long file. I would like to have the long file look like this:
ID SNP Numeric_Value Char_value
001 rs001 0 AA
001 rs002 1 CT
002 rs001 1 AB
002 rs002 0 CC
003 rs001 2 BB
003 rs002 2 TT
004 rs001 1 AB
005 rs002 1 CT
Could someone propose a possible solution to this tranpose problem?
-Mary