Date: Tue, 10 Jul 2007 09:42:06 -0400
Reply-To: Paul Bartells <paul.bartells@TXU.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Paul Bartells <paul.bartells@TXU.COM>
Subject: Re: Need help in SAS CODE
On Tue, 10 Jul 2007 08:37:31 -0400, Nat Wooding <Nathaniel.Wooding@DOM.COM>
wrote:
>If the letter 't' always precedes the number, you can use the SCAN function
>
>id=scan(geneid,2,'t');
>
>as in the following code (Thanks to Ovidiu for the use of the data step
>code)
>
>data id;
> input geneID $ 1-33 ;
> datalines;
> 100_g_at1
> 100_g_at2
> 100_g_at3
> 100_g_at4
> 100_g_at5
> 100_g_at6
> 100_g_at7
> 100_g_at8
> 100_g_at9
> 100_g_at10
> 100_g_at11
> 100_g_at12
> AFFX-YEL024w/RIP1_at8
> AFFX-YEL024w/RIP1_at9
> AFFX-YEL024w/RIP1_at10
> AFFX-YEL024w/RIP1_at11
> AFFX-YEL024w/RIP1_at12
> AFFX-YEL024w/RIP1_at13
> AFFX-YEL024w/RIP1_at14
> AFFX-YEL024w/RIP1_at15
> AFFX-YEL024w/RIP1_at16
> AFFX-YEL024w/RIP1_at17
> AFFX-YEL024w/RIP1_at18
> AFFX-YEL024w/RIP1_at19
> AFFX-YEL024w/RIP1_at20
> ;
> run;
> data id;
> set id;
> id=scan(geneid,2,'t');* this produces a character variable;
> idn=input(scan(geneid,2,'t'),8.); * this produces a numeric variable;
> drop geneid;
> proc print;
> run;
>
>
>This could be made to work with additional letters by putting them in the
>quotes with the t, but note that these letters cannot appear elsewhere in
>geneid.
>
>
>
>Nat Wooding
>Environmental Specialist III
>Dominion, Environmental Biology
>4111 Castlewood Rd
>Richmond, VA 23234
>Phone:804-271-5313, Fax: 804-271-2977
>
>
>
>From: gsantu_here@YAHOO.CO.IN
>Sent by: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
>To: SAS-L@LISTSERV.UGA.EDU
>Date: 07/09/2007 11:51 PM
>Subject: Need help in SAS CODE
>Please respond to: gsantu_here@YAHOO.CO.IN
>
>Hi,
>
>I have a problem with the following data. I have 6000 data values; my
>data set looks like the following:
>
>geneID
>100_g_at1
>100_g_at2
>100_g_at3
>100_g_at4
>100_g_at5
>100_g_at6
>100_g_at7
>100_g_at8
>100_g_at9
>100_g_at10
>100_g_at11
>100_g_at12
>AFFX-YEL024w/RIP1_at8
>AFFX-YEL024w/RIP1_at9
>AFFX-YEL024w/RIP1_at10
>AFFX-YEL024w/RIP1_at11
>AFFX-YEL024w/RIP1_at12
>AFFX-YEL024w/RIP1_at13
>AFFX-YEL024w/RIP1_at14
>AFFX-YEL024w/RIP1_at15
>AFFX-YEL024w/RIP1_at16
>AFFX-YEL024w/RIP1_at17
>AFFX-YEL024w/RIP1_at18
>AFFX-YEL024w/RIP1_at19
>AFFX-YEL024w/RIP1_at20
>
>As you can see, each gene ID ends in a one- or two-digit number
>(1, 2, 3, ..., 16, ..., 20, etc.).
>
>I need to make a dataset whose first column is the geneID and whose 2nd
>column is just the number.
>
>Please help me. I really need this program, but I don't know how to
>write it.
>
>
>
Nat,
You and I were thinking along the same lines, but I approached it from the
opposite direction (literally). To minimize the chance of getting the
wrong substring with SCAN, do a reverse scan like:
data id;
   set id;
   id=scan(geneID,-1,'t'); * negative count: SCAN counts words from the right;
   geneID=substr(geneID,1,(length(geneID)-length(id))); * strip the number;
run;
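
If the goal is the layout the original poster asked for (geneID in the first column, the number in the second), a minimal sketch along the same lines might look like the following. It assumes the input is the data set id as first read in, with the character variable geneID still intact; the names genes and probe_num are made up for illustration and do not come from the thread.

```sas
/* Sketch only: keep geneID unchanged and add the trailing number as a   */
/* numeric column. "genes" and "probe_num" are hypothetical names.       */
data genes;
   set id;                                       /* needs character variable geneID   */
   probe_num = input(scan(geneID, -1, 't'), 8.); /* text after the last t, as numeric */
run;

proc print data=genes;
   var geneID probe_num;                         /* geneID first, number second */
run;
```

Because the count is negative, SCAN works from the right, so a 't' earlier in the ID should not affect the result as long as every ID ends in 't' followed by the number.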
Paul Bartells
TXU Energy
Dallas, TX