LISTSERV at the University of Georgia
Date:         Tue, 29 Jul 2008 13:27:12 -0400
Reply-To:     msz03@albany.edu
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Mike Zdeb <msz03@ALBANY.EDU>
Subject:      Re: data mining
Content-Type: text/plain;charset=iso-8859-1

hi ... I was able to get the results you posted (plus I faked another sequence so I had two observations) ...

seq     motif  rept  stpos  endpos  len
seq1-1  ag        2      1       4   45
seq1-2  cg        2     12      15   45
seq1-3  ct        8     16      31   45
seq1-4  ga        2     37      40   45
seq1-6  tcga      2     31      38   45

with this ... but, I also got a SEQ1-5 that was not on your list ...

seq1-5 ctct 4 16 31 45
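
The extra SEQ1-5 row covers the same stretch as SEQ1-3: positions 16 through 31 can be read as ct eight times or as ctct four times. A quick check, as a sketch only (the variable names span, ct8 and ctct4 are made up for illustration):

```sas
data _null_;
   length h $100;
   h = 'agagattcgatcgcgctctctctctctctctcgatcgagatcgat';
   * positions 16 through 31 of seq1;
   span = substr(h, 16, 16);
   * REPEAT(x, n) returns n+1 copies, so this is ct eight times;
   ct8 = repeat('ct', 7);
   * and this is ctct four times;
   ctct4 = repeat('ctct', 3);
   same_as_ct8 = (span eq ct8);
   same_as_ctct4 = (span eq ctct4);
   put span= same_as_ct8= same_as_ctct4=;
run;
```

Both flags should come back as 1 in the log, which is why a scan for 4-letter motifs reports the run a second time.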

(***** we all await the 5-lines of code SQL method *****)

data sequence;
infile datalines missover;
input seq : $4. h : $100.;
datalines;
seq1 agagattcgatcgcgctctctctctctctctcgatcgagatcgat
seq2 agagtctctcga
;
run;

data x;
set sequence;
ll = length(h);
s = 0;
* start at position 1 in sequence, look for motifs length 2 to 5;
do j=2 to 5;
   do i=1 to length(h)-4;
      motif = substr(h,i,j);
      start = i;
      rpt = 1;
      * count back-to-back copies of the motif, moving i to the last copy;
      do while (trim(motif) eq trim(substr(h,i+j,j)));
         rpt + 1;
         i + j;
      end;
      * keep motifs found at least twice in a row, numbered within the sequence;
      if rpt ge 2 then do;
         end = start + (j*rpt) - 1;
         s + 1;
         seqq = catx('-',seq,s);
         output;
      end;
   end;
end;
keep seqq motif rpt start end ll;
run;

proc print data=x;
var seqq motif rpt start end ll;
run;
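
If rows like SEQ1-5 are not wanted, one option is to drop any motif that is itself a shorter unit repeated (ctct is just ct twice). That is only a guess at why it was missing from your list, not something stated in your post. The step below is a sketch that assumes data set X from the code above has already been created; x_primitive, composite, mlen and k are made-up names.

```sas
data x_primitive;
   set x;
   composite = 0;
   mlen = length(motif);
   do k = 1 to mlen - 1;
      * a motif can only be built from k-letter units if k divides its length;
      if mod(mlen, k) = 0 then do;
         * REPEAT(x, n) returns n+1 copies, hence the minus 1;
         if motif eq repeat(substr(motif, 1, k), mlen/k - 1) then composite = 1;
      end;
   end;
   * subsetting IF: keep only motifs that are not repeats of a shorter unit;
   if not composite;
   drop composite mlen k;
run;

proc print data=x_primitive;
   var seqq motif rpt start end ll;
run;
```

For seq1 this should remove only the ctct row. Since SEQQ is assigned before the filter, the tcga row keeps the label seq1-6.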

--
Mike Zdeb
U@Albany School of Public Health
One University Place
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475

