LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (November 1999, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Wed, 17 Nov 1999 18:18:18 GMT
Reply-To:   dkb@CIX.COMPULINK.CO.UK
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   dkb@CIX.COMPULINK.CO.UK
Organization:   CIX - Compulink Information eXchange
Subject:   Re: Stepping through a sequence of data until a criteria is met

Stephen Arthur asks:

> > Hi, > > I have an ordered list of 73 (this is really > arbitrary) items with values of 'A', 'T', 'C', 'G'. > > I want to obtain a continuous sequence of these > letters that is 50 'bases' long. The only criteria > being that the 'C' and 'G' letters make up at least > 50% of an accepted sequence. If starting from > position one the sequence fails the 50% criteria test, > I want to be able to read the next 50 bases from > position 2, etc... > Steve,

This might help - based on your code:

/* Tested code Win 95 6.12 TS020 */

data dna73(keep = base i roller) start_at(keep=startpos);

retain i 0;

infile cards; input base $ @@; if base in ('C','G') then value = 1; else value = 0; i + 1;

retain roller 0; dropoff = lag50(value); roller = sum(roller,value,-dropoff);

if 1 le i le 73 then output dna73; if roller ge 25 then do; startpos = i - 49; output start_at; end; cards;

T G A G A T C A C T T C C C T T G C A C A G T T T G G A A G G G A G A G C A C T T T A T T A C A G A C C T T G G A A G C A A G A G G A T T G C A T ; run;

data dna50; retain seqnum 0; drop startpos; set start_at; seqnum + 1; do pointer = startpos to startpos+49; set dna73 point=pointer; output; end; run;

The variable called 'roller' is tracking how many C and G bases have come up in the last fifty observations, and if it hits 25 we write out where the sequence started. Then we can create DNA50 which contains a listing of all the valid sequences (there aren't any, but if you drop the criterion to 24, you get four). A bit of macro code could be wrapped round this to determine how many bases you scan and what the success criterion is.

Dave

.


Back to: Top of message | Previous page | Main SAS-L page