Date: Wed, 17 Nov 1999 15:45:40 -0500
Reply-To: Y.Huang@ORGANONINC.COM
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ya Huang <Y.Huang@ORGANONINC.COM>
Subject: Re: Stepping through a sequence of data until a criteria is met
Content-type: text/plain; charset="iso-8859-1"
Here is my solution. I modified your data so that
at least one 50-base window meets the criterion;
as you can see from the output below, it was tested.
-------
data dna73;
retain i 0;
infile cards;
input base $ @@;
i + 1;
if 1 le i le 73 then output dna73;
cards;
T T A A A T C A C T T C C C T T G C
A C A G T T T G G A A G G G A G A G
C A C T T C A C G A C A G A C C T T
G G A A G C A A G A G G A T T G
C A T
;
data dna73;
set dna73;
if base in ('G','C') then cg=1; else cg=0; /* flag G and C bases */
run;
options mprint nocenter;
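%macro step50;
%global totcg k;
%let totcg=0; /* G+C count in the current window */
%let k=0; /* start position of the current window */
%do %while(&totcg < 25 and &k < 24); /* 25 = 50% of 50; 24 = 73-50+1 start positions */
%let k=%eval(&k+1);
proc sql noprint;
select sum(cg) into :totcg
from dna73
where &k <= i <= &k+49;
quit;
%end;
%mend;
%step50;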
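data dna50;
set dna73;
/* keep the window starting at &k; keep nothing if no window reached 50% G+C */
if &totcg >= 25 and &k <= i <= &k+49;
run;
proc print data=dna50;
run;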
------ output ---
The SAS System
15:31 Wednesday, November 17, 1999
OBS I BASE CG
1 3 A 0
2 4 A 0
3 5 A 0
4 6 T 0
5 7 C 1
6 8 A 0
7 9 C 1
8 10 T 0
9 11 T 0
10 12 C 1
11 13 C 1
12 14 C 1
13 15 T 0
14 16 T 0
15 17 G 1
16 18 C 1
17 19 A 0
18 20 C 1
19 21 A 0
20 22 G 1
21 23 T 0
22 24 T 0
23 25 T 0
24 26 G 1
25 27 G 1
26 28 A 0
27 29 A 0
28 30 G 1
29 31 G 1
30 32 G 1
31 33 A 0
32 34 G 1
33 35 A 0
34 36 G 1
35 37 C 1
36 38 A 0
37 39 C 1
38 40 T 0
39 41 T 0
40 42 C 1
41 43 A 0
42 44 C 1
43 45 G 1
44 46 A 0
45 47 C 1
46 48 A 0
47 49 G 1
48 50 A 0
49 51 C 1
50 52 C 1
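
The macro loop above runs one PROC SQL per start position. If the sequence gets long, the same search can probably be done in a single pass with a running total and LAG50. The following is only a sketch, not something I ran: it assumes dna73 already has i and cg as built above, and the names start and dna50b are made up here for illustration. Traced by hand on the modified data, it should stop at start = 3, the same window as the listing above.

```sas
/* Sketch: one-pass search for the first 50-base window with >= 25 G/C. */
/* Assumes dna73 has i (position, 1..n) and cg (1 for G/C, else 0).     */
data _null_;
   set dna73 end=last;
   retain start 0;              /* 0 means no qualifying window yet     */
   total + cg;                  /* running G/C count for positions 1..i */
   prev = lag50(total);         /* running count at position i-50       */
   if prev = . then prev = 0;   /* first 50 calls of LAG50 give missing */
   /* total - prev is the G/C count for positions i-49 .. i */
   if start = 0 and i >= 50 and total - prev >= 25 then start = i - 49;
   if last then call symput('start', trim(left(put(start, 8.))));
run;

/* Keep the window; no rows are kept when start is still 0. */
data dna50b;
   set dna73;
   if &start > 0 and &start <= i <= &start + 49;
run;
```

LAG50 is called on every observation, outside any IF, because the LAG queue only advances when the function executes.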
Ya Huang
-----Original Message-----
From: Stephen Arthur [mailto:sarthur67@YAHOO.COM]
Sent: Wednesday, November 17, 1999 12:18 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Stepping through a sequence of data until a criteria is met
Hi,
I have an ordered list of 73 (this is really
arbitrary) items with values of 'A', 'T', 'C', 'G'.
I want to obtain a contiguous sequence of these
letters that is 50 'bases' long. The only criterion
is that the 'C' and 'G' letters make up at least
50% of an accepted sequence. If the sequence starting
at position 1 fails the 50% test, I want to read the
next 50 bases starting at position 2, and so on.
T G A G A T C A C T T C C C T T G C
A C A G T T T G G A A G G G A G A G
C A C T T T A T T A C A G A C C T T
G G A A G C A A G A G G A T T G
C A T
I have already taken the meager step (below) to read
in the sequence, and I am currently trying to work out
an algorithm to complete this task.
Any help would be welcomed,
steve
data dna73;
retain i 0;
infile cards;
input base $ @@;
i + 1;
if 1 le i le 73 then output dna73;
cards;
T G A G A T C A C T T C C C T T G C
A C A G T T T G G A A G G G A G A G
C A C T T T A T T A C A G A C C T T
G G A A G C A A G A G G A T T G
C A T
;