Date: Tue, 29 Jul 2008 13:37:45 -0500
Reply-To: Mary <mlhoward@avalon.net>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Mary <mlhoward@AVALON.NET>
Subject: Re: data mining
Content-Type: text/plain; charset="iso-8859-1"
True, but I don't trust that the user asking the question really knows the subject area, particularly when this user is asking for substrings of length 5 when each SNP has 2 alleles, and haplotypes therefore have an even number of alleles!
That's always the SAS-L dilemma: do you give the user what they asked for, or what you think they should have asked for?
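For illustration, here is a minimal sketch of the marker-aligned idea from the quoted message below: assuming two alleles per marker, it keeps only substrings that start in odd columns (1-based) and rejects widths that would split a marker, such as 5. The function name, the sample string, and the choice of Python are all assumptions made for this sketch; none of them come from the original posting or its data.

```python
def marker_aligned_substrings(seq, width, alleles_per_marker=2):
    """Return (start_column, substring) pairs that begin on a marker boundary.

    Columns are 1-based, so with 2 alleles per marker the valid start
    columns are 1, 3, 5, ... (hypothetical helper, not from the thread).
    """
    if width % alleles_per_marker != 0:
        # A width of 5 with 2 alleles per marker would cut a SNP in half.
        raise ValueError("width must be a multiple of alleles_per_marker")
    results = []
    # Step by one whole marker so no substring starts mid-marker.
    for start in range(0, len(seq) - width + 1, alleles_per_marker):
        results.append((start + 1, seq[start:start + width]))
    return results


# Made-up 16-column string: 8 markers, the last one "tc" in columns 15-16.
example = "aaccggttaaccggtc"
pairs = marker_aligned_substrings(example, 4)
for start_col, sub in pairs:
    print(start_col, sub)
```

With this rule a string that starts in column 16 is never produced, which is why the two solutions differ by one line.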
-Mary
----- Original Message -----
From: Mike Zdeb
To: SAS-L@LISTSERV.UGA.EDU
Sent: Tuesday, July 29, 2008 1:19 PM
Subject: Re: data mining
hi ... I figured it'd be easy to suppress results that are not 'reasonable' given knowledge of
the subject area
I think that the hard part is done here, i.e. finding the strings and repeats in one pass through
the data, and like I said, it does agree with the posting except for the extra line
--
Mike Zdeb
U@Albany School of Public Health
One University Place
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475
> Note that you've got a start position of 16; my solution is assuming that this data is haplotypes,
> and there are two alleles to every marker, so I'm not including strings that go across markers; this
> particular SNP marker would be "tc" in columns 15-16.
>
> So including that extra result depends on whether you want to do what's appropriate for the field
> or not. I don't think it is appropriate to report a haplotype string that splits a SNP in
> genetics, and thus I don't use the strings that start in even columns. The user doesn't say that
> this is genetics data, but given the letters used, it is likely.
>
> -Mary