LISTSERV at the University of Georgia
Date:         Tue, 29 Jul 2008 13:37:45 -0500
Reply-To:     Mary <mlhoward@avalon.net>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Mary <mlhoward@AVALON.NET>
Subject:      Re: data mining
Comments: To: msz03@albany.edu
Content-Type: text/plain; charset="iso-8859-1"

True, but I don't trust that the user asking the question really knows the subject area, particularly when this user is asking for substrings of length 5 when each SNP has 2 alleles, and haplotypes therefore have an even number of alleles!

That's always the SAS-L dilemma: do you give the user what they asked for, or what you think they should have asked for?

-Mary

----- Original Message -----
From: Mike Zdeb
To: SAS-L@LISTSERV.UGA.EDU
Sent: Tuesday, July 29, 2008 1:19 PM
Subject: Re: data mining

hi ... I figured it'd be easy to suppress results that are not 'reasonable' given knowledge of the subject area

I think that the hard part is done here, i.e. finding the strings and repeats in one pass through the data, and like I said, it does agree with the posting except for the extra line

--
Mike Zdeb
U@Albany School of Public Health
One University Place
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475

> Note that you've got a start position of 16. My solution assumes that this data is haplotypes
> with two alleles to every marker, so I'm not including strings that go across markers; this
> particular SNP marker would be "tc" in columns 15-16.
>
> So including that extra result depends on whether you want to do what's appropriate for the field
> or not. I don't think it is appropriate in genetics to report a haplotype string that splits a SNP,
> and thus I don't use the strings that start in even columns. The user doesn't say that
> this is genetics data, but given the letters used, it is likely.
>
> -Mary
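
The marker-alignment rule described in the quoted message (two alleles per marker, so only strings that start in odd columns and span whole markers are reported) can be sketched as follows. This is only an illustration in Python, not the SAS code discussed in the thread; the function name `aligned_repeats`, its parameters, and the sample strings are all hypothetical.

```python
from collections import defaultdict

def aligned_repeats(seq, length, alleles_per_marker=2):
    """Map each repeated substring to its 1-based start columns.

    Only marker-aligned substrings are considered: with 2 alleles per
    marker they start in odd columns (1, 3, 5, ...) and never split a marker.
    """
    if length % alleles_per_marker != 0:
        # A length such as 5 would always cut a 2-allele marker in half.
        raise ValueError("length must be a multiple of alleles_per_marker")
    positions = defaultdict(list)
    # Step by whole markers so every start position is marker-aligned.
    for start in range(0, len(seq) - length + 1, alleles_per_marker):
        positions[seq[start:start + length]].append(start + 1)
    # Keep only substrings that occur at more than one aligned position.
    return {s: cols for s, cols in positions.items() if len(cols) > 1}

# Hypothetical data: "atcg" repeats at columns 1 and 5 (both odd).
print(aligned_repeats("atcgatcg", 4))
```

In this sketch a requested length of 5 is rejected outright rather than silently rounded, which mirrors the objection raised earlier in the thread. Dropping the step of `alleles_per_marker` in the loop would give the unrestricted search, including strings that start in even columns.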

