LISTSERV at the University of Georgia
Date:         Wed, 27 Jun 2007 17:27:55 -0700
Reply-To:     David L Cassell <davidlcassell@MSN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         David L Cassell <davidlcassell@MSN.COM>
Subject:      Re: SAS douple loop question
In-Reply-To:  <1182871037.534725.36010@w5g2000hsg.googlegroups.com>
Content-Type: text/plain; format=flowed

dayday.sun@GMAIL.COM wrote back:
>
>Thanks for your suggestion. like what you said, my boss asked me to do
>this.

Why don't you talk this over with your boss?

Tell him/her that you asked for advice on efficient processing, and you got your posterior reamed out by grouchy statisticians who told you that this is totally unacceptable as a model-building process. Then ask him/her if using unsupported statistical approaches might get him/her chopped up by journal editors, reviewers, auditors, professors, government agencies, ....

>i used the following codes to find the gene with largest AUC:
>
>%macro logistic;
>%do i=1 %to 5;
>proc logistic data=tsun;
>model patient(event='c')=a&(i);
>output out=out p=p;
>ods output Association=auc;
>run;
>%end;
>%mend;
>%logistic
>
>Now, he asked me to find the pair of gene with largest AUC. At the
>beginning, I wanted to revise the macro and add some loops but someone
>told me it is possible but not likely to use MACRO to realise this
>purpose. she suggested me to use by statement. Do you have any idea
>with by statement?

I see that Howard has shown you how to do that. But I don't recommend using it.

Instead, think about this: your AUC is going to be highly susceptible to any errors or outliers or other weirdness in the data. So you need to check your regression diagnostics for your winning AUC, AND ALSO the losing AUC values, in order to find the regressions which are actually doing a good job of prediction *AND* are meeting the model assumptions.
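As a rough illustration of what checking the diagnostics could look like for one candidate model, here is a minimal sketch. It assumes the predictors are named a1 through a5 (which is what the quoted macro appears to intend) and uses a1 purely as a placeholder; the output data set and variable names (diag, phat, pearson, deviance, leverage) are made up for this example.

```sas
/* Sketch only: a1 is a placeholder for whichever gene is being checked. */
/* LACKFIT requests the Hosmer-Lemeshow goodness-of-fit test;            */
/* INFLUENCE prints case-by-case influence diagnostics.                  */
proc logistic data=tsun;
  model patient(event='c') = a1 / lackfit influence;
  /* Keep residuals and leverage so unusual observations can be inspected. */
  output out=diag p=phat reschi=pearson resdev=deviance h=leverage;
run;
```

The same step would need to be repeated for the other candidate models, not just the one with the largest AUC.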

PROC LOGISTIC *already* has selection methods that would be better than what you are doing. But none of these selection methods will stand up to statistical peer review. Just look at what the experts on STAT-L have to say about such methods. (Hint: it's in the STAT-L FAQ because it's such a problem.)

Furthermore, models like this do not stand up well when you split the data and use part of it for model building and the other part for model validation. You'll find that your process inflates the coefficient of determination, biases the parameter estimates high, biases the p-values low, etc.
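One way to try the split described above is sketched here. The 50/50 split, the seed, the pair a1 a2, and the data set names tsun_split, holdout_scored and holdout_roc are all assumptions for illustration, not anything from the original thread.

```sas
/* Randomly flag about half the rows. OUTALL keeps every row and adds */
/* an indicator variable Selected (1 = sampled, 0 = not sampled).     */
proc surveyselect data=tsun out=tsun_split method=srs
                  samprate=0.5 seed=2007 outall noprint;
run;

/* Build the model on the Selected=1 half, then score the held-out half. */
/* OUTROC= on SCORE writes ROC points computed on the held-out rows.     */
proc logistic data=tsun_split(where=(Selected=1));
  model patient(event='c') = a1 a2;
  score data=tsun_split(where=(Selected=0))
        out=holdout_scored outroc=holdout_roc;
run;
```

Comparing the ROC curve from the building half with the one from the held-out half gives a feel for how much the apparent performance shrinks on data the model has not seen.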

If you really need model prediction tools like this, then look into PROC GLMSELECT instead. But you'll always do better in terms of real model prediction if you use expert knowledge instead.
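For reference, a minimal PROC GLMSELECT call might look like the sketch below. Note the assumptions: GLMSELECT fits general linear models, so y_num is a hypothetical numeric response rather than the patient variable from the quoted macro, and the predictor names a1-a5, the LASSO choice, the 30% validation fraction and the seed are all illustrative.

```sas
/* Sketch only: y_num is a hypothetical numeric response variable. */
proc glmselect data=tsun seed=2007;
  /* Hold out roughly 30% of the rows for validation. */
  partition fraction(validate=0.3);
  /* LASSO selection; the final model is the step with the */
  /* smallest validation error.                            */
  model y_num = a1-a5 / selection=lasso(choose=validate);
run;
```

Choosing the model on held-out rows rather than on the rows used for fitting is the main difference from looping over models and picking the best in-sample statistic.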

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
