Date: Wed, 27 Jun 2007 17:41:06 -0400
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject: Re: SAS double loop question
On Tue, 26 Jun 2007 15:17:17 -0000, dayday.sun@GMAIL.COM wrote:
>Thanks for your suggestion. Like you said, my boss asked me to do this.
>I used the following code to find the gene with the largest AUC:
>%do i=1 %to 5;
>proc logistic data=tsun;
>output out=out p=p;
>ods output Association=auc;
>Now he has asked me to find the pair of genes with the largest AUC. At
>first I wanted to revise the macro and add some loops, but someone
>told me that doing this with a macro is possible but unlikely to work
>well. She suggested using a BY statement instead. Do you have any
>ideas on how to do this with a BY statement?
Here is a small but representative data set, with two observations and four
variables (x1-x4):
11 21 31 41
12 22 32 42
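To try this out, the values above could be read in with a short DATA step
along these lines. The data set name WIDE is an assumption made for
illustration; the variable names x1-x4 match the ARRAY statement used in the
next step.

```sas
/* WIDE is an assumed data set name, chosen only for illustration. */
data wide;
   input x1-x4;      /* list input: four numeric columns per line */
   datalines;
11 21 31 41
12 22 32 42
;
run;
```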
The challenge is to process all combinations of two columns, using BY groups
rather than a macro.
First use nested loops to generate all of the needed observations, together
with a new variable (WHICH) pointing back to source columns:
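data long(keep = which a b);
   set wide;   /* WIDE is an assumed name for the data set holding x1-x4 */
   array xx(4) x1-x4;
   do i = 1 to 3;
      do j = i + 1 to 4;
         /* label the pair of source columns, e.g. "x1 and x2" */
         which = cats('x',i) || ' and ' || cats('x',j);
         a = xx(i);
         b = xx(j);
         output;   /* one observation per input row per pair */
      end;
   end;
run;
proc sort data=long; by which; run;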
The resulting data set is suitable for BY processing. To illustrate:
proc print data=long noobs; run;
which         a     b
x1 and x2    11    21
x1 and x2    12    22
x1 and x3    11    31
x1 and x3    12    32
x1 and x4    11    41
x1 and x4    12    42
x2 and x3    21    31
x2 and x3    22    32
x2 and x4    21    41
x2 and x4    22    42
x3 and x4    31    41
x3 and x4    32    42
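Carrying this through to the original question needs a response variable,
which the toy data above does not have. What follows is only a sketch under
stated assumptions: the data set GENES, the binary response Y, and the
simulated values are all hypothetical stand-ins for the real data. The idea is
to keep Y in the long data set, run PROC LOGISTIC once with BY WHICH, capture
the Association table through ODS, and rank the pairs on the c statistic
(the AUC).

```sas
/* Simulated stand-in data: GENES, Y and all values are hypothetical. */
data genes;
   call streaminit(2007);
   array xx(4) x1-x4;
   do id = 1 to 200;
      do k = 1 to 4;
         xx(k) = rand('normal');
      end;
      /* Y depends on x1 and x3, so that pair should tend to score well. */
      y = rand('bernoulli', 1 / (1 + exp(-(x1 + x3))));
      output;
   end;
   drop id k;
run;

/* One observation per source row per pair of columns, keeping Y. */
data long(keep = which y a b);
   set genes;
   array xx(4) x1-x4;
   length which $ 16;
   do i = 1 to dim(xx) - 1;
      do j = i + 1 to dim(xx);
         which = catx(' ', cats('x', i), 'and', cats('x', j));
         a = xx(i);
         b = xx(j);
         output;
      end;
   end;
run;

proc sort data=long;
   by which;
run;

/* Fit every pair in a single PROC LOGISTIC call, one BY group per pair. */
ods output Association=auc;
proc logistic data=long;
   by which;
   model y(event='1') = a b;
run;

/* In the Association table the c statistic (AUC) is in the row where
   Label2 = 'c', with its numeric value in nValue2. */
proc sort data=auc(where=(Label2 = 'c')) out=ranked(keep = which nValue2);
   by descending nValue2;
run;

proc print data=ranked noobs;
run;
```

With 20 explanatory variables the same pattern should apply after changing
the array to xx(20) x1-x20, since the loop bounds use DIM(xx). The number of
BY groups then grows to 20*19/2 = 190, so the caution quoted below about
fitting that many models still applies.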
>On Jun 26, 1:42 am, davidlcass...@MSN.COM (David L Cassell) wrote:
>> mitbbs....@GMAIL.COM wrote:
>> >I need to do logistic regression 20 times. The response variable is Y and the
>> >explanatory variables are x1-x20. For each pair of x(i), we will do
>> >logistic regression between (x(i),x(j)) and Y. The target is to find the
>> >pair of x that obtains the largest AUC.
>> This is a really bad approach, and is likely to yield a lousy model.
>> Why do you want to do this? If the answer is "my boss is holding
>> a shotgun to my wife's head" then we'll understand.
>> If you're doing this for every pair, then you do not have 20 regressions.
>> You have 20*19/2 regressions. That means that your overall
>> experiment-wise error rate is going to be *horrendous*. So your
>> final p-values and parameter estimates are going to be biased. Plus,
>> anything which can mess up your fundamental model assumptions
>> can make this go haywire.
>> I recommend that you re-think that problem.
>> David L. Cassell
>> mathematical statistician
>> Design Pathways
>> 3115 NW Norwood Pl.
>> Corvallis OR 97330