**Date:** Wed, 27 Jun 2007 17:41:06 -0400
**Reply-To:** "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
**Sender:** "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
**From:** "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
**Subject:** Re: SAS douple loop question
**Content-Type:** text/plain; charset=ISO-8859-1
On Tue, 26 Jun 2007 15:17:17 -0000, dayday.sun@GMAIL.COM wrote:

>Thanks for your suggestion. As you guessed, my boss asked me to do
>this.
>I used the following code to find the gene with the largest AUC:
>
>%macro logistic;
>%do i=1 %to 5;
>proc logistic data=tsun;
>model patient(event='c')=a&(i);
>output out=out p=p;
>ods output Association=auc;
>run;
>%end;
>%mend;
>%logistic
>
>Now he has asked me to find the pair of genes with the largest AUC. At
>first I wanted to revise the macro and add some loops, but someone
>told me that, while possible, a macro is not well suited to this
>purpose. She suggested that I use a BY statement. Do you have any ideas
>about using a BY statement?

Here is a small but representative data set:

data wide;
input x1-x4;
cards;
11 21 31 41
12 22 32 42
;

The challenge is to process all combinations of two columns, using BY groups
rather than a macro.

First use nested loops to generate all of the needed observations, together
with a new variable (WHICH) pointing back to source columns:

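data long(keep = which a b);
   array xx(4) x1-x4;
   set wide;
   /* visit each unordered pair (i,j) with i < j exactly once */
   do i = 1 to 3;
      do j = i + 1 to 4;
         /* label identifying the two source columns */
         which = cats('x',i) || ' and ' || cats('x',j);
         a = xx(i);
         b = xx(j);
         /* one output observation per input row per pair */
         output;
      end;
   end;
run;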

Now sort:

proc sort data=long;
   by which;
run;

The resulting data set is suitable for BY processing. To illustrate:

proc print data=long;
by which;
id which;
run;

Output:

which         a     b

x1 and x2    11    21
             12    22

x1 and x3    11    31
             12    32

x1 and x4    11    41
             12    42

x2 and x3    21    31
             22    32

x2 and x4    21    41
             22    42

x3 and x4    31    41
             32    42
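
To connect this back to the original question, here is a rough, untested sketch of how the same reshaping might feed PROC LOGISTIC with a BY statement. Everything in it beyond the technique shown above is an assumption made for illustration: the simulated data set WIDE2, the 0/1 response Y, and the data set names LONG2, ASSOC and AUC are all hypothetical. I am also relying on my recollection that the Association table carries the c statistic in NVALUE2 on the row where LABEL2 is 'c'; check that with PROC CONTENTS or PROC PRINT before trusting it.

```sas
/* Hypothetical data: binary response y, predictors x1-x4 (simulated). */
data wide2;
   call streaminit(2007);
   array xx(4) x1-x4;
   do obs = 1 to 200;
      do k = 1 to 4;
         xx(k) = rand('normal');
      end;
      /* y depends on x1 and x3 only; this is an arbitrary choice */
      y = rand('bernoulli', 1 / (1 + exp(-(x1 - 0.5*x3))));
      output;
   end;
   drop obs k;
run;

/* Same nested-loop reshaping as above, but carrying y along. */
data long2(keep = which y a b);
   set wide2;
   array xx(*) x1-x4;
   length which $ 20;
   do i = 1 to dim(xx) - 1;
      do j = i + 1 to dim(xx);
         which = catx(' ', vname(xx(i)), 'and', vname(xx(j)));
         a = xx(i);
         b = xx(j);
         output;
      end;
   end;
run;

proc sort data=long2;
   by which;
run;

/* One two-predictor model per BY group; capture the Association table. */
proc logistic data=long2;
   by which;
   model y(event='1') = a b;
   ods output Association=assoc;
run;

/* Keep the c statistic (AUC) for each pair; column names assumed. */
data auc(keep = which auc);
   set assoc;
   where Label2 = 'c';
   auc = nValue2;
run;

proc sort data=auc;
   by descending auc;
run;

/* The first observation is the pair with the largest AUC. */
proc print data=auc(obs=1);
run;
```

Because WHICH is a BY variable, it is carried into ASSOC automatically, so no macro loop is needed to keep track of which pair produced which statistic. The statistical cautions quoted below still apply to any pair chosen this way.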

>>
>On Jun 26, 1:42 am, davidlcass...@MSN.COM (David L Cassell) wrote:
>> mitbbs....@GMAIL.COM wrote:
>>
>> >i need to do logistic regression 20 times. The response variable is
>> >Y,
>> >explanatory variables are x1-x20. For each pair of x(i), we will do
>> >the
>> >logistic regression between (x(i),x(j)) and Y. The target is to find
>> >the
>> >pair of x that obtain the largest AUC.
>>
>> This is a really bad approach, and is likely to yield a lousy model.
>> Why do you want to do this? If the answer is "my boss is holding
>> a shotgun to my wife's head" then we'll understand.
>>
>> If you're doing this for every pair, then you do not have 20 regressions.
>> You have 20*19/2 regressions. That means that your overall
>> experiment-wise error rate is going to be *horrendous*. So your
>> final p-values and parameter estimates are going to be biased. Plus,
>> anything which can mess up your fundamental model assumptions
>> can make this go haywire.
>>
>> I recommend that you re-think that problem.
>>
>> HTH,
>> David
>> --
>> David L. Cassell
>> mathematical statistician
>> Design Pathways
>> 3115 NW Norwood Pl.
>> Corvallis OR 97330