| Date: | Wed, 3 Jul 2002 10:16:48 -0700 |
| Reply-To: | Dale McLerran <stringplayer_2@YAHOO.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Dale McLerran <stringplayer_2@YAHOO.COM> |
| Subject: | Re: CATMOD |
|
| In-Reply-To: | <004001c222a2$fc632ec0$df686620@x6y2g7> |
| Content-Type: | text/plain; charset=us-ascii |
|---|
The reason why SAS employs the highest value of the response as the
reference category in logistic regression has little to do with why
the highest level of a class variable is employed as the reference
category. Rather, it is that SAS constructs linear functions eta1,
eta2, ..., eta<k-1> to obtain the probabilities that the response
is in level 1, level 2, etc. Note that the probability for the
last level is 1-p1-p2-...-p<k-1>. Thus, I am sure that SAS is
coding
    eta1     = X*beta1
    eta2     = X*beta2
    ...
    eta<k-1> = X*beta<k-1>

and

    p1     = exp(eta1) /
             (1 + exp(eta1) + exp(eta2) + ... + exp(eta<k-1>))
    p2     = exp(eta2) /
             (1 + exp(eta1) + exp(eta2) + ... + exp(eta<k-1>))
    ...
    p<k-1> = exp(eta<k-1>) /
             (1 + exp(eta1) + exp(eta2) + ... + exp(eta<k-1>))
    pk     = 1 /
             (1 + exp(eta1) + exp(eta2) + ... + exp(eta<k-1>))

SAS just applies eta1 to the first value of the response, eta2 to
the second value of the response, etc.
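
As a quick numerical check of the formulas above, here is a minimal sketch of a DATA step that turns k-1 linear predictors into k probabilities. The dataset name GLOGIT_PROBS, the choice k = 4, and the eta values are all made up for illustration; they are not estimates from any fitted model.

```sas
data glogit_probs;
    /* Hypothetical linear predictors for a 4-level response (k = 4) */
    array eta{3} eta1-eta3 (0.5 -0.2 1.1);
    array p{4}   p1-p4;

    /* Common denominator: 1 + exp(eta1) + ... + exp(eta<k-1>) */
    denom = 1;
    do j = 1 to 3;
        denom = denom + exp(eta{j});
    end;

    /* Probabilities for the non-reference levels */
    do j = 1 to 3;
        p{j} = exp(eta{j}) / denom;
    end;

    /* The last level is the reference: its numerator is exp(0) = 1 */
    p{4} = 1 / denom;

    /* Sanity check: the probabilities should sum to 1 */
    total = sum(of p1-p4);
    drop j;
run;

proc print data=glogit_probs noobs;
run;
```

The reference level needs no linear predictor of its own, since its probability is whatever remains after the other k-1 levels are accounted for.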
Formatting will not work to change the reference level when you
fit a generalized logit model using CATMOD. However, if you have
SAS version 8.2, the procedure LOGISTIC will allow you to fit a
generalized logit regression. The DESCENDING option to the LOGISTIC
procedure reverses the ordering of the response levels, so that the
lowest level comes last and therefore serves as the referent level.
The following code illustrates:
    proc logistic data=mydata descending;
      class <class variables>;
      model response = <preds> / link=glogit;
    run;
If you do not have version 8.2 and must use CATMOD, then Christianna's
suggestion to simply create an alternative form of the response is
preferable to sorting the data to force a particular referent level.
I might suggest that a SAS view be employed rather than creating
the data as a SAS dataset. With the view, the variable creation
is done at the time the PROC CATMOD is executed. The alternative
form is created in memory and is not saved to disk. As you read
the data, the variable is created employing modest CPU cycles.
To illustrate Christianna's example employing a view, you would
code
    proc format;
      value altresp
        0 = 'Label for RESPONSE=3'
        1 = 'Label for RESPONSE=2'
        2 = 'Label for RESPONSE=1'
        3 = 'Label for RESPONSE=0';
    run;
    data myview / view=myview;
      set mydata;
      altresponse = 3 - response; /* Note I subtract from 3, not 4 */
      label altresponse = 'Recode of RESPONSE';
      format altresponse altresp.;
    run;
    proc catmod data=myview;
      direct ...;
      model altresponse = ...;
      response logits / ...;
    run;
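
Before fitting the model, it may be worth confirming that the recode in the view maps each original level where you expect. A minimal sketch, assuming the MYVIEW view and the RESPONSE and ALTRESPONSE variables defined above:

```sas
/* Cross-tabulate the original and recoded response.              */
/* Each value of RESPONSE should pair with exactly one value of   */
/* ALTRESPONSE (0 with 3, 1 with 2, 2 with 1, 3 with 0).          */
proc freq data=myview;
    tables response*altresponse / list missing;
run;
```

Because MYVIEW is a view, the recode is evaluated as PROC FREQ reads the data, just as it is when PROC CATMOD reads it.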
Dale
--- Christianna Williams <Christianna.Williams@YALE.EDU> wrote:
> Mark, Caroline -
> This doesn't answer the why question, but I believe all the SAS procs
> with CLASS statements (e.g. GLM, GENMOD) also by default make the
> highest level the reference. I guess the idea is that for
> classification variables (could even be character) the numeric value
> has no meaning. Sorting would probably not work in the multivariable
> case because the order for one variable would disrupt that for
> another (not to mention the inefficiency of this approach). There are
> ways to get around this with clever formatting and using
> ORDER=FORMAT, but in my experience the simplest thing to do is just
> recode. If you have a 0,1,2,3 variable and you want the 0 to be the
> reference just subtract all values from 4 so that 0 becomes 4, 1
> becomes 3, and so on. This could be done pretty quickly even for
> lots of variables with arrays of similarly coded variables.
>
> Hope this helps,
> Christianna
>
=====
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@fhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------