Date: Mon, 28 Sep 1998 06:14:33 -0400
Reply-To: Tra <Tra@PROTEUS.CO.UK>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Tra <Tra@PROTEUS.CO.UK>
Subject: Re: Computation of Pearson residual in Logit Regression
FREQ and WEIGHT are not the same, in my view.
FREQ = 10 means there were 10 independent experimental units which
had these values of var1-var2. FREQ must be an integer (if not then
LOGISTIC will truncate it).
WEIGHT = 10 means that the variance (in some sense) is 10-fold
smaller for this unit than for another with WEIGHT = 1. This could be
useful for modelling over-dispersion. Unlike FREQ, WEIGHT can take
non-integer values.
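To make the FREQ interpretation concrete, here is a minimal sketch that writes each row of the aaa data set from the original question out var3 times and fits the model with no FREQ statement. The data set name expanded and the loop counter i are made up for this illustration. Under the reading of FREQ given above, the parameter estimates should agree with the FREQ var3 run; rows with var3 = 0 contribute nothing in either case.

```sas
* Replicate each row var3 times so that every row is one experimental unit;
data expanded;
  set aaa;
  do i = 1 to var3;
    output;
  end;
  drop i var3;
run;

* Same model as in the question, but with no FREQ or WEIGHT statement;
proc logistic data=expanded;
  model var2 = var1;
run;
```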
In your problem, I guess you have frequencies. However, you have
separated the negative and positive responses. You will obtain more
informative residuals if you combine the negative and positive data
and use the
model y/n = VAR1
form of the model statement.
You could use this program fragment:
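* The merge needs AAA in var1 order, and the posted data are not sorted;
proc sort data=aaa;
by var1;
run;
* Put the negative and positive counts for each var1 side by side;
data a2;
merge aaa(where=(var2=0) rename=(var3=nneg))
aaa(where=(var2=1) rename=(var3=npos));
by var1;
drop var2;
ntot = nneg+npos;
run;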
proc logistic data=a2;
model npos/ntot = var1/influence;
run;
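If you want to check how the Pearson residual is formed for the grouped data, a sketch along the following lines may help. It assumes the usual binomial form (observed events minus expected events, divided by the binomial standard deviation) and compares it with the RESCHI= value from the OUTPUT statement. The names fit, phat, rchi_sas, rchi_hand and diff are made up for this example.

```sas
* Save the fitted probabilities and the Pearson residuals from LOGISTIC;
proc logistic data=a2;
  model npos/ntot = var1;
  output out=fit p=phat reschi=rchi_sas;
run;

* Recompute the Pearson residual by hand for each var1 level;
data check;
  set fit;
  rchi_hand = (npos - ntot*phat) / sqrt(ntot*phat*(1 - phat));
  diff = rchi_hand - rchi_sas;
run;

proc print data=check;
  var var1 npos ntot phat rchi_sas rchi_hand diff;
run;
```

If the assumed formula is the one the procedure uses, diff should be zero apart from rounding.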
Personally, I prefer to use GENMOD. Although it does not produce the residual
plots within the procedure, you can capture the residual information into data
sets and use PROC PLOT.
proc genmod data=a2;
model npos/ntot = var1/d=binomial residuals obstats;
run;
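One way to capture the residuals and plot them is sketched below. It assumes a SAS release in which GENMOD supports the OUTPUT statement; the names resid, phat, rpearson and rdeviance are invented for the example.

```sas
* Write predicted probabilities and both kinds of residual to a data set;
proc genmod data=a2;
  model npos/ntot = var1 / d=binomial;
  output out=resid pred=phat reschi=rpearson resdev=rdeviance;
run;

* Plot each residual against var1 with a reference line at zero;
proc plot data=resid;
  plot rpearson*var1 rdeviance*var1 / vref=0;
run;
```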
An advantage of genmod is that it gives you the deviance/DF, which can be used
to decide if over-dispersion is a problem. In your case, the value is very
close to 1, so the data are well modelled by the assumed binomial/logistic
model. The residuals appear to be random, with no outliers.
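For comparison, if the deviance/DF had been well above 1, one option (not needed for these data) would be to let GENMOD estimate a dispersion parameter and scale the standard errors by it. This is only a sketch of that option, using the same a2 data set:

```sas
* Estimate the scale from the Pearson chi-square divided by its DF;
proc genmod data=a2;
  model npos/ntot = var1 / d=binomial scale=pearson;
run;
```

SCALE=DEVIANCE is the corresponding choice if you would rather base the estimate on the deviance.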
Hope this helps.
Tim Auton
Proteus Molecular Design Ltd
______________________________ Reply Separator _________________________________
Subject: Computation of Pearson residual in Logit Regression
Author: Tae-Sung Shin <STATSOFT.COM!sts>
Sender: "SAS(r) Discussion"
<AKH-WIEN.AC.AT!SAS-L> at interlink
Date: 24/09/1998 08:41
Date: Fri, 25 Sep 1998 18:41:56 -0500
Reply-To: Tae-Sung Shin <STATSOFT.COM!sts>
Sender: "SAS(r) Discussion" <AKH-WIEN.AC.AT!SAS-L>
From: Tae-Sung Shin <STATSOFT.COM!sts>
Subject: Computation of Pearson residual in Logit Regression
To: AKH-WIEN.AC.AT!SAS-L
Hello SAS users,
This is a simple question about logit regression.
Say we have the following data & program.
data aaa;
input var1 var2 var3;
cards;
23.840 1.000 4.000
22.690 1.000 5.000
24.770 1.000 17.000
25.840 1.000 21.000
26.790 1.000 15.000
27.740 1.000 20.000
28.670 1.000 15.000
30.410 1.000 14.000
22.690 0.000 9.000
23.840 0.000 10.000
24.770 0.000 11.000
25.840 0.000 18.000
26.790 0.000 7.000
27.740 0.000 4.000
28.670 0.000 3.000
30.410 0.000 0.000
proc logistic;
model var2=var1/influence;
freq var3;run;
proc logistic;
model var2=var1/influence;
weight var3;run;
As far as I know, and according to the SAS manual, the two procedure calls
above should give the same Pearson and deviance residuals, but they do not.
It seems to me that's because weights are included in the computation
of the residuals, but frequencies are not...
Could anybody explain why? or is it a bug?
Thanks in advance.
Tae-Sung Shin