| Date: | Tue, 5 Feb 2008 13:02:59 -0600 |
| Reply-To: | "Peck, Jon" <peck@spss.com> |
| Sender: | "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU> |
| From: | "Peck, Jon" <peck@spss.com> |
| Subject: | Re: R^2 computation in SPSS |
| In-Reply-To: | <B2A95412067E5C4CBA09E2E92D81BF290B38F584@TRX-V01.targetrx.com> |
| Content-Type: | text/plain; charset="utf-8" |
There are two issues here. First, you are using different samples when you go from listwise to pairwise deletion. Those samples could differ in their population characteristics, especially if values are not missing at random. Imagine, for example, a situation where men rarely answer some question while women usually do. Then the gender proportion in the pairwise samples will be very different from the listwise one, and if males and females differ in the regression response, the results will be quite different in the two samples.
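Here is a small Python sketch of the sample-composition point. The records, the variable names y, x1 and x2, and all the values are made up. In this data men mostly skip the question behind x2, so the complete-case (listwise) sample is mostly women, while the pairwise sample for (y, x1) includes everyone.

```python
# Hypothetical survey records: (gender, y, x1, x2); None marks a missing answer.
# In this made-up data, men mostly skip the question behind x2.
records = [
    ("M", 10.0, 3.0, None),
    ("M", 12.0, 4.0, None),
    ("M", 9.0, 2.0, None),
    ("M", 14.0, 5.0, 4.0),
    ("F", 8.0, 3.0, 3.5),
    ("F", 11.0, 4.0, 4.5),
    ("F", 7.0, 2.0, 2.5),
    ("F", 13.0, 5.0, 5.5),
]
COLUMN = {"y": 1, "x1": 2, "x2": 3}

def usable(rows, names):
    """Keep only the rows in which every named variable is present."""
    cols = [COLUMN[n] for n in names]
    return [r for r in rows if all(r[c] is not None for c in cols)]

def male_share(rows):
    """Proportion of rows with gender 'M'."""
    return sum(r[0] == "M" for r in rows) / len(rows)

# Listwise deletion: one sample, complete on y, x1 and x2.
listwise = usable(records, ["y", "x1", "x2"])
print("listwise", len(listwise), male_share(listwise))  # 5 cases, 20% male

# Pairwise deletion: each correlation is computed on its own sample.
for pair in (["y", "x1"], ["y", "x2"], ["x1", "x2"]):
    rows = usable(records, pair)
    print("pairwise", pair, len(rows), male_share(rows))
```

In this example the (y, x1) correlation comes from a half-male sample and the other two come from a 20%-male sample. If men and women respond differently, the correlations therefore describe different populations.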
Second, the residual means are doubtless different. Run Descriptives on them, and you will see how the contribution of the residual means to the R^2 differs. You might also look at regression diagnostics.
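The "Descriptives on the residuals" check can be imitated outside SPSS, as the following sketch shows. It uses hypothetical data and one-predictor least squares written out by hand, not any SPSS routine. With a constant, the residual mean is zero up to rounding. Through the origin, it generally is not.

```python
from statistics import mean

# Hypothetical data for one predictor.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 4.0, 6.0, 7.0, 9.0]

def fit_with_constant(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def fit_through_origin(x, y):
    """Ordinary least squares for y = b*x (no constant); returns (0, b)."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return 0.0, b

def residuals(x, y, a, b):
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

res_const = residuals(x, y, *fit_with_constant(x, y))
res_origin = residuals(x, y, *fit_through_origin(x, y))
print(mean(res_const))   # zero apart from floating-point rounding
print(mean(res_origin))  # about 0.236 (exactly 13/55 for this data)
```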
HTH,
Jon Peck
-----Original Message-----
From: Joanne Tsai [mailto:jtsai@targetrx.com]
Sent: Tuesday, February 05, 2008 11:54 AM
To: Peck, Jon; SPSSX-L@LISTSERV.UGA.EDU
Subject: RE: Re: [SPSSX-L] R^2 computation in SPSS
Hi, Jon
Sorry I didn't make my question clear.
I meant to ask: by trying both listwise and pairwise, I observed that
both sets of estimated coefficients are similar, though R^2 seemed to
perform a lot better with pairwise. I am very curious about the reason
behind it. Can I get better coefficients by using pairwise, since it
doesn't throw out any data? And how is R^2 computed with pairwise, and
why is it so much better than the listwise R^2?
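The following sketch offers one possible reason why a pairwise R^2 can look better. It assumes that REGRESSION with pairwise deletion first builds the correlation matrix pair by pair and then derives the coefficients and R^2 from that matrix. Treat this as an assumption and check it against the SPSS algorithms documentation. With two predictors, R^2 depends on only three correlations, and the standard formula is used below. When each correlation comes from a different subsample, nothing forces the three to be mutually consistent, and the formula can even return a value above 1. The correlation values are hypothetical.

```python
def r2_two_predictors(r1y, r2y, r12):
    """Squared multiple correlation of y on x1 and x2, computed only from
    r(x1,y), r(x2,y) and r(x1,x2). Undefined when |r12| == 1."""
    return (r1y ** 2 + r2y ** 2 - 2 * r1y * r2y * r12) / (1 - r12 ** 2)

def is_valid_correlation_matrix(r1y, r2y, r12):
    """With all |r| <= 1, the 3x3 correlation matrix is positive
    semidefinite exactly when its determinant is non-negative."""
    det = 1 + 2 * r1y * r2y * r12 - r1y ** 2 - r2y ** 2 - r12 ** 2
    return det >= 0

# Hypothetical correlations that could all come from one common sample.
print(r2_two_predictors(0.6, 0.5, 0.3), is_valid_correlation_matrix(0.6, 0.5, 0.3))
# about 0.473, True

# Hypothetical pairwise correlations, each from a different subsample:
# both predictors correlate 0.9 with y but not with each other.
print(r2_two_predictors(0.9, 0.9, 0.0), is_valid_correlation_matrix(0.9, 0.9, 0.0))
# about 1.62, False: no real data set can produce an R^2 above 1
```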
-----Original Message-----
From: Peck, Jon [mailto:peck@spss.com]
Sent: Tuesday, February 05, 2008 1:47 PM
To: Joanne Tsai; SPSSX-L@LISTSERV.UGA.EDU
Subject: RE: Re: [SPSSX-L] R^2 computation in SPSS
Regarding the R^2, when there is a constant term in the regression, the
residuals have mean zero, so the sums of squares in the numerator and
denominator match up with correlation coefficients. If there is no
constant term, the residual mean is not zero, so the sums of squares in
both numerator and denominator have a contribution from the mean square,
so the explained/total sum of squares will be closer to one.
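Here is a numerical sketch of this point with hypothetical data. The same through-origin fit is scored two ways: against the raw sum of squares of y (the uncentered version, which is how a no-constant R^2 is usually reported) and against the sum of squares about the mean. The uncentered value comes out above even the with-constant R^2, although the through-origin fit has the larger residual sum of squares.

```python
from statistics import mean

# Hypothetical data for one predictor.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 4.0, 6.0, 7.0, 9.0]

def sse(a, b):
    """Residual sum of squares of the line y = a + b*x on the data above."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

my = mean(y)
sst_centered = sum((yi - my) ** 2 for yi in y)  # sum of squares about the mean
sst_uncentered = sum(yi ** 2 for yi in y)       # sum of squares about zero

# Ordinary least squares with a constant.
mx = mean(x)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a1 = my - b1 * mx
r2_with_constant = 1 - sse(a1, b1) / sst_centered

# Ordinary least squares through the origin, scored both ways.
b0 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
r2_origin_uncentered = 1 - sse(0.0, b0) / sst_uncentered
r2_origin_centered = 1 - sse(0.0, b0) / sst_centered

print(round(r2_with_constant, 3))      # 0.987
print(round(r2_origin_uncentered, 3))  # 0.99
print(round(r2_origin_centered, 3))    # 0.919
```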
Now, here is the quiz for today: construct an ordinary least squares
linear regression example where ALL of the residuals are positive.
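One possible construction for the quiz follows, though it may not be the answer Jon had in mind: drop the constant. Without a constant, the only normal equation is that the residuals are orthogonal to x, so they no longer have to sum to zero. If the x values are placed symmetrically around zero and y is positive everywhere, the fitted slope is 0 and every residual equals y. The data are made up.

```python
# Hypothetical data: x symmetric around zero, y positive everywhere.
x = [-2.0, -1.0, 1.0, 2.0]
y = [1.0, 2.0, 2.0, 1.0]

# Least squares through the origin (no constant term): b = sum(x*y) / sum(x*x).
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
residuals = [yi - b * xi for xi, yi in zip(x, y)]

print(b)          # 0.0
print(residuals)  # [1.0, 2.0, 2.0, 1.0]: every residual is positive
# The single normal equation of this model still holds: sum(x * e) == 0.
print(sum(xi * e for xi, e in zip(x, residuals)))  # 0.0
```

With a constant term, this cannot happen, because the residuals then sum to zero.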
Regards,
Jon Peck
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Joanne Tsai
Sent: Tuesday, February 05, 2008 11:30 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: [SPSSX-L] R^2 computation in SPSS
Thank you for the answer.
Is there any way I can find out why the coefficient estimates from the two
methods are similar but R^2 is not? (I will be throwing out
25% of the data if I use listwise.)
I am assuming the model should go through the origin, so the second
question is fully answered. Thank you.
-----Original Message-----
From: Peck, Jon [mailto:peck@spss.com]
Sent: Tuesday, February 05, 2008 11:17 AM
To: Joanne Tsai; SPSSX-L@LISTSERV.UGA.EDU
Subject: RE: Re: [SPSSX-L] R^2 computation in SPSS
If you use pairwise deletion, you can't be sure of the statistical
properties of your regression estimates. Pairwise deletion is rarely
appropriate. In fact, with pairwise deletion you can't even be sure
that the covariance matrix is positive definite. Stick with listwise
deletion.
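The following sketch illustrates the positive-definiteness warning with a deliberately extreme example. The data are hypothetical, and each pair of variables is observed together on a different three cases, so listwise deletion would leave no cases at all. Each pairwise correlation is a perfectly valid +1 or -1. Even so, the assembled matrix has a negative determinant, so it cannot be the correlation matrix of any real data set.

```python
from math import isnan, nan, sqrt

def pairwise_r(u, v):
    """Pearson correlation using only cases where both u and v are present."""
    pairs = [(a, b) for a, b in zip(u, v) if not (isnan(a) or isnan(b))]
    n = len(pairs)
    mu = sum(a for a, _ in pairs) / n
    mv = sum(b for _, b in pairs) / n
    suv = sum((a - mu) * (b - mv) for a, b in pairs)
    suu = sum((a - mu) ** 2 for a, _ in pairs)
    svv = sum((b - mv) ** 2 for _, b in pairs)
    return suv / sqrt(suu * svv)

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Hypothetical data (nan = missing): each pair is observed together on three cases.
x = [1.0, 2.0, 3.0, nan, nan, nan, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, nan, nan, nan]
z = [nan, nan, nan, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0]

variables = [x, y, z]
R = [[pairwise_r(u, v) for v in variables] for u in variables]
print(R)        # [[1.0, 1.0, -1.0], [1.0, 1.0, 1.0], [-1.0, 1.0, 1.0]]
print(det3(R))  # -4.0: negative, so R is not positive semidefinite
```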
As for the constant term, think of the model you are testing. Omitting
the constant term is perfectly appropriate if your model implies that
the regression line should go through the origin and you are confident
of linearity. In most cases, though, you should just keep the constant
term and not test it for significance. Forcing the regression line
through the origin does produce an R^2 that isn't really comparable to
the usual one.
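For reference, these are the two definitions usually meant. The notation is mine, not from the thread: \hat{y}_i are the fitted values of whichever model is being scored, \bar{y} is the mean of y, and n is the number of cases. The second definition is the uncentered version typically reported for regression through the origin. The identity on the last line shows why its denominator, and therefore the ratio, is inflated whenever \bar{y} is far from zero.

```latex
R^2_{\text{const}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
R^2_{0} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} y_i^2},

\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 + n\,\bar{y}^2 .
```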
HTH,
Jon Peck
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Joanne Tsai
Sent: Tuesday, February 05, 2008 9:01 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: [SPSSX-L] R^2 computation in SPSS
1. Yes, if I use listwise, R^2 is similar between Excel and SPSS.
But which R^2 is more reliable? I have 0.85 for pairwise and 0.65 for
listwise. I'd love to show the higher R^2, but would not want to draw a
wrong conclusion based on it. Or is there another tool with which I can plot
the graph and get a similar 0.85? Is there anywhere I can find more
information about the algorithm for pairwise deletion?
2. When I run the linear regression including the constant, the p-value
on the constant is 0.91, so I would think it's not significant. Can I
remove the constant based only on this p-value? Is that fair?
Thank you so much for your pointers!
-----Original Message-----
From: ViAnn Beadle [mailto:vab88011@gmail.com]
Sent: Tuesday, February 05, 2008 9:01 AM
To: Joanne Tsai; SPSSX-L@LISTSERV.UGA.EDU
Subject: RE: R^2 computation in SPSS
Try your SPSS analysis again using listwise deletion of missing data. I'd
guess you'll get the same results as Excel, which AFAIK doesn't have an
algorithm for pairwise deletion.
When you do not include the constant, you are testing an entirely different
model: that the relation is not significantly different from 0. Is that what
you want?
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Joanne Tsai
Sent: Tuesday, February 05, 2008 6:48 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: R^2 computation in SPSS
Dear Co-listers:
I have recently encountered the following question:
I got a pretty good R^2 estimate, 0.85, using the Linear Regression model in
SPSS. (Not all cases have values for the dependent variable and every
independent variable, so I used the pairwise option.)
But when I plotted the predicted values against the actual values of my
dependent variable in Excel and CurveExpert, I could only get an R^2 of around 0.50.
I am not sure what is causing this discrepancy. Is it due to the
computation in SPSS, or to the fact that it's computed pairwise?
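The following sketch shows why the Excel/CurveExpert number can differ from the SPSS one. Suppose an ordinary least-squares fit with a constant is computed on one fixed set of complete cases. Then its R^2 equals the squared correlation between predicted and actual values, which (as far as I know) is what a straight-line trendline with an intercept reports on a predicted-vs-actual chart. A pairwise R^2 is not computed on one fixed set of cases, so this identity need not hold for it. The data below are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson(u, v):
    """Pearson correlation of two equal-length lists."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return suv / sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))

# Hypothetical complete cases, one predictor.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 4.0, 6.0, 7.0, 9.0]

# Ordinary least squares with a constant, fitted on these cases only.
mx, my = mean(x), mean(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx
predicted = [a + b * xi for xi in x]

# Model R^2 versus the R^2 of a straight line through (predicted, actual).
r2_model = 1 - sum((yi - pi) ** 2 for yi, pi in zip(y, predicted)) / sum((yi - my) ** 2 for yi in y)
r2_trendline = pearson(predicted, y) ** 2
print(round(r2_model, 3), round(r2_trendline, 3))  # 0.987 0.987
```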
My other question is: what can one say about the result when
one uses the linear regression model without the constant? The
R^2 is higher, but isn't that biased? Can one still use it as a
validation method?
Thank you so much for your help!
Joanne
=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD