LISTSERV at the University of Georgia
Date:   Wed, 21 Jul 2010 12:36:37 -0700
Reply-To:   Steve Denham <stevedrd@YAHOO.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Steve Denham <stevedrd@YAHOO.COM>
Subject:   Re: Weighted Least Squares Question in SAS
Comments:   To: Jon Matthews <jmatthews7101@yahoo.com>
In-Reply-To:   <995375.41905.qm@web120417.mail.ne1.yahoo.com>
Content-Type:   text/plain; charset=iso-8859-1

Jon,

It is not the "perfectly correlated" values that drive the R**2 value.

Consider the following:

data temp;
  input x y w1 w2 w3;
  cards;
1 1 1 1 1
2 2 1 1 .1
3 4 1 .1 1
;
run;

proc reg data=work.temp;
  weight w1;
  model y=x;
run;
quit;

proc reg data=work.temp;
  weight w2;
  model y=x;
run;
quit;

proc reg data=work.temp;
  weight w3;
  model y=x;
run;
quit;

Here w1 weights all three points equally, w2 underweights the last point (as in your second run), and w3 underweights the middle point instead. And what is the R**2 with w3? It is 0.9947, the highest of the three.

The catch is that a regression line is determined by its endpoints, much more than the midpoints, especially when you have a small sample size.

Consider for example one more regression, where we stretch the x axis out to 10, and leave the three weights as before:

data temp2;
  input x y w1 w2 w3;
  cards;
1 1 1 1 1
2 2 1 1 .1
10 4 1 .1 1
;
run;

proc reg data=work.temp2;
  weight w1;
  model y=x;
run;
quit;

proc reg data=work.temp2;
  weight w2;
  model y=x;
run;
quit;

proc reg data=work.temp2;
  weight w3;
  model y=x;
run;
quit;

R**2 values are now 0.9472, 0.7879, and 0.9909.
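
For anyone who wants to check these figures outside SAS, here is a minimal Python sketch. It assumes the weighted R**2 is computed from weighted sums of squares about the weighted means, which for a one-predictor model with an intercept reduces to Sxy**2 / (Sxx * Syy). The function name weighted_r2 is made up for this illustration and is not a SAS or library routine.

```python
def weighted_r2(x, y, w):
    """R-squared for a weighted simple linear regression of y on x.

    Assumes sums of squares are taken about the weighted means.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    return sxy ** 2 / (sxx * syy)


y = [1, 2, 4]
weights = {
    "w1": [1, 1, 1],      # equal weights
    "w2": [1, 1, 0.1],    # underweight the last (extreme) point
    "w3": [1, 0.1, 1],    # underweight the middle point
}

for label, x in (("temp", [1, 2, 3]), ("temp2", [1, 2, 10])):
    for name, w in weights.items():
        print(label, name, round(weighted_r2(x, y, w), 4))
# temp  w1 0.9643, w2 0.9391, w3 0.9947
# temp2 w1 0.9472, w2 0.7879, w3 0.9909
```

Under that assumption the sketch gives the same six R**2 values as the PROC REG runs quoted in this thread.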

By underweighting the extreme (x,y) value observation, we "miss" the y value with our predicted value, and increase the residual error, thus decreasing the R**2. By underweighting the mid value, we increase the accuracy at the ends--but not as much as when the extreme value doesn't have as much leverage.
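
To put a rough number on "leverage", the sketch below computes the diagonal of the weighted hat matrix for a straight-line fit, h_i = w_i * (1/sum(w) + (x_i - xbar_w)**2 / Sxx_w). Reading "leverage" in this standard hat-value sense is an assumption about what is meant above, and the function name weighted_leverages is hypothetical.

```python
def weighted_leverages(x, w):
    """Hat-matrix diagonal for a weighted straight-line fit with intercept."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return [wi * (1 / sw + (xi - xbar) ** 2 / sxx) for wi, xi in zip(w, x)]


# Equal weights: compare x = 1, 2, 3 with the stretched x = 1, 2, 10.
h_short = weighted_leverages([1, 2, 3], [1, 1, 1])
h_long = weighted_leverages([1, 2, 10], [1, 1, 1])

# Same stretched x, but with the extreme point underweighted (w2).
h_long_w2 = weighted_leverages([1, 2, 10], [1, 1, 0.1])

for name, h in (("x to 3", h_short), ("x to 10", h_long), ("x to 10, w2", h_long_w2)):
    print(name, [round(v, 4) for v in h])
```

The hat values always sum to 2 for a two-parameter model, so any leverage taken away from the extreme point by a small weight is handed to the other two observations.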

All in all, weighted least squares is a muckety bog that hides many dangers. If you are aware of them, you get to use the shortcut across the island, but if not, you will end up being stuck. I know some professor of mine used that analogy.

Steve Denham
Associate Director, Biostatistics
MPI Research, Inc.

----- Original Message ----
From: Jon Matthews <jmatthews7101@YAHOO.COM>
To: SAS-L@LISTSERV.UGA.EDU
Sent: Wed, July 21, 2010 1:42:38 PM
Subject: Weighted Least Squares Question in SAS

Hi,

I am using SAS to create a weighted least squares regression, and I've run into a question about the coefficient of determination when using weighted least squares regression. Here is some code I wrote:

data work.temp;
  input x y w;
  cards;
1 1 1
2 2 1
3 4 1
;
run;

proc reg data=work.temp;
  weight w;
  model y=x;
run;
quit;

Since the weights are all 1, this is the same as unweighted regression and this gives me an R-squared of .9643. Note that in my data, the first two observations are perfectly correlated while the third is not. Now, if I re-weight the last observation to place less weight on it since it's not perfectly correlated with the others and rerun the weighted least squares regression, I get a lower R-squared:

data work.temp;
  input x y w;
  cards;
1 1 1
2 2 1
3 4 .1
;
run;

proc reg data=work.temp;
  weight w;
  model y=x;
run;
quit;

R-squared now equals .9391.

This does not seem intuitive. Since I'm now underweighting the only non-perfectly correlated observation, shouldn't R-squared improve or am I missing something?

Thanks for any insight.

