LISTSERV at the University of Georgia
Date:   Tue, 25 Oct 2005 20:24:13 -0400
Reply-To:   Peter Flom <flom@NDRI.ORG>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Peter Flom <flom@NDRI.ORG>
Subject:   Re: Skewed variables & surveys
Comments:   To: not_used@COMCAST.NET
Content-Type:   text/plain; charset=US-ASCII

>> NOT_USED <not_used@COMCAST.NET> 10/25/05 7:11 PM >>> <<< I don't understand why some of the answers to this question imply that OLS is invalid if the dependent variable is skewed or not continuous.

OLS is based on the distribution of errors for the correct linear model, so everything is relative to the independent variables. Can't say much until we know about these variables. >>>

If the DV is not continuous (or close to it), OLS is the wrong model. A 7-point variable is not close enough.

First, the residuals cannot be continuous, and therefore cannot be normal. Second, the predicted values can fall outside the range of the DV (less than 1 or more than 7), which makes no sense.

There are other reasons too, but that's enough.

<<< Also, just because most of the survey answers are 5, 6, or 7 does not make the variable skewed; it could still be symmetric around 6 or even 5.5. >>>

Actually, no, it couldn't. First of all, in your original post you said:

75% give scores of 6 or 7

If all 75% were 6, then the median is 6. The mean can't possibly be 6, because none of the scores are above 6 and the other 25% are below it.

hmmmmm

What if all 75% were 7? Then the median is 7, and the mean can't be 7.

hmmmmm hmmmmm

What if 25% were 6 and 50% were 7? Then the median is 6.5, and the mean can't be 6.5 (it is at most 0.25(6) + 0.50(7) + 0.25(5) = 6.25).

If the median isn't equal to the mean, the distribution is skewed (a symmetric distribution has its mean equal to its median).
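The first two cases can be checked with a few lines of arithmetic. This sketch assumes a 1 to 7 integer scale, so the 25% who did not answer 6 or 7 scored somewhere from 1 to 5. The helper names (discrete_median, mean_range) are made up for illustration. The remaining 25% lies below the 50% mark, so its placement does not change the median. It is parked at 5 for the median calculation, and the mean is reported as a range over all possible placements.

```python
from fractions import Fraction as F

def discrete_median(dist):
    """Median of a discrete distribution {score: proportion}. If the cumulative
    share hits exactly 1/2 at a score, average that score with the next one."""
    scores = sorted(dist)
    cum = F(0)
    for i, s in enumerate(scores):
        cum += dist[s]
        if cum > F(1, 2):
            return F(s)
        if cum == F(1, 2):
            return (F(s) + F(scores[i + 1])) / 2
    raise ValueError("proportions must sum to 1")

def mean_range(top):
    """Smallest and largest possible mean when the shares at the top scores are
    fixed and the remaining share may fall anywhere from 1 to 5."""
    rest = 1 - sum(top.values())
    fixed = sum(s * p for s, p in top.items())
    return fixed + rest * 1, fixed + rest * 5

cases = [("all 75% at 6", {6: F(3, 4)}), ("all 75% at 7", {7: F(3, 4)})]
for label, top in cases:
    dist = {5: 1 - sum(top.values()), **top}  # remaining 25% parked at 5
    lo, hi = mean_range(top)
    print(f"{label}: median = {float(discrete_median(dist))}, "
          f"mean between {float(lo)} and {float(hi)}")
# all 75% at 6: median = 6.0, mean between 4.75 and 5.75
# all 75% at 7: median = 7.0, mean between 5.5 and 6.5
```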

And if Y is skewed, so will the residuals be. (I recall seeing a proof in Faraway's book on linear models, but I don't recall the details.) I tested it out, though, with a DV like the one you describe and a nearly perfect linear model (one IV that was the DV plus some random noise), and the residuals were, sure enough, skewed.

OLS is simply NOT THE RIGHT APPROACH.

The right approach is, as David and I (and I think others) have told you, to do ordinal or multinomial logistic regression, preferably using PROC SURVEYLOGISTIC if you have the survey design information, or PROC LOGISTIC if you do not.
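To show why the ordinal model avoids the out-of-range problem, here is a sketch of a cumulative logit (proportional odds) model. To my knowledge, this is the model PROC LOGISTIC fits by default when the response has more than two ordered levels. The model is written as logit P(Y <= j) = alpha_j + beta*x, and software packages differ in the sign convention for beta. The cutpoints and slope below are hypothetical, not estimates from any data. However extreme x gets, the predicted probabilities over the 7 categories stay non-negative and sum to 1, so the expected score stays between 1 and 7.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(x, cutpoints, beta):
    """P(Y = 1..K) under logit P(Y <= j) = cutpoints[j-1] + beta * x, j = 1..K-1.
    Cutpoints must be increasing so that every probability is non-negative."""
    cum = [expit(a + beta * x) for a in cutpoints] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical cutpoints for a 7-point scale and a single IV with slope BETA
CUTPOINTS = [-4.0, -3.2, -2.5, -1.8, -1.0, 0.5]
BETA = -0.8

for x in (-10, 0, 10):
    p = category_probs(x, CUTPOINTS, BETA)
    expected = sum(k * pk for k, pk in enumerate(p, start=1))
    print(x, [round(pk, 3) for pk in p], round(expected, 2))
```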

Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
(212) 845-4485 (voice)
(917) 438-0894 (fax)
http://cduhr.ndri.org
www.peterflom.com

