Date: Thu, 7 Mar 2002 09:22:03 -0800
Reply-To: Cassell.David@EPAMAIL.EPA.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <Cassell.David@EPAMAIL.EPA.GOV>
Subject: Re: Weights in GLM and the estimated error variance
Content-type: text/plain; charset=us-ascii
Ulrike Groemping <ugroempi@FORD.COM> answered his own question:
> Solved it myself now:
> The weights have to sum to the sample size, for the standard deviation
> to be realistically estimated.
Technically, no.
In order to keep the standard errors on the same scale, you would
indeed want the weights to sum to something around the sample
size, which is equivalent to saying that your analysis is on the
population rather than a random subset thereof. That is what you
wanted in your case.
But, in most cases that isn't what is going on. The weights often
represent a 'correction' to transform from the sample back to the
population. A simple random sample of 10% of the population would
have weights of 10: each sample unit represents 10 units of the
actual population. So you need to expand the standard errors upward
in order to reflect your uncertainty about the true population
characteristics. If you look at the formulae for weighted variances
[either the standard, which assumes simple random sampling and hence
independence, or survey sampling formulae, which do not] you will see
that there are a lot of "weight**2" terms in there to get the unbiased
estimators. That's what is happening inside proc glm too.
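
To see the scale dependence numerically, here is a minimal sketch in plain
Python (not SAS, and not proc glm itself). It assumes the usual weighted
least squares setup, in which the fit minimizes sum(w * residual**2) and the
error variance is estimated as that weighted sum divided by the residual
degrees of freedom. The data values and the helper name weighted_fit are
made up for illustration.

```python
def weighted_fit(x, y, w):
    """Weighted least squares for y = a + b*x, minimizing sum(w * resid**2).

    Returns (intercept, slope, mse), where mse is the weighted residual
    sum of squares divided by the residual degrees of freedom (n - 2).
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    wsse = sum(
        wi * (yi - intercept - slope * xi) ** 2
        for wi, xi, yi in zip(w, x, y)
    )
    mse = wsse / (len(x) - 2)
    return intercept, slope, mse


# Hypothetical data: six sampled units, each with a sampling weight of 10
# (a 10% simple random sample, as in the example above).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]
raw_w = [10.0] * len(x)

# The same weights rescaled so that they sum to the sample size.
n = len(x)
norm_w = [wi * n / sum(raw_w) for wi in raw_w]

fit_raw = weighted_fit(x, y, raw_w)
fit_norm = weighted_fit(x, y, norm_w)

print("slope (raw, normalized):", fit_raw[1], fit_norm[1])
print("MSE   (raw, normalized):", fit_raw[2], fit_norm[2])
print("MSE ratio:", fit_raw[2] / fit_norm[2])
```

Under these assumptions the fitted coefficients are identical for both sets
of weights, while the estimated error variance is multiplied by the same
constant as the weights. That is why the reported root MSE only looks
"realistic" once the weights sum to roughly the sample size.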
HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician