Date:         Thu, 3 Feb 2000 20:04:07 -0300
Reply-To:     hmaletta@overnet.com.ar
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         "Hector E. Maletta" <hmaletta@OVERNET.COM.AR>
Subject:      Re: Weighting cases...
Comments: To: Aron Johnson <AJOHNS3@CO.PIERCE.WA.US>
Content-Type: text/plain; charset=us-ascii

Aron: SPSS weights cases as a matter of course. In every SPSS data file there is a hidden variable called $WEIGHT, whose default value is 1 for all cases. For every statistical procedure, SPSS first multiplies sample values by the value of $WEIGHT, then proceeds with the rest of the computation.

The default value of $WEIGHT can be altered by designating any variable as a weighting variable. To do this, you need an appropriate weighting variable in your file (on which more below). Once you have it, you weight by that variable by going to the DATA - WEIGHT CASES menu option, or (if you prefer) by issuing the following command in a syntax window: WEIGHT BY X. (where X stands for your weighting variable). The weighting remains in force until you replace X by some other variable, or return to the default weighting by means of the command WEIGHT OFF (or the corresponding choice in the same menu option).

Now to the weights. Ordinarily, such weights are the reciprocals of sampling ratios. If your sample for the ith category is n(i) and the size of the corresponding population is N(i), the sampling ratio is n(i)/N(i) and the weight is X=N(i)/n(i). If you adopt this X as your weighting variable, each case is counted X times in the computation. Thus any frequency table or crosstabulation, for instance, would yield a total of N=sum(N(i)) instead of n=sum(n(i)). This also applies to other procedures such as regression: each case counts for X(i) cases.

This approach assumes that the sample is representative, i.e., that results from the sample can be extended to the rest of the population within each subpopulation considered (where "subpopulation" means here "population items with the same sampling probability"). If, on the contrary, non-response were an effect of some special characteristic of non-respondents, then the sample of respondents would not be representative of non-respondents. In other words, this applies only to random sampling. I do not know whether this is your case.
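
For instance, here is a minimal sketch of how such expansion weights could be built in a syntax window, assuming (hypothetically) that your file has a numeric variable called GRADE coded 7 through 12; GRADE and EXPWT are names I made up, so substitute your own category variable and counts:

* Expansion weights X = N(i)/n(i), using the counts from your message.
* GRADE and EXPWT are hypothetical names; adapt them to your file.
IF (GRADE =  7) EXPWT = 395/346.
IF (GRADE =  8) EXPWT = 346/284.
IF (GRADE =  9) EXPWT = 390/324.
IF (GRADE = 10) EXPWT = 583/417.
IF (GRADE = 11) EXPWT = 527/381.
IF (GRADE = 12) EXPWT = 471/316.
EXECUTE.
WEIGHT BY EXPWT.
* A frequency table of GRADE should now total about 2712 students instead of the 2068 returned surveys.
FREQUENCIES VARIABLES=GRADE.
WEIGHT OFF.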

If your stratifying variable (in your case, the so-called "categories") captures the main sources of variation in your variables, you need not bother with "weighting for each variable".

A final word on statistical significance. SPSS computes statistical significance based on the WEIGHTED number of cases. If you apply a weighting variable such as X, the weighted number of cases is expanded from n to N. Consequently, SPSS is fooled into believing that your sample is larger than it actually is, and thus yields an overestimate of the true significance (or an underestimate of the sampling error).

SPSS itself offers no thorough solution to this problem, since all its procedures assume the data come from a simple random sampling process. For complex samples involving variable sampling ratios, the proper software is WesVar Complex Samples (also distributed by SPSS).

However, there is an approximate solution at hand. You may preserve the different RELATIVE weight of your various "categories" or subsamples, while avoiding the unwelcome expansion (or ABSOLUTE weighting) that converts n into N. This is achieved by using a new weighting variable W = X * n/N. This new variable yields a total frequency count of n (not N), but preserves the differential weighting in relative terms. If the sampling model involves only stratification (as seems to be the case here), this is a good enough solution. If the sampling model also involves clustering (i.e. a selection of subpopulations as a first step before selecting cases within each selected subpopulation), the above solution may still overestimate the true significance.
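
Continuing the same hypothetical sketch (the made-up EXPWT variable as defined above, with n = 2068 returned surveys and N = 2712 students taken from your figures), the relative weight could be computed as:

* Relative weight W = X * n/N: keeps the differential weighting but leaves the weighted total at n.
COMPUTE RELWT = EXPWT * 2068/2712.
EXECUTE.
WEIGHT BY RELWT.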

Hector Maletta
Universidad del Salvador
Buenos Aires, Argentina

Aron Johnson wrote:
>
> I have never done this before and need a hand if you get a chance.
>
> I've got data from two schools, a HS and JrHS, divided into 4 categories, 2 each for each school (just call 'em cat1HS, cat2HS, cat1JrHS, cat2JrHS).
> There are 40 variables measuring attitudes and opinions.
>
> The problem is that the number of returned surveys does not match the number of students in each grade. Here is the breakdown:
> 7th grade:  395 students, 346 surveys (88% return)
> 8th grade:  346 students, 284 surveys (82% return)
> 9th grade:  390 students, 324 surveys (83% return)
> 10th grade: 583 students, 417 surveys (72% return)
> 11th grade: 527 students, 381 surveys (72% return)
> 12th grade: 471 students, 316 surveys (67% return)
>
> As you can see, both the populations and samples are not equal, and therefore I believe that I need to weight the cases in order to be able to do any direct comparisons b/w the groups. Unfortunately I'm not certain how it works. If I weight cases it asks me for a single frequency variable, but I want to weight the cases for every variable, don't I? Especially if I'm comparing means or performing correlations. I can't really compare the means of two groups if I can only choose one variable to weight; that would mean I can only weight, for example, the 9th graders, but not the 10th graders.
>
> Maybe I'm way off here. Could someone please give me a hand with this?
>
> Thank you very much.
> Aron Johnson

