Date: Fri, 9 Sep 2005 12:21:49 -1000
Reply-To: Bob Schacht <firstname.lastname@example.org>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Bob Schacht <email@example.com>
Subject: Re: Stratified Sampling
At 11:00 AM 9/9/2005, Chao Yawo wrote:
>I am teaching a sophomore social statistics course.
>I've been covering sampling, especially stratified
>sampling this week.
>The students need some assistance in explaining the
>weighting procedures associated with disproportionate
>stratified sampling. How can I demonstrate this in a
>class with a concrete example?
>Also, are there any guidelines as to how to oversample
>a particular stratum? Assume I have 2 groups (males
>are 20% and females are 80%). If I am drawing a sample of
>100 students, it means I would end up with 20 males
>and 80 females. If I need to oversample the males,
>what value should I choose - 30, 40, 50? - is the
>choice really arbitrary or is guided by theory or
>I will appreciate your thoughts on this.
Usually the purpose of over-sampling is to generate a sufficient sample
size for population subgroups so that one can calculate prevalence
estimates, conduct statistical tests, etc. For example, the information
on American Indians in national surveys is seldom sufficient to provide
prevalence estimates for almost anything. Find almost any national health
survey that advertises information about minorities, and you'll discover
that "minorities" usually refers only to Blacks and Hispanics (who are
over-sampled, BTW). If there is any information on American Indians at
all, there will usually be an asterisk that leads you to a statement that
the sample size was insufficient.
What is sufficient? Unfortunately, that depends on what you're looking at.
If, for example, you want to know the prevalence of HIV among American
Indian males, you're looking at a subset of a subset of a subset, and
even the National Health Interview Survey (NHIS) is probably not going to
have enough cases to produce reliable statistics.
So what you need to do is to start with the real target population, and
then use one of the standard sample-size estimators to tell you how many
cases you'll need to reach useful conclusions. Then compare the needed n
(call it n1) with the number of cases that group would get in a
straightforward proportionate stratified random sample of N cases (call it
n2). The ratio n1/n2 will give you some idea of how much over-sampling
will be needed.
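
As a rough illustration of that comparison, here is a minimal Python sketch using the 20% male / 80% female example from the question. Everything in it is an assumption made for illustration: the function names, the population of 1,000 students, the +/-15 point margin of error, and the choice of the common normal-approximation formula for a proportion (n = z^2 * p * (1 - p) / e^2, with no finite-population correction) as the "standard" estimator. The last step shows the weights that a disproportionate design would then call for, computed as stratum population size divided by stratum sample size.

```python
import math


def required_n(p=0.5, margin=0.15, z=1.96):
    """Cases needed to estimate a proportion p within +/- margin.

    Normal-approximation formula, assuming simple random sampling within
    the group and a population large enough to ignore the finite-population
    correction. p=0.5 is the most conservative choice.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def design_weights(pop_counts, sample_counts):
    """Weight per sampled case in each stratum: N_h / n_h."""
    return {s: pop_counts[s] / sample_counts[s] for s in pop_counts}


# Hypothetical class population (not from the original post).
population = {"male": 200, "female": 800}
total_sample = 100

# n1: cases needed for the males; n2: cases a proportionate sample yields.
n1 = required_n(p=0.5, margin=0.15)
n2 = round(total_sample * population["male"] / sum(population.values()))
oversampling_factor = n1 / n2

# Disproportionate allocation: give the males n1 cases, females the rest.
sample = {"male": n1, "female": total_sample - n1}
weights = design_weights(population, sample)

print(f"needed n1 = {n1}, proportionate n2 = {n2}, "
      f"oversample males by a factor of {oversampling_factor:.2f}")
for stratum, w in weights.items():
    print(f"each sampled {stratum} represents {w:.2f} students")
```

With these assumed numbers the choice of 30, 40, or 50 males is not arbitrary: it follows from the precision you want for the male estimate. The weighted sample counts add back up to the population, which is the point of the weighting step: over-sampled males get a small weight and under-sampled females a large one, so combined estimates are not distorted by the over-sampling.
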
I'm not sure that this will work well with the kind of examples you are
seeking for classroom use, but maybe someone else can help you think of
Robert M. Schacht, Ph.D. <firstname.lastname@example.org>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814