Date: Wed, 9 Mar 2005 13:06:02 -0800
Reply-To: cassell.david@EPAMAIL.EPA.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: Q: stats advice
In-Reply-To: <DF22A0377FD2524B9F0368AACCACE10A03415647@m-ncid-1.ncid.cdc.gov>
Content-type: text/plain; charset=US-ASCII
"Miller, Jeremy T." <zyp9@CDC.GOV> wrote:
> I have a national probability sample of hospitals (N=300). Within these
> hospitals, there are staff (N=7500). The data are weighted.
>
> Within the facility-level data, there are 12 dichotomous questions on
> various policies in force at each hospital. Essentially, the policy var
> is yes/no depending on whether or not that policy is in effect at that
> hospital.
>
> Within the individual-level data, there is information on whether or not
> the person has received a vaccination for bug_x. So, vaccination
> coverage for a hospital can be calculated by (N staff vaccinated / N
> staff).
>
> What I need to do is determine whether a policy or combination of
> policies will influence vaccination coverage in each hospital.
>
> (You're probably having many of the same questions as I had at this
> point: chains are going to have the same policies and thus cannot
> be treated as independent; plus other similar concerns. I have been
> assured that the sampling method took care of this problem.)
>
> So, if all hospitals were independent, how would one find which, if any,
> policies affect vaccination coverage?
>
> I was starting with GENMOD, but my hang-up is the dependent var: it's
> not really continuous in that it's bounded (0-1), but it's not
> dichotomous either.
>
> A point in the right direction would be appreciated.
[1] Since you can get the hospital-level vaccination coverage from your
data, you can address this as a survey sample with a binomial response,
and you can use PROC SURVEYLOGISTIC to do the analysis.
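
As a rough illustration of the binomial-response setup, a minimal sketch
might look like the step below. All names are assumptions, not taken from
the actual data: HOSP is a one-row-per-hospital data set, N_VACC and
N_STAFF are the vaccinated and total staff counts, POLICY1-POLICY12 are
the 0/1 policy indicators, and HOSP_WT is the hospital sampling weight.
Design statements are left out here because they depend on the sampling
plan.

```sas
/* Sketch only: every data set and variable name below is hypothetical. */
proc surveylogistic data=hosp;
   weight hosp_wt;                             /* hospital-level sampling weight */
   model n_vacc / n_staff = policy1-policy12;  /* events/trials (binomial) response */
run;
```

The events/trials form of the MODEL statement sidesteps the "bounded but
not dichotomous" worry, since the proportion is modeled through its
numerator and denominator rather than as a continuous value.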
[2] PROC GENMOD won't handle the covariance matrix that the survey sample
structure imposes, nor will it do the sample-based statistical tests that
you will get from PROC SURVEYLOGISTIC. It won't even give you the right
degrees of freedom for the tests. So there are other problems with using
PROC GENMOD here.
[3] If you're working at the hospital level rather than the employee
level, then your sample design will be simpler. You will have a
single-stage sample, not a cluster sample or a two-stage sample, as you
would end up with if you work at the employee level. This will make a
difference in how you set up the proc and how you get your final sampling
weights.
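
To get from the employee records to one row per hospital, something like
the following sketch could be used. The names are again assumed: STAFF is
the employee-level data set, HOSP_ID identifies the hospital, and VACC is
coded 1 for vaccinated and 0 otherwise. It uses unweighted counts, as in
your (N staff vaccinated / N staff) formula; if staff were subsampled
within hospitals, that may not be appropriate.

```sas
/* Sketch only: STAFF, HOSP_ID and VACC are hypothetical names. */
proc sql;
   create table hosp_cov as
   select hosp_id,
          sum(vacc)   as n_vacc,    /* number of staff vaccinated */
          count(vacc) as n_staff    /* staff with non-missing vaccination status */
   from staff
   group by hosp_id;
quit;
```

The resulting table would still need to be merged with the facility-level
policy variables and hospital weights before any modeling.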
[4] You have "been assured" that the sampling method took care of the
problems of hospital chains and inter-related hospital policies. Don't
take their word for it. You need to know the exact process used. Why?
Because you need to know about any stratification and/or clustering done
to get to your sample of 300 hospitals. If entire chains were selected,
then subsampled, that introduces a sampling structure that is different
from a sample drawn across all chains. And the primary stage of the
sample design is different, which will affect the correct choices for the
STRATA and CLUSTER statements in the analysis procedure. The precise
sampling design used may also impact the weights, if there are any
missing values: hospitals which were originally selected but were not in
the final analysis data set for one reason or another. Get their exact
sampling plan, on paper, with any changes made noted appropriately.
Without this, you cannot get the correct design effects for your
analysis. Then verify that the weights they have provided are meaningful.
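
To show where the design information ends up, here is a hedged sketch of
an employee-level analysis. Everything in it is hypothetical: REGION
stands in for whatever first-stage stratification was actually used,
HOSP_ID is assumed to be the primary sampling unit, STAFF_WT is the final
employee-level weight, and VACC is coded 1/0. If chains turn out to be
the first stage, the CLUSTER variable would be the chain identifier
instead.

```sas
/* Sketch only: the design variables depend on the actual sampling plan. */
proc surveylogistic data=staff;
   strata region;                             /* assumed first-stage strata */
   cluster hosp_id;                           /* assumed primary sampling units */
   weight staff_wt;                           /* final employee-level weight */
   model vacc(event='1') = policy1-policy12;  /* model P(vaccinated) */
run;
```

Getting STRATA and CLUSTER wrong here will not stop the proc from
running, but the standard errors and tests would not reflect the real
design.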
HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician
"SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> wrote on 03/09/2005
10:42:01 AM:
> Thanks.