Date: Wed, 9 Mar 2005 13:06:02 -0800
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: Q: stats advice
"Miller, Jeremy T." <zyp9@CDC.GOV> wrote:
> I have a national probability sample of hospitals (N=300). Within
> hospitals, there are staff (N=7500). The data are weighted.
> Within the facility-level data, there are 12 dichotomous questions on
> various policies in force at each hospital. Essentially, the policy
> is yes/no depending on whether or not that policy is in effect at that
> hospital.
>
> Within the individual-level data, there is information on whether or
> not the person has received a vaccination for bug_x. So, vaccination
> coverage for a hospital can be calculated by (N staff vaccinated / N
> staff).
>
> What I need to do is determine whether a policy or combination of
> policies will influence vaccination coverage in each hospital.
> (You're probably having many of the same questions as I had at this
> point: chains are going to have the same policies and thus cannot
> be treated as independent; plus other similar concerns. I have been
> assured that the sampling method took care of this problem.)
> So, if all hospitals were independent, how would one find which, if
> any, policies affect vaccination coverage?
> I was starting with GENMOD, but my hang-up is the dependent var: it's
> not really continuous in that it's bounded (0-1), but it's not
> dichotomous either.
> A point in the right direction would be appreciated.

 Since you can get the hospital-level vaccination coverage from your
data, you can address this as a survey sample with a binomial response.
That means you can use PROC SURVEYLOGISTIC to do the analysis.
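
 A minimal sketch of that setup, assuming a hospital-level data set with
one row per hospital. The names HOSP, n_vacc, n_staff, policy1-policy12,
and hosp_wt are hypothetical placeholders, not variables from the
original data, and the sketch assumes the staff counts need no further
within-hospital weighting:

```sas
/* Sketch only: all data set and variable names are assumed. */
proc surveylogistic data=hosp;
   weight hosp_wt;                           /* hospital-level sampling weight */
   model n_vacc/n_staff = policy1-policy12;  /* events/trials (binomial) form  */
run;
```

 The events/trials form of the MODEL statement treats each hospital's
coverage as n_vacc vaccinated out of n_staff, which sidesteps the
"bounded but not dichotomous" problem with using the proportion itself.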

 PROC GENMOD won't handle the covariance matrix that the survey
structure imposes, nor will it do the sample-based statistical tests that
you will get from PROC SURVEYLOGISTIC. It won't even give you the right
degrees of freedom for the tests. So there are other problems with using
PROC GENMOD.

 If you're working at the hospital level rather than the employee
level, then your sample design will be simpler. You will have a
single-stage sample, not a cluster sample or a two-stage sample, as you
would end up with if you work at the employee level. This will make a
difference in how you set up the proc and how you get your final
sampling weights.
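
 For contrast, here is a rough sketch of what the employee-level setup
might look like, where the hospital becomes the cluster. The names
STAFF, stratum, hosp_id, staff_wt, vaccinated, and policy1-policy12 are
assumptions for illustration only, and the STRATA statement applies only
if the design was actually stratified:

```sas
/* Sketch only: all data set and variable names are assumed. */
/* Employee level: one row per staff member, staff nested in hospitals. */
proc surveylogistic data=staff;
   strata stratum;      /* first-stage strata, if any were used          */
   cluster hosp_id;     /* hospital is the primary sampling unit here    */
   weight staff_wt;     /* final weight, including the staff-stage part  */
   model vaccinated(event='1') = policy1-policy12;
run;
```

 At the hospital level the CLUSTER statement drops out, because each
sampled hospital contributes exactly one row, and the weight is the
hospital's own sampling weight rather than a staff-level one.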

 You have "been assured" that the sampling method took care of the
problem of hospital chains and inter-related hospital policies. Don't
take their word for it. You need to know the exact process used. Why?
Because you need to know about any stratification and/or clustering done
to get to your sample of 300 hospitals. If entire chains were selected,
then subsampled, that is a sampling structure that is different from a
sample made across all hospitals. And the primary stage of the sample
design is different, which will change the correct choices for the
STRATA and CLUSTER statements in the analysis.

 The precise sampling design used may also impact the weights, if there
are missing values: hospitals which were originally selected but were
not in the final analysis data set for one reason or another. Get their
exact design on paper, with any changes made noted appropriately.
Without this, you can't get the correct design effects for your
analysis. Then verify that the weights they have provided are
meaningful.

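
 As an illustration of both points, the sketch below shows how the
design statements would change if chains turned out to be the first
stage, followed by one simple sanity check on the weights. Everything
named here (HOSP, stratum, chain_id, hosp_wt, n_vacc, n_staff,
policy1-policy12) is hypothetical; the real choices depend on the
documented design.

```sas
/* Sketch only: all data set and variable names are assumed. */

/* If chains were selected first and hospitals subsampled within them,
   the chain, not the hospital, is the primary sampling unit. */
proc surveylogistic data=hosp;
   strata stratum;      /* only if the design was stratified */
   cluster chain_id;    /* first-stage unit                   */
   weight hosp_wt;
   model n_vacc/n_staff = policy1-policy12;
run;

/* Weight check: within each stratum, the sum of the hospital weights
   should be close to the number of hospitals on the sampling frame. */
proc means data=hosp n sum min max;
   class stratum;
   var hosp_wt;
run;
```

 If the weight totals are far from the known frame counts, or some
weights are missing, zero, or wildly larger than the rest, that is worth
raising with whoever built the weights before fitting any model.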
David Cassell, CSC
Senior computing specialist
"SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> wrote on 03/09/2005