--- On Thu, 6/19/08, Ryan <Ryan.Andrew.Black@GMAIL.COM> wrote:
> From: Ryan <Ryan.Andrew.Black@GMAIL.COM>
> Subject: Re: GLIMMIX Question - Dependent Observations
> To: SAS-L@LISTSERV.UGA.EDU
> Date: Thursday, June 19, 2008, 7:29 PM
> Thank you, Dale! You've helped me with so many questions already.
> I hope it's okay if I ask you two more...
> 1. The dichotomous variable in my model was collected at the subjects
> level (not city level), and the categories are not mutually exclusive--
> there were people who fit into both categories. I'm not sure how to
> handle this issue--one option I thought of was to raise it to the city
> level, and code the city as a particular category based on the higher
> rate (by the way, DV (rate) and the continuous IV are functions of
> data at the city level). So if the rate is higher in category one,
> then that city is assigned category one. Would that work? Would you
> recommend an alternative approach that can maintain the variable at
> the city level?
> 2. As mentioned above, the DV (rate) and the continuous IV in my model
> are functions of aggregated data. After you mentioned that a city with
> fewer observations would be weighted less, I realized that all cases
> would actually have equal weights at the city level. Is there a way to
> deal with unequal Ns per case while maintaining city as the unit of
> analysis for all variables?
> Anyway, I realize I've asked much of you. I completely understand if
> you're too busy to respond. I appreciate your help. It's been a true
> learning experience!
I'm confused now. I don't know how your dependent variable (collected
at the individual level) can take on two values that are not mutually
exclusive. It sounds to me as though there are two
boxes that the respondent can check off, and that there are no
constraints that if they check box 1 then they cannot check box 2
(and vice versa).
To me, that would represent two (almost certainly correlated) binary
responses. I would be looking at modeling the binary responses at the
individual level with the person-specific IV as a predictor. At the
same time, you can allow for variation across cities in the proportion
who respond positively. In addition to allowing for the person-specific
IV to relate directly to the person-specific response, this analysis
preserves information about differences in the number of subjects
across cities. A city with only 10 respondents will have a
city random effect estimate which has a much larger standard error
than a city with 1000 respondents.
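If I am correct that there are two check boxes and hence two binary
responses, then an appropriate model for check box 1 would be:

proc glimmix data=muydata;
  class city;                        /* city identifies the clusters */
  model box1 = x / s dist=binary;    /* person-level 0/1 response on person-level x */
  random intercept / subject=city;   /* city-specific random intercept */
run;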
A similar model could be fit for check box 2 as a response. One could
model check box 1 and check box 2 responses together as correlated
within individuals. There may be quite a few ways that such an analysis
could be constructed. Given the spatial covariance structure assumed
for the city random effects, together with the correlated responses
within individuals, it is not clear just what the appropriate code
for such a model would be.
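As a rough starting point only, here is one possible way to set up the
joint model. It deliberately sets the spatial covariance for the city
effects aside (an unstructured 2x2 covariance between the two boxes is
used instead), and it assumes a person identifier variable named `id`
(a hypothetical name) and that box1 and box2 are coded 0/1 on one row
per person. The data step stacks the two check boxes into a single
response `y`; the first RANDOM statement gives each city a pair of
correlated effects, one per box, and the second adds a person-level
intercept so that a person's two responses are correlated.

```sas
/* Sketch only: spatial structure for city effects is NOT included.  */
/* Assumes muydata has one row per person with hypothetical id,      */
/* plus city, x, and 0/1 variables box1 and box2.                    */
data long;
  set muydata;
  length box $4;
  keep id city x box y;
  box = 'box1'; y = box1; output;   /* row for check box 1 */
  box = 'box2'; y = box2; output;   /* row for check box 2 */
run;

proc glimmix data=long;
  class city id box;
  /* separate intercept and separate slope on x for each check box */
  model y(event='1') = box box*x / noint s dist=binary;
  /* correlated city effects for box 1 and box 2 (unstructured) */
  random box / subject=city type=un;
  /* person-level intercept: links a person's two responses */
  random intercept / subject=id(city);
run;
```

With only two binary responses per person, the person-level variance
can be poorly estimated, so this sketch should be treated as something
to try and check rather than a finished model.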
Statisticians have the habit of adding confusion to seemingly simple
problems, don't we? Are you more or less confused than at the start
of this dialogue?
Fred Hutchinson Cancer Research Center
Ph: (206) 667-2926
Fax: (206) 667-5977