Date: Tue, 28 Jul 2009 10:30:31 -0700
Reply-To: Dale McLerran <stringplayer_2@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Dale McLerran <stringplayer_2@YAHOO.COM>
Subject: Re: sample size question when using proc mixed
Content-Type: text/plain; charset=iso-8859-1
--- On Tue, 7/28/09, elodie <elodie.gillain@GMAIL.COM> wrote:
> From: elodie <elodie.gillain@GMAIL.COM>
> Subject: sample size question when using proc mixed
> To: SAS-L@LISTSERV.UGA.EDU
> Date: Tuesday, July 28, 2009, 9:07 AM
> Hi everyone,
> I am running a (longitudinal) proc mixed analysis. The table below
> gives the sample size at each timepoint from time=0 through time=4.
>   time   Sample size
>   0      81
>   1      43
>   2      33
>   3      24
>   4      24
> The output of proc mixed says there are 51 subjects in the analysis,
> with a max of 5 observations per subject. The output also says Number
> of Observations Read is 205, and the Number of Observations Used is
> I am not sure how there can be 51 subjects used for this analysis. I
> am guessing that it is OK for a subject to have data at time 0, but
> not at time 3 for example. Am I right?
> I greatly appreciate your help.
In the case of a longitudinal analysis, a subject is the
person/thing/unit which is measured on multiple occasions. In
your data, there is at least one subject who has observations
at all five time points (T=0,1,2,3,4).
Your PROC MIXED code for this longitudinal analysis should have
a REPEATED statement structured something like the following:
   repeated time_var / subject=subjID type=ar(1);
Whatever you have in place of the subjID variable above is the
subject in your analysis. There are 51 unique values of subjID
in your data, or perhaps 51 unique values of subjID after
removing observations with missing values for any of the
variables that are named as part of the model: the subject ID,
time, response, and predictor variables.
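One way to see where the 51 comes from is to count the distinct
subjects overall, and then count only those with at least one record
that has nothing missing among the model variables. This is only a
sketch: the dataset name LONG and the variables y and x are made up
for illustration, subjID and time_var are the placeholder names used
above, and the CMISS function assumes SAS 9.2 or later.

```sas
/* Hypothetical names: dataset LONG (one row per subject per time point), */
/* response y, predictor x. Substitute your own names.                    */
proc sql;
  /* every subject ID that appears in the data */
  select count(distinct subjID) as n_subjects_all
    from long;

  /* subjects with at least one record where no model variable is missing */
  select count(distinct subjID) as n_subjects_complete
    from long
    where cmiss(subjID, time_var, y, x) = 0;
quit;
```

If the second count matches the number of subjects that MIXED reports,
then missing values in the model variables account for the difference.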
The power of the MIXED procedure is that you do not have to
lose all information from subjects who have some missing data.
Thus, subjects who have an observation only at T=0 can contribute
to estimates of the mean and variance at T=0. Assuming that
the reason for missing data at subsequent time points is not
related to the response values that went unobserved, given the
data that were observed (that is, that the missing values are
MAR - missing at random), it is not only permissible but
beneficial to include subjects who have incomplete information.
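For incomplete subjects to be kept, the data need to be in long form,
with one record per subject per time point actually observed. A subject
seen only at T=0 then simply has one record. A minimal sketch of such a
call follows, again with the hypothetical names LONG, y, and x. The
AR(1) structure is just the example from above, not a recommendation
for your data.

```sas
/* Hypothetical names: dataset LONG, response y, predictor x */
proc mixed data=long;
  /* time_var is a CLASS variable so that the REPEATED effect can  */
  /* identify which time points are absent for a given subject     */
  class subjID time_var;
  model y = time_var x / solution;
  /* records sharing a subjID value are treated as correlated */
  repeated time_var / subject=subjID type=ar(1);
run;
```

Naming time_var as the repeated effect matters when time points are
missing, because it lets the procedure place each record at the
correct position within the subject's covariance structure.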
Fred Hutchinson Cancer Research Center
Ph: (206) 667-2926
Fax: (206) 667-5977