Date: Mon, 7 Mar 2005 16:37:00 -0800 Dale McLerran "SAS(r) Discussion" DomainKeys? See http://antispam.yahoo.com/domainkeys Dale McLerran Re: NLMIXED: REPLICATE statement <20050306141940.61457.qmail@web26406.mail.ukl.yahoo.com> text/plain; charset=us-ascii

> Hi Dale, > I try also a syntax using the binomial distribution > > model COUNT ~ binomial(total,p); > > random u~normal(0,exp(2*Log(s2u))) > with count is the count for Y=1 by region*X1*X2 and total is the > total of the frequency by region*X1*X2 for both Y=1 and Y=0 > > I use the proc freq to create a frequency for both Y=1 and Y=0 by > region*X1*X2 and then I use proc sql to create the total. > > > proc freq data=mydata; > > tables region*X1*X2*Y_dep/ out= test1; > > run; > > /*------------create total for Y_dep-------------*/ > > proc sql; > > create table test2 as select distinct > region,X1,X2,Y_dep,COUNT,sum(COUNT) as total from > > test1 group by region,X1,X2; > > quit; > > /*-----------slect data for Y_Dep=1-------*/ > > data test3; > > set test2; > > where Y_Dep=1; > > run; > > Then I use data test3 with binomial distribution > > for variable "count" with p defined as previuosly, and the variable > "total" > > The model gives very similar results to the approach that you suggest > using a general likelihood > > > Any remarks or questions will be usefuls > > Thanks > Adel

If your variance components are small, then the bernoulli and binomial models will be quite similar. However, if the variance components are large, then these two models can differ greatly. Take a look at the following simulation:

%macro Gen_fit_REmodels(root_var=, b0=, b1=, seed=); /* Generate some correlated binary responses */ data test; b0 = &b0; b1 = &b1; do ID=1 to 50; subject_effect = &root_var*rannor(1234579); do i=1 to 40; a = ceil(2*ranuni(&seed)) - 1; eta = b0 + b1*a + subject_effect; expeta = exp(eta); p = expeta / (1 + expeta); y = ranbin(&seed,1,p); output; end; end; keep ID y a; run;

/* Construct one observation per ID/predictor/response */ /* combination with frequency of combination attached. */ proc sql; create table test1 as select distinct id, a, y, count(*) as freq from test group by id, a, y; quit;

/* Construct total for each ID/predictor combination. */ proc sql; create table test2 as select distinct id, a, y, freq, sum(freq) as total from test1 group by id, a; quit;

/*-----------select data for Y=1-------*/ data test3; set test2; where Y=1; run;

title "Fit random effect model for bernoulli response"; title2 "employing aggregated data"; proc nlmixed data=test1; eta = b0 + b1*a + u; expeta = exp(eta); p = expeta / (1 + expeta); if y=1 then loglike = log(p); else loglike = log(1-p); loglike = freq*loglike; model y ~ general(loglike); random u ~ normal(0, exp(2*logsu)) subject=id;

/* Test whether the parameter estimates are consistent */ /* with the values employed to simulate the data. */ contrast "b0=&b0" b0-&b0; contrast "b1=&b1" b1-&b1; contrast "sqrt(Var(RE))=&root_var" exp(logsu)-&root_var; run;

title "Fit random effect model for binomial response"; proc nlmixed data=test3; eta = b0 + b1*a + u; expeta = exp(eta); p = expeta / (1 + expeta); model freq ~ binomial(total,p); random u ~ normal(0, exp(2*logsu)) subject=id;

/* Test whether the parameter estimates are consistent */ /* with the values employed to simulate the data. */ contrast "b0=&b0" b0-&b0; contrast "b1=&b1" b1-&b1; contrast "sqrt(Var(RE))=&root_var" exp(logsu)-&root_var; run; %mend;

%Gen_fit_REmodels(root_var=0.25, b0=0, b1=0.5, seed=1234579);

Note that for the small variance component employed here, the two approaches produce the same point estimates and standard errors. Since the point estimates and standard errors are identical, then we have identical tests for all contrasts. But if we repeat this experiment with a larger variance for our random effect, we observe quite different results. Run the model employing

%Gen_fit_REmodels(root_var=5, b0=0, b1=0.5, seed=1234579);

You observe that the point estimates and their standard errors are now quite different. Our test that confidence limits for the point estimates include the true values are not rejected when we assume a bernoulli distribution. However, the confidence intervals do not include the true values for the intercept and the square root of the variance component when the binomial model is fit.

Sorry, I don't have time to provide further comment at this time. I'll just suggest that you try other values for the parameters of the model, try other seed values, etc. You should observe that if the data are generated as a bernoulli response subject to random effects, then you cannot assume that a random effects binomial response model provides correct results.

Dale

--------------------------------------- Dale McLerran Fred Hutchinson Cancer Research Center mailto: dmclerra@NO_SPAMfhcrc.org Ph: (206) 667-2926 Fax: (206) 667-5977 ---------------------------------------

__________________________________ Celebrate Yahoo!'s 10th Birthday! Yahoo! Netrospective: 100 Moments of the Web http://birthday.yahoo.com/netrospective/

Back to: Top of message | Previous page | Main SAS-L page