LISTSERV at the University of Georgia
Date:         Thu, 4 Feb 2010 23:50:35 -0800
Reply-To:     Oliver Kuss <Oliver.Kuss@MEDIZIN.UNI-HALLE.DE>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Oliver Kuss <Oliver.Kuss@MEDIZIN.UNI-HALLE.DE>
Organization: http://groups.google.com
Subject:      Re: proc logistic: 'out of memory'
Comments: To: sas-l@uga.edu
Content-Type: text/plain; charset=ISO-8859-1

On 4 Feb., 20:40, stringplaye...@YAHOO.COM (Dale McLerran) wrote:
> Brian,
>
> The 1:m matched design is quite easy to implement in NLMIXED.
> Note that the data need to be structured with one record for
> each stratum. The record must have m+1 variables representing
> the case/control status and also m+1 variables representing
> each of the predictor variables. In the code below, I assume
> that the m+1 response variables are named Y_1-Y_5. Similarly,
> I assume that there are two predictor variables (X1 and X2)
> which are represented in wide form as X1_1-X1_5 and X2_1-X2_5.
> Thus, the data set would appear as follows:
>
> stratum Y_1 Y_2 ... Y_5 X1_1 X1_2 ... X1_5 X2_1 X2_2 ... X2_5
>    1     1   0       0   36   43       39   97   78      102
>    2     1   0       0   39   38       44   92   81       78
> ...
>
> Now, for a 1:m design, the conditional likelihood is
>
>   l = exp(x{case}*beta) /
>       sum from i=1 to m+1 { exp(x{i}*beta) }
>
> See Hosmer and Lemeshow, Applied Logistic Regression, for a
> more detailed description of the conditional likelihood for a
> case/control matched design.
>
> With data constructed as shown above, then we could fit the
> conditional logistic regression model for a 1:m (max(m)=4)
> with the following code:
>
> proc nlmixed data=mydata;
>   parms b_x1 b_x2 0;
>   array Y_ {5};
>   array X1_ {5};
>   array X2_ {5};
>   do i=1 to 5;
>     if Y_{i}=1 then num = exp(b_x1*X1_{i} + b_x2*X2_{i});
>   end;
>   denom = 0;
>   do i=1 to 5;
>     if y_{i} in (0,1) & nmiss(X1_{i}, X2_{i})=0 then
>       denom = denom + exp(b_x1*X1_{i} + b_x2*X2_{i});
>   end;
>   if num>0 & denom>0 then ll = log(num / denom);
>   else ll = 0;
>
>   model ll ~ general(ll);
> run;
>
> Here is an example which constructs a 1:m design with m=4 for
> all but the last stratum. In the last stratum, m=2. Data are
> initially presented in a narrow format with a record for
> every case or control observation. The conditional logistic
> regression is fit to the narrow data using PROC LOGISTIC.
> Subsequently, the data are reshaped into a wide form and the
> wide form data are passed to the NLMIXED procedure. You can
> compare the point estimates and standard errors, as well as
> the model fit statistics which are produced by the NLMIXED
> procedure against the same statistics generated by the LOGISTIC
> procedure. We do get the same results. (Oh happy day!)
>
> data Data1;
>   do ID=1 to 63;
>     do Outcome = 1 to 0 by -1;
>       count+1;
>       if count=1 then do;
>         stratum+1;
>         y = 1;
>       end;
>       else y = 0;
>       input Gall Hyper @@;
>       output;
>       if count=5 then count=0;
>     end;
>   end;
> datalines;
> 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 1
> 0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0
> 1 0 0 0 0 0 0 1 1 0 0 1 1 0 1 0 1 0 0 1
> 0 1 0 0 0 0 1 1 0 0 1 1 0 0 0 1 0 1 0 0
> 0 0 1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0
> 0 0 0 1 1 0 0 1 0 0 0 1 1 0 0 0 0 1 0 0
> 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 1 1
> 0 0 0 1 0 1 0 0 0 1 0 1 0 1 0 1 0 1 0 0
> 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0
> 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1
> 0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 1 0 0 0
> 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0
> 1 0 1 0 0 1 0 0 1 0 0 0
> ;
>
> proc logistic data=Data1;
>   strata stratum;
>   model y(event='1')=Gall Hyper;
> run;
>
> data data2;
>   set data1;
>   by stratum;
>   array y_ {5};
>   array Gall_ {5};
>   array Hyper_ {5};
>   if first.stratum then do;
>     pointer=0;
>     do i=1 to 5;
>       y_{i} = .;
>       gall_{i} = .;
>       hyper_{i} = .;
>     end;
>   end;
>   pointer + 1;
>   y_{pointer} + y;
>   gall_{pointer} + gall;
>   hyper_{pointer} + hyper;
>   if last.stratum then output;
>   keep stratum y_: gall_: hyper_:;
> run;
>
> proc nlmixed data=data2;
>   parms b1 b2 0;
>   array Y_ {5};
>   array X1_ {5} gall_1-gall_5;
>   array X2_ {5} hyper_1-hyper_5;
>   do i=1 to 5;
>     if Y_{i}=1 then num = exp(b1*X1_{i} + b2*X2_{i});
>   end;
>   denom = 0;
>   do i=1 to 5;
>     if y_{i} in (0,1) & nmiss(X1_{i}, X2_{i})=0 then
>       denom = denom + exp(b1*X1_{i} + b2*X2_{i});
>   end;
>   if num>0 & denom>0 then ll = log(num / denom);
>   else ll = 0;
>
>   model ll ~ general(ll);
> run;
>
> Let me know if this does allow you to fit the 1:m matched design
> in the large data set which you have. I would think that it
> would, but am not certain as to whether the NLMIXED procedure
> stores the entire data in memory or re-reads data as needed for
> the iterative process. Storing the data in memory would improve
> computational efficiency for an iterative process. However, for
> extremely large data sets, you could run out of memory.
>
> Note that it would be wise to pass only the variables which are
> needed for the logistic regression. This would speed up data
> throughput, and could also reduce the amount of memory required
> to hold data in memory. Thus, it would be a good idea to use a
> keep option to restrict the variables that are passed into the
> NLMIXED procedure.
>
> HTH,
>
> Dale
>
> ---------------------------------------
> Dale McLerran
> Fred Hutchinson Cancer Research Center
> mailto: dmclerra@NO_SPAMfhcrc.org
> Ph: (206) 667-2926
> Fax: (206) 667-5977
> ---------------------------------------
>
> --- On Wed, 2/3/10, Brian Sauer <brian.sa...@GMAIL.COM> wrote:
>
> > From: Brian Sauer <brian.sa...@GMAIL.COM>
> > Subject: Re: proc logistic: 'out of memory'
> > To: SA...@LISTSERV.UGA.EDU
> > Date: Wednesday, February 3, 2010, 9:02 AM
> >
> > Hi Dale,
> >
> > I am in a similar situation with Christine, but I have a 1:m
> > matching problem. I am using a case-crossover design and the
> > sas program I developed allows the user to select the number
> > of control windows - up to 4. I didn't consider the limitations
> > of conditional logistic when designing this program. This
> > program is intended to be used on large healthcare databases
> > and could easily have 100,000 cases or so. Proc logistic with
> > a strata statement returns an out of memory warning. In your
> > previous post you mentioned a NLMIXED solution. If you have
> > worked this out would you please share it as this is beyond
> > my skill set at this time.
> >
> > Thanks,
> >
> > Brian
> > http://www.bmi.utah.edu/?module=facultyDetails&personId=8363&orgId=382
> >
> > On Jan 7, 12:31 pm, stringplaye...@YAHOO.COM (Dale McLerran) wrote:
> > > Christine,
> > >
> > > Apparently, you have a case/control design since you are
> > > using a STRATA statement. You also indicate that you have
> > > 80,000 case records and 80,000 control records which would
> > > suggest further that you might have a 1:1 matched study.
> > > If so, then you can restructure your data so that you can
> > > use a simple logistic regression. That should solve your
> > > out-of-memory problem.
> > >
> > > So, if you have a 1:1 matched design, here is what you can
> > > do. First, merge the matched case and control records
> > > by stratum (subjid) renaming the exposure variable so that
> > > you have a case exposure variable and a control exposure
> > > variable. We want to compute the difference between the
> > > two exposure variable values. At the same time, you need
> > > to construct a new response variable which has value 1
> > > for ALL records.
> > >
> > > With the restructured data, you can fit the conditional
> > > logistic regression model for the 1:1 matched design without
> > > need for the STRATA statement. You can fit the model
> > > employing an ordinary logistic regression WITHOUT AN
> > > INTERCEPT and using the difference of the exposure variables
> > > as the predictor variable.
> > >
> > > Code for all of this (using the data set and variables shown
> > > in your post) would be:
> > >
> > > proc sort data=outf.tendon_short out=tendon_short;
> > >   by subjid;
> > > run;
> > >
> > > data matched_logistic_reg;
> > >   merge tendon_short(where=(case_flag=1)
> > >                      rename=(exposure=exposure_case))
> > >         tendon_short(where=(case_flag^=1)
> > >                      rename=(exposure=exposure_control));
> > >   by subjid;
> > >   exposure_diff = exposure_case - exposure_control;
> > >   response = 1;
> > > run;
> > >
> > > proc logistic data=matched_logistic_reg;
> > >   model response = exposure_diff / noint;
> > > run;
> > >
> > > This approach is described by Hosmer and Lemeshow in a
> > > chapter on matched studies in their book "Applied Logistic
> > > Regression". Now, if you have M:N matching, it will be
> > > another whole kettle of fish. But let's start out with
> > > the simple assumption first because I suspect that it will
> > > meet your need.
> > >
> > > By the way, if you do have M:N matching so that the above
> > > solution will not work for you, then post back to the list
> > > specifying the maximum values of M and N across all strata.
> > > We should be able to write code for fitting a conditional
> > > logistic regression using the procedure NLMIXED. But we
> > > would again need to restructure the data to have all
> > > of the case and control records which are in a stratum on
> > > a single record. The NLMIXED procedure would require a
> > > fair bit of programming to construct the likelihood.
> > > I would rather not go there unless it is necessary.
> > >
> > > Dale
> > >
> > > ---------------------------------------
> > > Dale McLerran
> > > Fred Hutchinson Cancer Research Center
> > > mailto: dmclerra@NO_SPAMfhcrc.org
> > > Ph: (206) 667-2926
> > > Fax: (206) 667-5977
> > > ---------------------------------------
> > >
> > > --- On Thu, 1/7/10, Christine Peloquin <christinepeloqu...@GMAIL.COM> wrote:
> > >
> > > > From: Christine Peloquin <christinepeloqu...@GMAIL.COM>
> > > > Subject: proc logistic: 'out of memory'
> > > > To: SA...@LISTSERV.UGA.EDU
> > > > Date: Thursday, January 7, 2010, 7:01 AM
> > > >
> > > > hello.
> > > >
> > > > i just started a job at BU. i am running proc logistic on a
> > > > dataset with 160,000 observations (80,000 cases and 80,000
> > > > controls) - and am receiving an 'out of memory' message.
> > > > here is the code that i am running:
> > > >
> > > > proc logistic data=outf.tendon_short;
> > > > class exposure (ref='0') / param=ref;
> >
> > ...

Dale, thank you for sharing another excellent piece of NLMIXED code with us. Maybe someone is interested in another piece of code for analysing 1:m-matched data. Before the days of PROC NLMIXED and the STRATA statement in PROC LOGISTIC, PROC PHREG was the first choice for these models (check example 5 in the PHREG documentation). As such, the following PHREG code also works with Dale's data example. Please note that the definition of the variable STATUS has been added to the first data step.

Hope that helps, Oliver

data Data1;
  do ID=1 to 63;
    do Outcome = 1 to 0 by -1;
      count+1;
      if count=1 then do;
        stratum+1;
        y = 1;
      end;
      else y = 0;
      input Gall Hyper @@;
      status=2-y;
      output;
      if count=5 then count=0;
    end;
  end;
datalines;
0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 1
0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 1 1 0 0 1 1 0 1 0 1 0 0 1
0 1 0 0 0 0 1 1 0 0 1 1 0 0 0 1 0 1 0 0
0 0 1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 0 0 1 0 0 0 1 1 0 0 0 0 1 0 0
0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 1 1
0 0 0 1 0 1 0 0 0 1 0 1 0 1 0 1 0 1 0 0
0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0
0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1
0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 1 0 0 0
0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0
1 0 1 0 0 1 0 0 1 0 0 0
;

proc phreg data=Data1;
  model status*y(0)=Gall Hyper / ties=discrete;
  strata stratum;
run;
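
For readers wondering why the PHREG call reproduces the conditional logistic fit, here is a brief sketch of the reasoning. The symbols are my own notation and do not appear in the posts above: s indexes strata, m_s is the number of controls in stratum s, and x_{s,case} is the covariate vector of the case. With STATUS = 2 - y, the case in each stratum has an "event" at time 1 and every control is censored at time 2, so the risk set at time 1 is the whole stratum. The partial likelihood contribution of stratum s then has the same form as the conditional likelihood Dale quotes:

```latex
% Partial likelihood contribution of stratum s in the stratified Cox model,
% with the case as the single event at time 1 and all m_s controls censored
% at time 2 (so that all m_s + 1 records are in the risk set at time 1).
L_s(\beta)
  = \frac{\exp\!\left(x_{s,\mathrm{case}}^{\top}\beta\right)}
         {\sum_{i=1}^{m_s+1} \exp\!\left(x_{s,i}^{\top}\beta\right)},
\qquad
L(\beta) = \prod_{s} L_s(\beta).
```

Since every stratum in this example contains exactly one case, there are no tied event times within a stratum, so the TIES= option should not change the estimates here. It starts to matter only when a stratum holds more than one case (m:n matching).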

