LISTSERV at the University of Georgia
Date:         Mon, 27 Aug 2001 16:26:43 -0700
Reply-To:     Dale McLerran <dmclerra@MY-DEJA.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Dale McLerran <dmclerra@MY-DEJA.COM>
Subject:      Re: PROC REG, backward elimination, model not full rank
Comments: To: seymour.d.douglas@accenture.com
Content-Type: text/plain

>Date: Mon, 27 Aug 2001 17:20:27 -0500
>Reply-To: seymour.d.douglas@ACCENTURE.COM
>From: Seymour Douglas <seymour.d.douglas@ACCENTURE.COM>
>Subject: Re: PROC REG, backward elimination, model not full rank
>To: SAS-L@LISTSERV.UGA.EDU
>
>Dale,
>     The SAS backward selection algorithm does not specify linear
>dependence as a criterion for deleting variables. Remember this is a
>variable selection tool not a model design. The metric that is used to
>determine exclusion is change in the RSS. I think full rank is being used
>outside of the context of linear algebra. Also, since two or more variables
>are involved in linear dependence it is difficult to determine which of
>the pair or tuple to delete, hence the use of RSS.
>
>Seymour Douglas Ph.D.
>Advanced Analytics Group
>Accenture

Seymour,

I have to disagree with the statement that full rank refers to something outside the context of linear algebra. SAS uses a sweep operation to fit the models in most of its regression procedures. The sweep operation is especially useful where some variant of a stepwise selection method is being employed. When a column is swept and the pivot goes below a certain tolerance, that column is assumed to be linearly dependent on other columns in the design matrix. The corresponding column and row of the result matrix are then set to zero, producing a generalized inverse matrix.
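The tolerance check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sweep`, the absolute tolerance `tol`, and the toy cross-products matrix are all hypothetical, and the singularity criterion SAS actually applies is not reproduced here.

```python
def sweep(a, k, tol=1e-8):
    """Sweep square matrix `a` (list of lists) on pivot k (0-based).

    Returns a new matrix. If the pivot is below `tol` (an assumed,
    simplified criterion), column k is treated as linearly dependent on
    the columns already swept, and row k and column k are set to zero.
    """
    n = len(a)
    d = a[k][k]
    out = [row[:] for row in a]
    if abs(d) < tol:
        for i in range(n):
            out[i][k] = 0.0
            out[k][i] = 0.0
        return out
    for j in range(n):
        out[k][j] = a[k][j] / d
    for i in range(n):
        if i != k:
            b = a[i][k]
            for j in range(n):
                out[i][j] = a[i][j] - b * out[k][j]
            out[i][k] = -b / d
    out[k][k] = 1.0 / d
    return out


# Toy X'X for x1 = (1,2,3,4), x2 = (1,0,1,0), x3 = x1 + x2,
# so the third column is linearly dependent on the first two.
xtx = [[30.0, 4.0, 34.0],
       [4.0, 2.0, 6.0],
       [34.0, 6.0, 40.0]]

g = xtx
for k in range(3):
    g = sweep(g, k)

# Row 2 and column 2 of g are now zero, and the upper-left 2x2 block is
# the inverse of the x1, x2 cross-products: a generalized inverse of X'X.
```

In this toy case the third pivot is zero up to rounding error after the first two sweeps, so x3 is the column that gets zeroed out.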

Starting with a covariance matrix including both X and Y, which we can partition as

cov = | X'X  X'Y |
      | Y'X  Y'Y |

where X'X is k x k, the regression of Y on X is obtained by the operation sweep(cov, {1 2 ... k}). The sweep operation is sequential, so this is the same as

sweep(...sweep(sweep(cov,1),2)...,k)

and also reversible, so that sweep(sweep(cov,{1 2 ... k}),{1 2 ... k}) returns cov. However, once a row and column are zeroed out during the sweep operation, reversibility is lost. I believe that in fitting a backward elimination regression, the covariance matrix is first swept on the first k columns. If X is less than full rank, this first operation produces a generalized inverse solution for inv(X'X). That is what the message in Robert Abelson's listing reports. Because the sweep is sequential, I believe the column removed from the model should be one toward the right side of the X'X matrix (corresponding to a variable named later in the model statement).
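A small worked example may make the sequential and reversible properties concrete. The sketch below is in Python with made-up data; the function name `sweep` and the numbers are assumptions for illustration, and there is no singularity check in this version.

```python
def sweep(a, k):
    """Sweep square matrix `a` on pivot k (0-based); no singularity check."""
    n = len(a)
    d = a[k][k]
    out = [row[:] for row in a]
    for j in range(n):
        out[k][j] = a[k][j] / d
    for i in range(n):
        if i != k:
            b = a[i][k]
            for j in range(n):
                out[i][j] = a[i][j] - b * out[k][j]
            out[i][k] = -b / d
    out[k][k] = 1.0 / d
    return out


# Cross-products for X = [intercept, x] with x = (1,2,3,4), y = (2,4,5,8):
#        | X'X  X'Y |
#  cov = | Y'X  Y'Y |
cov = [[4.0, 10.0, 19.0],
       [10.0, 30.0, 57.0],
       [19.0, 57.0, 109.0]]

swept = sweep(sweep(cov, 0), 1)        # sequential sweeps on the X columns
beta = [swept[0][2], swept[1][2]]      # coefficients, approximately [0.0, 1.9]
rss = swept[2][2]                      # residual sum of squares, about 0.7

restored = sweep(sweep(swept, 0), 1)   # sweeping again recovers cov
```

Here the last column of the swept matrix holds the regression coefficients and the residual sum of squares, and sweeping a second time on the same columns gives back the original matrix up to rounding error.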

Note that if X'X is k x k and Y'Y is 1x1, then the value in the (k+1,k+1) position after the sweep operation is the residual sum of squares. So SAS will sweep the entire covariance matrix on the first k columns, removing any linearly dependent columns from X, and obtain the residual sum of squares as you suggest. From that point on, the reversibility of the sweep operation is employed to remove variables one at a time from the regression function. A new value for RSS is obtained as each column is removed. Removing a column can only increase RSS, so the column whose removal produces the smallest increase in RSS is the one chosen for removal at this step of the backward elimination. Columns are added to an index vector that records which columns should be swept or removed from the regression.
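One step of the backward elimination described above might look like the following Python sketch. It illustrates the description in this message rather than SAS's actual code. The names `sweep`, `crossproducts`, `rss_without`, and `drop`, the toy data, and the choice to always keep the intercept in the model are all assumptions.

```python
def sweep(a, k):
    """Sweep square matrix `a` on pivot k (0-based); no singularity check."""
    n = len(a)
    d = a[k][k]
    out = [row[:] for row in a]
    for j in range(n):
        out[k][j] = a[k][j] / d
    for i in range(n):
        if i != k:
            b = a[i][k]
            for j in range(n):
                out[i][j] = a[i][j] - b * out[k][j]
            out[i][k] = -b / d
    out[k][k] = 1.0 / d
    return out


def crossproducts(cols):
    """Matrix of inner products between all pairs of columns."""
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]


# Toy data: y depends strongly on x1 and only weakly on x2.
x0 = [1.0] * 6                                  # intercept
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
cov = crossproducts([x0, x1, x2, y])            # 4x4, Y'Y in position [3][3]

# Sweep in all k = 3 predictor columns to fit the full model.
full = cov
for k in (0, 1, 2):
    full = sweep(full, k)
rss_full = full[3][3]

# Reverse the sweep on each candidate column to get the RSS without it.
rss_without = {j: sweep(full, j)[3][3] for j in (1, 2)}
increase = {j: rss_without[j] - rss_full for j in rss_without}
drop = min(increase, key=increase.get)          # smallest increase in RSS
```

With this data the candidate chosen for removal is column 2 (x2), since taking it out raises RSS far less than taking out x1. A real procedure would also apply a significance test before actually removing the variable, which this sketch omits.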

This is the basic algorithm for performing regressions in SAS. The utility of this algorithm for fitting a stepwise regression (any of its variants) can be seen. You can also see that the statement

NOTE: The model is not of full rank. A subset of the model which is of full rank is chosen.

means exactly what it says, that a unique inverse for X'X cannot be obtained because it is not full rank in the first place.

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@fhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------


