Date: Mon, 27 Aug 2001 16:26:43 -0700
Reply-To: Dale McLerran <dmclerra@MY-DEJA.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Dale McLerran <dmclerra@MY-DEJA.COM>
Subject: Re: PROC REG, backward elimination, model not full rank
Content-Type: text/plain
>Date: Mon, 27 Aug 2001 17:20:27 -0500
>Reply-To: seymour.d.douglas@ACCENTURE.COM
>From: Seymour Douglas <seymour.d.douglas@ACCENTURE.COM>
>Subject: Re: PROC REG, backward elimination, model not full rank
>To: SAS-L@LISTSERV.UGA.EDU
>Dale,
> The SAS backward selection algorithm does not specify linear
>dependence as a criterion for deleting variables. Remember this is a
>variable selection tool, not a model design. The metric that is used to
>determine exclusion is change in the RSS. I think full rank is being used
>outside of the context of linear algebra. Also, since two or more variables
>are involved in linear dependence it is difficult to determine which of
>the pair or tuple to delete, hence the use of RSS.
>
>Seymour Douglas Ph.D.
>Advanced Analytics Group
>Accenture
Seymour,
I have to disagree with the statement that full rank refers to
something outside the context of linear algebra. SAS uses a sweep
operation to fit the models in most of its regression procedures. The
sweep operation is especially useful where some variant of a stepwise
selection method is being employed. When a column is swept and the
pivot falls below a certain tolerance, that column is assumed to be
linearly dependent on the columns already swept, and its row and
column in the result matrix are set to zero, producing a generalized
inverse matrix.
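
The tolerance check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SAS's actual code: the function name `sweep`, the relative tolerance of 1e-8 measured against the original diagonal, and the small data set (where x3 = x1 + x2) are all made up for the example.

```python
def sweep(a, pivots, tol=1e-8):
    """Sweep square matrix `a` (list of lists) on each pivot in turn.

    A pivot no larger than tol times its original diagonal value is treated
    as a linearly dependent column: its row and column are set to zero.
    Returns (swept copy, list of pivots flagged as dependent).
    """
    a = [row[:] for row in a]
    n = len(a)
    diag0 = [a[i][i] for i in range(n)]      # original diagonal, for the tolerance
    dependent = []
    for k in pivots:
        d = a[k][k]
        if abs(d) <= tol * abs(diag0[k]):
            for i in range(n):               # zero out row k and column k
                a[i][k] = 0.0
                a[k][i] = 0.0
            dependent.append(k)
            continue
        a[k] = [v / d for v in a[k]]         # scale the pivot row
        for i in range(n):
            if i == k:
                continue
            b = a[i][k]
            a[i] = [a[i][j] - b * a[k][j] for j in range(n)]
            a[i][k] = -b / d
        a[k][k] = 1.0 / d
    return a, dependent


# Cross-products for x1=[1,0,1,2], x2=[0,1,1,1], x3=x1+x2, y=[1,2,3,5],
# laid out as |X'X X'Y| over |Y'X Y'Y| (hypothetical data).
cov = [[6.0, 3.0, 9.0, 14.0],
       [3.0, 3.0, 6.0, 10.0],
       [9.0, 6.0, 15.0, 24.0],
       [14.0, 10.0, 24.0, 39.0]]

swept, dependent = sweep(cov, [0, 1, 2])
print("dependent columns:", dependent)
print("coefficients:", [row[3] for row in swept[:3]])
print("residual sum of squares:", swept[3][3])
```

In this sketch the third column is the one flagged, because x1 and x2 were swept first. Sweeping in the order [2, 0, 1] should flag column 1 instead, which is the order dependence discussed in this thread.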
Starting with a cross-products (or covariance) matrix that includes
both X and Y, which we can partition as

   cov = | X'X  X'Y |
         | Y'X  Y'Y |

where X'X is k x k, the regression of Y on X is obtained by the
operation sweep(cov, {1 2 ... k}). The sweep operation is sequential,
so that this is the same as

   sweep(...sweep(sweep(cov,1),2)...,k)

and also reversible, so that sweep(sweep(cov,{1 2 ... k}),{1 2 ... k})
returns cov. However, once a row and column are zeroed out during
the sweep operation, reversibility is lost. I believe that in
fitting a backward elimination regression, the covariance matrix
is swept on the first k columns. If X is less than full rank, then
this first operation produces a generalized inverse in place of
inv(X'X). That is what the message in Robert Abelson's listing
reports. Because the sweep is sequential, I believe the column that
gets zeroed out is one toward the right side of the X'X matrix
(that is, a variable named later in the MODEL statement is the one
removed).
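
The two properties used above (a sweep on several pivots is a sequence of single sweeps, and sweeping a second time on the same pivots restores the original matrix) can be checked numerically. The following is a minimal sketch; the helper names `sweep`, `sweep_many` and `close` and the 3 x 3 matrix are assumptions for illustration, and no tolerance check is included, so it applies only to the full-rank case.

```python
def sweep(a, k):
    """Return a copy of square matrix `a` swept on pivot k (no tolerance check)."""
    n = len(a)
    d = a[k][k]
    out = [row[:] for row in a]
    out[k] = [v / d for v in a[k]]           # scale the pivot row
    for i in range(n):
        if i != k:
            b = a[i][k]
            out[i] = [a[i][j] - b * out[k][j] for j in range(n)]
            out[i][k] = -b / d
    out[k][k] = 1.0 / d
    return out


def sweep_many(a, pivots):
    """Sweep on several pivots by applying single sweeps one after another."""
    for k in pivots:
        a = sweep(a, k)
    return a


def close(p, q, eps=1e-9):
    """True if two matrices agree element by element within eps."""
    return all(abs(x - y) < eps for r, s in zip(p, q) for x, y in zip(r, s))


# |X'X X'Y| over |Y'X Y'Y| for x1=[1,0,1,2], x2=[0,1,1,1], y=[1,2,3,5]
# (hypothetical data).
cov = [[6.0, 3.0, 14.0],
       [3.0, 3.0, 10.0],
       [14.0, 10.0, 39.0]]

fit = sweep_many(cov, [0, 1])                # regression of y on x1 and x2
restored = sweep_many(fit, [0, 1])           # sweep the same pivots again

print("order of pivots does not matter:", close(fit, sweep_many(cov, [1, 0])))
print("second sweep restores cov:", close(restored, cov))
```

Once a row and column have been zeroed by a tolerance check, the pivot is zero and this round trip is no longer possible, which is the loss of reversibility described above.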
Note that if X'X is k x k and Y'Y is 1 x 1, then the value in the
(k+1,k+1) position after the sweep operation is the residual sum of
squares. So SAS will sweep the entire covariance matrix on the first
k columns, removing any linearly dependent columns from X, and obtain
the residual sum of squares as you suggest. From that point on, the
reversibility of the sweep operation is employed to remove variables
one at a time from the regression function. A new value for RSS is
obtained as each column is swept out. The column whose removal
produces the smallest increase in RSS is the column chosen for
removal in this step of the backward elimination. An index vector
records which columns are currently swept in and which have been
removed from the regression, and it is updated at each step.
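
One step of the backward elimination described above might look like the sketch below. It is an illustration only, under assumptions: `backward_step` and the data are hypothetical, the matrix is full rank, and nothing here is taken from SAS's source. Each in-model column is swept out in turn, the RSS is read from the lower right corner, and the column with the smallest resulting RSS is the candidate for removal.

```python
def sweep(a, k):
    """Return a copy of square matrix `a` swept on pivot k (no tolerance check)."""
    n = len(a)
    d = a[k][k]
    out = [row[:] for row in a]
    out[k] = [v / d for v in a[k]]
    for i in range(n):
        if i != k:
            b = a[i][k]
            out[i] = [a[i][j] - b * out[k][j] for j in range(n)]
            out[i][k] = -b / d
    out[k][k] = 1.0 / d
    return out


def backward_step(swept, in_model):
    """Return (column, rss_without) for the column whose removal raises RSS least.

    `swept` is the matrix already swept on every column listed in `in_model`;
    the response occupies the last row and column.
    """
    y = len(swept) - 1
    best = None
    for k in in_model:
        rss = sweep(swept, k)[y][y]          # sweeping again removes column k
        if best is None or rss < best[1]:
            best = (k, rss)
    return best


# |X'X X'Y| over |Y'X Y'Y| for x1=[1,0,1,2], x2=[0,1,1,1], y=[1,2,3,5]
# (hypothetical data).
cov = [[6.0, 3.0, 14.0],
       [3.0, 3.0, 10.0],
       [14.0, 10.0, 39.0]]

fit = sweep(sweep(cov, 0), 1)                # full model: both columns swept in
column, rss_without = backward_step(fit, [0, 1])
print("full-model RSS:", fit[2][2])
print("candidate for removal: column", column, "RSS without it:", rss_without)
```

The same increase in RSS can also be read from the swept matrix without an extra sweep, as the squared coefficient divided by the corresponding diagonal element of the inverse. Whether SAS computes it that way is not something this sketch claims.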
This is the basic algorithm for performing regressions in SAS. The
utility of this algorithm for fitting a stepwise regression (any of
its variants) can be seen. You can also see that the statement
NOTE: The model is not of full rank. A subset of the model which is of full rank is chosen.
means exactly what it says, that a unique inverse for X'X cannot be
obtained because it is not full rank in the first place.
Dale
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@fhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------