| Date: | Thu, 14 Apr 2005 18:57:02 -0700 |
| Reply-To: | Jim Simmons <emailjimsimmons@YAHOO.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Jim Simmons <emailjimsimmons@YAHOO.COM> |
| Subject: | Re: Help with Amount of Variance Accounted for in Multivariate |
| In-Reply-To: | 6667 |
| Content-Type: | text/plain; charset=us-ascii |
|---|
Wendong LI,
I have to disagree with David on this one. Canonical
Correlation will give you the variance overlap between a
weighted linear combination of predictors and a weighted linear
combination of criterion variables, i.e.:
A(1)X(1) + A(2)X(2) + ... + A(n)X(n) = B(1)Y(1) + B(2)Y(2) + ...
+ B(m)Y(m)
where X(1)--X(n) are predictors, and Y(1)--Y(m) are criterion
variables (the two sets need not be the same size).
The A's and B's are the weights selected by the Canonical
Correlation procedure to maximize the variance overlap between
the two weighted linear combinations which are referred to as
variates. Once the weights are obtained, applying them to the
X's and Y's in your sample data would result in a single column
of numbers based on the X's and another single column of numbers
based on the Y's.
If you did a standard univariate Pearson correlation between
these two columns of numbers you would get the Canonical
Coefficient of Correlation - R(c). The square of this is the
Canonical Coefficient of Determination, the single indicator of
overlap between the predictor variate and the criterion variate.
Note that it is symmetrical like a Pearson r, but refers to
variate overlap, not variable overlap, where a variate is
defined as a linear combination of variables.
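As a rough illustration of the steps above (in Python with NumPy
rather than SAS, on simulated data; the function name, variable
names and the data are all made up for this sketch, and the
QR/SVD route is just one common way to get the weights, not
necessarily what any SAS procedure does internally):

```python
import numpy as np

def canonical_correlations(X, Y):
    """Return canonical correlations and the X-side and Y-side weights."""
    Xc = X - X.mean(axis=0)          # center each column
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)        # orthonormal basis for each set
    Qy, Ry = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    k = len(s)                       # number of canonical dimensions
    A = np.linalg.solve(Rx, U[:, :k])      # weights for the X's
    B = np.linalg.solve(Ry, Vt.T[:, :k])   # weights for the Y's
    return s, A, B

rng = np.random.default_rng(0)       # simulated data, for illustration only
n = 200
X = rng.normal(size=(n, 3))
W = np.array([[1.0, 0.5], [0.0, 0.3], [0.0, 0.0]])
Y = X @ W + 0.5 * rng.normal(size=(n, 2))

r, A, B = canonical_correlations(X, Y)

# Apply the first set of weights: one column from the X's, one from the Y's.
x_variate = X @ A[:, 0]
y_variate = Y @ B[:, 0]

# An ordinary Pearson correlation between the two columns.
r_pearson = np.corrcoef(x_variate, y_variate)[0, 1]

print("first canonical R:", r[0])
print("Pearson r of the two columns:", r_pearson)
print("canonical coefficient of determination:", r[0] ** 2)
```

The two printed correlations should agree up to rounding. Note
that r holds one canonical R per dimension (two here, since the
smaller set has two variables), not just the first.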
If you want variable overlap, you can get a non-symmetrical
Coefficient of Redundancy from the Redundancy Analysis which, I
believe, can be optionally output along with the Canonical R
(the REDUNDANCY option of PROC CANCORR). Summed over all of the
canonical variates, it is interpreted as the average of the
squared multiple correlations suggested by Paige, but you get
one for predicting left-hand variables from right-hand
variables, and another for predicting right-hand variables from
left-hand variables (right-hand variables are those on the right
side of the '=' in the above equation).
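A small numerical check of the redundancy idea, again only a
sketch in Python with NumPy on simulated data (the function
names redundancy and mean_r2 and the data are hypothetical;
in practice you would read these numbers off the SAS output).
It takes the redundancy for each canonical variate as the
average squared loading of the variables on their own variate
times the squared canonical R, sums over variates, and compares
the total with the average squared multiple correlation:

```python
import numpy as np

def redundancy(X, Y):
    """Total redundancy of the Y set given the X set, over all variates."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    _, s, Vt = np.linalg.svd(Qx.T @ Qy)
    k = len(s)
    y_variates = Qy @ Vt.T[:, :k]            # mean zero, unit length
    Ys = Yc / np.linalg.norm(Yc, axis=0)     # unit-length Y columns
    loadings = Ys.T @ y_variates             # corr(y_j, its own variate k)
    per_variate = (loadings ** 2).mean(axis=0) * s ** 2
    return per_variate.sum()

def mean_r2(X, Y):
    """Average squared multiple correlation of each Y column on all X's."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    resid = Yc - Xc @ beta
    r2 = 1.0 - (resid ** 2).sum(axis=0) / (Yc ** 2).sum(axis=0)
    return r2.mean()

rng = np.random.default_rng(1)               # simulated data only
n = 300
X = rng.normal(size=(n, 3))
W = np.array([[1.0, 0.5], [0.0, 0.0], [0.0, 0.0]])
Y = X @ W + 0.5 * rng.normal(size=(n, 2))

rd_y_from_x = redundancy(X, Y)   # predicting the Y's from the X's
rd_x_from_y = redundancy(Y, X)   # predicting the X's from the Y's

print("redundancy, Y given X:", rd_y_from_x, "mean R-square:", mean_r2(X, Y))
print("redundancy, X given Y:", rd_x_from_y, "mean R-square:", mean_r2(Y, X))
```

In each direction the total redundancy should match the average
R-square, while the two directions give different values, which
is the non-symmetry mentioned above.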
Neither of these is a perfect solution. The Canonical solution
can produce more than one set of weights and Canonical R's if
the sets of variables are related along more than one dimension
of overlap. Tatsuoka called this a double-barreled principal
components analysis. It all depends on how you choose to
conceptualize the model of your investigation. Decide that
first. Don't let the tail wag the dog by choosing the
statistical procedure first and then contorting your hypotheses
to fit that procedure.
SAS is a top-down programming language, and research design is
also a top-down set of procedures. Do things in the proper
order. Best of luck with your analysis.
Jim Simmons
--- "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV> wrote:
> Paige Miller <paige.miller@KODAK.COM> sagely replied:
> > Wendong LI wrote:
> > > Hi friends,
> > > I am running a dataset with a set of independent variables
> > > (categorical and continuous) and a set of dependent variables
> > > (all continuous), as well as some control variables
> > > (categorical and continuous).
> > > I used PROC REG, regressing all the dependent variables and
> > > control variables from all of the independent variables. I
> > > want to find a single parameter, which can tell how much
> > > variance in ALL the dependent variables was explained by the
> > > predictors. But in the results, there are only R-squares,
> > > indicating the amount of variance in SEPARATE variable
> > > accounted for by the regression model.
> > >
> > > So can anyone tell me how I can get a combined indicator
> > > telling how much variance in ALL of the criteria can be
> > > accounted for by the regression model?
> >
> > A very simple way to do this is to average the R-squared values
> > across your dependent variables.
>
> I wouldn't disagree with Paige.
>
> But in this case, I don't see that you should be doing this at
> all. you have what *sounds*like* a Seemingly Unrelated
> Regressions (SUR) problem. So PROC REG is probably the wrong
> tool to start with. I would look at PROC SYSLIN in the SAS/ETS
> package. And, even in SUR you don't typically generate an
> 'overall' R squared. How is it meaningful to describe "the
> proportion of variability explained across several different
> regressions"?
>
> If you can write down a mathematical formula for what you mean,
> and statistically justify why that formula is reasonable, then I
> am sure that someone can help you further. For right now, I
> would suggest that you not try to do this.
>
> David
> --
> David Cassell, CSC
> Cassell.David@epa.gov
> Senior computing specialist
> mathematical statistician
>