Date: Thu, 14 Apr 2005 18:57:02 -0700
From: Jim Simmons
List: SAS(r) Discussion
Subject: Re: Help with Amount of Variance Accounted for in Multivariate

Wendong LI,

I have to disagree with David on this one. Canonical Correlation will give you the variance overlap between a weighted linear combination of predictors and a weighted linear combination of criterion variables, i.e.:

A(1)X(1) + A(2)X(2) + ... + A(n)X(n) = B(1)Y(1) + B(2)Y(2) + ... + B(n)Y(n)

where X(1)--X(n) are predictors, and Y(1)--Y(n) are criterion variables.

The A(n)'s and B(n)'s are the weights selected by the Canonical Correlation procedure to maximize the variance overlap between the two weighted linear combinations, which are referred to as variates. Once the weights are obtained, applying them to the X's and Y's in your sample data would result in a single column of numbers based on the X's and another single column of numbers based on the Y's.

If you did a standard univariate Pearson correlation between these two columns of numbers, you would get the Canonical Coefficient of Correlation, R(c). The square of this is the Canonical Coefficient of Determination, the single indicator of overlap between the predictor variate and the criterion variate.

Note that it is symmetrical like a Pearson r, but refers to variate overlap, not variable overlap, where a variate is defined as a linear combination of variables.
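In SAS itself PROC CANCORR does all of this for you, but the mechanics above can be checked by hand. A minimal NumPy sketch (simulated data, names of my own choosing, not SAS output): compute the canonical weights, build the two variates as single columns of numbers, and confirm that a plain Pearson r between them recovers R(c).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))                                 # predictors
Y = X @ rng.normal(size=(3, 2)) + rng.normal(size=(n, 2))   # criteria, related to X

# Center both sets of variables
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# A standard QR + SVD route to the canonical solution:
# the singular values of Qx'Qy are the canonical correlations,
# largest first.
Qx, Rx = np.linalg.qr(Xc)
Qy, Ry = np.linalg.qr(Yc)
U, s, Vt = np.linalg.svd(Qx.T @ Qy)
A = np.linalg.solve(Rx, U)       # weights for the X side
B = np.linalg.solve(Ry, Vt.T)    # weights for the Y side

# First pair of variates: one column of numbers from the X's,
# one column of numbers from the Y's, as described above.
u = Xc @ A[:, 0]
v = Yc @ B[:, 0]

# An ordinary Pearson correlation between the two variates
# recovers the first canonical correlation R(c).
r = np.corrcoef(u, v)[0, 1]
print(r, s[0])
```

Note that s holds one canonical correlation per dimension of overlap, which is the "more than one set of weights" situation discussed below.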

If you want variable overlap, you can get a non-symmetrical Coefficient of Redundancy from the Redundancy Analysis which, I believe, can be optionally output along with the Canonical R. Its square is interpreted as the average of the squared multiple correlations suggested by Paige, but you get one for predicting left-hand variables from right-hand variables, and another for predicting right-hand variables from left-hand variables (right-hand variables are those on the right side of the '=' in the above equation).
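On the reading above, the redundancy index in each direction can be checked directly: regress each variable in one set on all the variables in the other set, and average the resulting R-squared values. A hedged NumPy sketch (simulated data; the helper and names are mine for illustration), which also shows that the two directions are not the same number in general:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 3))                                 # right-hand variables
Y = X @ rng.normal(size=(3, 2)) + rng.normal(size=(n, 2))   # left-hand variables

def r_squared(y, X):
    """Squared multiple correlation of y regressed on the columns of X."""
    X1 = np.column_stack([np.ones(len(X)), X])   # add an intercept
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

# Average of the per-criterion squared multiple correlations:
# redundancy of the Y set given the X set...
red_y_given_x = np.mean([r_squared(Y[:, j], X) for j in range(Y.shape[1])])
# ...and the non-symmetrical counterpart, X set given the Y set.
red_x_given_y = np.mean([r_squared(X[:, j], Y) for j in range(X.shape[1])])
print(red_y_given_x, red_x_given_y)
```

In SAS the equivalent numbers come out of PROC CANCORR's redundancy output rather than a loop of regressions; the sketch is only meant to make the "average of squared multiple correlations" interpretation concrete.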

Neither of these is a perfect solution. The canonical solution can produce more than one set of weights and canonical R's if the sets of variables are related along more than one dimension of overlap. Tatsuoka called this a double-barreled principal components analysis. It all depends on how you choose to conceptualize the model of your investigation. Decide that first. Don't let the tail wag the dog by choosing the statistical procedure first and then contorting your hypotheses to fit that procedure.

SAS is a top-down programming language, and research design is also a top-down set of procedures. Do things in the proper order. Best of luck with your analysis.

Jim Simmons

--- "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV> wrote:
> Paige Miller <paige.miller@KODAK.COM> sagely replied:
> > Wendong LI wrote:
> > > Hi friends,
> > > I am running a dataset with a set of independent variables (categorical
> > > and continuous) and a set of dependent variables (all continuous), as well
> > > as some control variables (categorical and continuous).
> > > I used PROC REG, regressing all the dependent variables and control
> > > variables from all of the independent variables. I want to find a single
> > > parameter, which can tell how much variance in ALL the dependent variables
> > > was explained by the predictors. But in the results, there are only R-
> > > squares, indicating the amount of variance in SEPARATE variable accounted
> > > for by the regression model.
> > >
> > > So can anyone tell me how I can get a combined indicator telling how much
> > > variance in ALL of the criteria can be accounted for by the regression
> > > model?
> >
> > A very simple way to do this is to average the R-squared values
> > across your dependent variables.
>
> I wouldn't disagree with Paige.
>
> But in this case, I don't see that you should be doing this at all.
> You have what *sounds*like* a Seemingly Unrelated Regressions (SUR)
> problem. So PROC REG is probably the wrong tool to start with. I would
> look at PROC SYSLIN in the SAS/ETS package. And, even in SUR you don't
> typically generate an 'overall' R squared. How is it meaningful to
> describe "the proportion of variability explained across several
> different regressions"?
>
> If you can write down a mathematical formula for what you mean, and
> statistically justify why that formula is reasonable, then I am sure
> that someone can help you further. For right now, I would suggest that
> you not try to do this.
>
> David
> --
> David Cassell, CSC
> Cassell.David@epa.gov
> Senior computing specialist
> mathematical statistician
