Date: Thu, 24 Feb 2005 17:46:00 -0800
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: Relative Importance of Explanatory Variables,
Standardized Coefficients, STB option, etc.
Content-type: text/plain; charset=US-ASCII
Talbot Michael Katz <topkatz@MSN.COM> wrote:
> I'm gathering opinions, facts, anecdotes, etc., and what better place
Are foaming-at-the-mouth rants okay too?
> start than with SAS-L? Today's number one question is this:
> What is the "best" way to measure the relative importance of
> variables in a model when the model includes class variables?
There isn't a 'best' way to measure relative importance even before
you add in class variables. See my previous rants and complaints on
this subject, all lovingly preserved in the SAS-L archives.
> Let's start with OLS. Textbooks often say that the standardized
> coefficients ("betas") measure the relative importance, and that's an
And it's wrong. Find me a textbook which says specifically that
"standardized regression coefficients measure the relative importance"
and I'll show you a textbook not written by a statistician.
> appealingly intuitive picture; if all the variables are placed on the same
> scale, then the betas show the effects of a unit change in any of the
> variables. This is even reasonable for logistic models. The SAS
It's appealing. Which is why so many statisticians have had to point out
that the natural, intuitive interpretation only works in the simplest
cases. If you have orthogonal variables and no interaction terms,
you're good to go. You can get that if you build your own experimental
designs, for instance. Otherwise, you have to deal with all manner of
correlations, multi-collinearity, suppressor variables, measurement error,
etc. And the 'intuitive' idea falls apart.
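Here's a minimal sketch of why orthogonality matters (in Python rather than
SAS, with made-up data, so treat it as illustration only): with orthogonal
predictors the standardized betas reproduce the bivariate correlations with
Y, and once the predictors are correlated they don't.

```python
# Sketch: standardized betas match the simple correlations only when
# the predictors are orthogonal.  Data here are simulated, not real.
import numpy as np

rng = np.random.default_rng(0)
n = 10000

def std_betas(X, y):
    """Standardized OLS coefficients (what SAS's STB option reports)."""
    Xs = (X - X.mean(0)) / X.std(0)
    ys = (y - y.mean()) / y.std()
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]

# Orthogonal predictors: betas line up with the bivariate correlations.
X = rng.standard_normal((n, 2))
y = 2 * X[:, 0] + 1 * X[:, 1] + rng.standard_normal(n)
print(std_betas(X, y))                   # close to the marginal correlations

# Correlated predictors: the 'intuitive' reading falls apart.
X2 = X.copy()
X2[:, 1] = 0.9 * X2[:, 0] + 0.1 * rng.standard_normal(n)
y2 = 2 * X2[:, 0] + 1 * X2[:, 1] + rng.standard_normal(n)
print(std_betas(X2, y2))
print(np.corrcoef(X2[:, 0], y2)[0, 1])   # no longer matches the beta
```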
> regression procedures will output the betas if the STB option is specified
> (the Enterprise Miner regression node outputs "Standardized Estimates" as a
> matter of course). However, it is documented in SAS, and has been discussed
> in other threads here, that betas are not computed for class variables,
> which is reasonable because class variables cannot be standardized to a
> normal distribution. But certainly the concept of relative importance
> should still apply to class variables, so how do you measure it? This
> question is particularly resonant for Enterprise Miner users, since the
> variable selection node tends to turn all significant variables into
> class variables.
You might start with some of the works of William Kruskal. (That's THE
Kruskal of statistics.) His most accessible works on the subject are
out of The American Statistician, which is not a *technical* stats journal.
Look up Vol 41 (Feb 1987) and Vol 43 (Feb 1989). Kruskal and Majors described
'relative importance' using the phrase 'inherently vague concept of
neodescriptive statistics'. Like that one? I do. Evan Williams (and others)
have pointed out that 'relationships among the independent variables lead one
to question use of association measures from the bivariate marginals.'
In economics, you sometimes see people take 'relative importance' to be
proportional to beta_i * mu_i, where beta is the NON-standardized
coefficient and mu is the corresponding expectation. The interpretation of
the product is in terms of the relative increase in the expected value when
x_i is increased by 1% of mu_i. So here's a popular setting where the use of
the standardized coefficient is considered to be sub-optimal (at least). But
this still does nothing to address the problems I mention above.
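As a toy check of that arithmetic (my own made-up numbers, not from any real
model):

```python
# With E[y] = b0 + sum_i b_i*mu_i, bumping x_i by 1% of mu_i moves E[y]
# by 0.01 * b_i * mu_i, so the products b_i * mu_i give the relative
# 'importance' in this economists' sense.  All numbers are hypothetical.
beta = [3.0, 0.5]        # NON-standardized coefficients
mu   = [2.0, 40.0]       # means of the two predictors
b0   = 10.0

ey = b0 + sum(b * m for b, m in zip(beta, mu))    # 10 + 6 + 20 = 36
importance = [b * m for b, m in zip(beta, mu)]    # [6.0, 20.0]

# x_2, despite its much smaller coefficient, matters more by this measure.
print(importance)
```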
> I have been using the square roots of the Wald chi-square values (I call
> them "Wald t values," but I don't know if that's widely accepted
> terminology). For a univariate regression model, I believe this t value is
> equal to the beta value, so it seems like a reasonable proxy. Do you
> agree? Do you have any other ideas? I found a paper from the journal
> Decision Sciences that studies this issue in more depth (the authors don't
> like the use of betas or t values or p values, etc., for measuring
> importance): (http://home.wi.rr.com/jjrr/dsj.pdf) "A Framework for
> Measuring the Importance of Variables with Applications to Management
> Research and Decision Models," E.S. Soofi, J.J. Retzer, M.
> Decision Sciences, Volume 31, Number 3, Summer 2000.
I'm not a big fan of this, but I really ought to look further into it.
Try looking at Kruskal's 1987 American Statistician paper too.
The problem is that for a univariate regression model, a LOT of things happen
to work, when they won't work in a more complicated setting. Even Kruskal's
approach (of averaging squared partial correlation coefficients over all
orderings of the independent variables) doesn't work when you introduce the
effects of measurement error and other painfully real problems in actual data.
Kruskal has an example where things fail when the model is as simple as
Y = X plus an uncorrelated noise variable.
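If you want to play with the averaging-over-orderings idea anyway, here's a
rough sketch (Python, not SAS; simulated data; this is the decomposition
sometimes called the LMG or Shapley-value split of R-squared, and it gets
expensive fast since it loops over all p! orderings):

```python
# Average each variable's incremental R^2 over all orderings of entry
# into the model.  The shares are nonnegative and sum to the full R^2.
from itertools import permutations
import numpy as np

def r2(X, y, cols):
    """In-sample R^2 of an OLS fit (with intercept) on the given columns."""
    if not cols:
        return 0.0
    Xs = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    resid = y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]
    return 1 - resid.var() / y.var()

def lmg(X, y):
    """Average incremental R^2 over all orderings of the predictors."""
    p = X.shape[1]
    share = np.zeros(p)
    perms = list(permutations(range(p)))
    for order in perms:
        used = []
        for j in order:
            share[j] += r2(X, y, used + [j]) - r2(X, y, used)
            used.append(j)
    return share / len(perms)

rng = np.random.default_rng(1)
X = rng.standard_normal((5000, 3))
X[:, 1] += 0.5 * X[:, 0]                 # build in some correlation
y = X[:, 0] + 2 * X[:, 2] + rng.standard_normal(5000)
shares = lmg(X, y)
print(shares, shares.sum())              # shares sum to the full-model R^2
```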
Evan Williams once wrote "Concepts of relative importance are generally without
meaning unless there is a specific 'natural' ordering of the regression
variables." Everyone insists that there has to be a 'most important' variable,
but assigning such measures is often guesswork, and bad guesswork at that.
We all know that stepwise regression procedures fail to come up with good
sets of independent variables... and they're not even trying to rank the
things, much less fully account for all multi-collinearity. And typical
statistical approaches like these pretend that there's no such thing as
differing sizes of measurement errors, etc., etc.
So, in short, "you can't get there from here." Sorry.
David Cassell, CSC
Senior computing specialist