```
Date: Sat, 10 May 2008 08:45:00 -0400
Reply-To: Peter Flom
Sender: "SAS(r) Discussion"
From: Peter Flom
Subject: Re: Finding Effectiveness of each variable
Comments: To: cherish k
Content-Type: text/plain; charset=UTF-8

cherish k wrote
>
>I have a Stats related question.
>
>I have a dataset with variables (assume 5 IVs) already defined, and the DV is the amount of usage at Region level (it is always >= 0). Information is collected monthly for each region (we have one year's data), so each region will have 12 entries in the data.
>
>Now, through some means, I want to know which is the most significant variable out of all the given variables, and also the weight of each variable contributing to the whole equation.
>
>To achieve this I have done the following.
>
>Since the variables are not scaled, I first Z transformed all the variables (including the DV), so that they are all on a comparable scale (but the Z transformation was done at each Region level). Then I ran a linear regression on all the variables (as of now I have run an intercept model; not sure if no intercept is better or not).
>
>Since the variables are all on a comparable scale, can I take the estimates as the weights of each variable?
>
>Y = intercept + sigma(a(i)*x(i)), where a(i) is the estimate and x(i) is the variable
>
>So in the above equation a(i) can be positive, negative, or zero.
>
>So can I take the importance of each variable as abs(a(i)) and then rank order across the variables?
>
>If the method is wrong, can somebody please suggest a way to do it?
>
>One obvious flaw in the above method is that I am assuming independence (which is ok as my boss is perfectly fine with it :-) )
>
>Are there any other problems in the method?
>
>Please help me. (If the method is totally wrong, kindly tell me if there is an alternative way of doing it.) I am doing it in SAS (so the procs I use are proc standard and proc reg).
>
>If I am not clear with the problem, please let me know.
>

First, from what I can see, you have clustered data, so you need to be using PROC MIXED; or, since your DV is a positive number, maybe NLMIXED ... although MIXED may be OK.

Second, while many people do something similar to what you've suggested and call it 'effectiveness' or 'importance' of a variable, there are problems with doing so... David Cassell has written of these, more articulately than I can, citing a paper by Cochran (I think). The basic problem is that you are standardizing on the data you've got, and that probably isn't what you want to do.

HTH

Peter

Peter L. Flom, PhD
Statistical Consultant
www DOT peterflom DOT com
```
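
One reading of the second objection is that a standardized coefficient depends on how much spread the predictor happens to have in the sample at hand, not only on the underlying relation. The sketch below illustrates that reading with made-up numbers; it is written in Python with the standard library only (not SAS) so it can be run anywhere, and the data, the noise pattern, and the helper names `ols_slope` and `zscore` are all assumptions for illustration, not anything from the thread.

```python
from statistics import mean, pstdev


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs for a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def zscore(values):
    """Standardize using the mean and SD of this particular sample."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical noise, constructed to be uncorrelated with x in both samples.
noise = [3, -3, -3, 3, -3, 3, 3, -3]

# Sample A sees x over a narrow range; sample B sees three times the spread.
x_a = [1, 2, 3, 4, 5, 6, 7, 8]
x_b = [3 * x for x in x_a]

# Same underlying relation in both samples: y = 2*x + noise.
y_a = [2 * x + e for x, e in zip(x_a, noise)]
y_b = [2 * x + e for x, e in zip(x_b, noise)]

raw_a = ols_slope(x_a, y_a)
raw_b = ols_slope(x_b, y_b)
std_a = ols_slope(zscore(x_a), zscore(y_a))
std_b = ols_slope(zscore(x_b), zscore(y_b))

print(f"raw slopes:          A={raw_a:.3f}  B={raw_b:.3f}")
print(f"standardized slopes: A={std_a:.3f}  B={std_b:.3f}")
```

In this toy setup the raw slope is 2 in both samples, while the standardized slope comes out near 0.84 in sample A and near 0.98 in sample B, purely because x varies more in B. Ranking variables by abs(a(i)) after Z transforming would therefore partly rank them by how much each one happened to vary in the data collected.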
