LISTSERV at the University of Georgia
Date:   Mon, 12 May 2008 06:20:38 -0400
Reply-To:   Peter Flom <peterflomconsulting@mindspring.com>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Peter Flom <peterflomconsulting@MINDSPRING.COM>
Subject:   Re: Finding Effectiveness of each variable
Comments:   To: cherish k <hawks_cherish@YAHOO.CO.IN>
Content-Type:   text/plain; charset=UTF-8

You can download GLMSELECT from the SAS site, along with its manual.

Peter

-----Original Message-----
>From: cherish k <hawks_cherish@YAHOO.CO.IN>
>Sent: May 12, 2008 2:26 AM
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Re: Finding Effectiveness of each variable
>
>I do not have access to Proc GLMSELECT. Can somebody tell me a better alternative? I am using SAS 9 for Windows.
>
>Cherish
>
>cherish k <hawks_cherish@yahoo.co.in> wrote:
>Arthur,
>
>The task at hand for me is to rank complaints in order of their importance to the number of people churning in a region. But please note that I am not working at customer level. The reason is that we see very few complainants churn. So, treating complaints as a general level of discontent among customers, we want to identify which complaints have a strong relation with the # of people churning in a region.
>
>An initial test (using a subset of complaints only) yielded a pretty decent model (r sqr = 0.49) using proc reg - stepwise (I know I shouldn't be using stepwise, but I used it for testing purposes and to see if the hypothesis is working well), and the complaints that entered the stepwise model also made sense.
>Since the results are promising I want to pursue this further. I read an article written by Peter and David suggesting the use of proc GLMSELECT as a better alternative to proc reg - stepwise, using its LASSO and LAR options.
>
>Can I use proc GLMSELECT in the current context or are there better alternatives?
>
>Regards,
>Cherish
>
>Arthur Tabachneck <art297@NETSCAPE.NET> wrote:
>Cherish,
>
>I've been reading the discussion you and Peter have been having and, while it first sounded like a question of measuring variables' impact, it is starting to sound more like a classic churn question.
>
>Have you looked into possible data mining-type solutions, such as decision tree, logistic regression, or neural network modeling? In short, not looking to discover the contribution of each variable, but under which scenarios people are most likely to attrite.
>
>HTH,
>Art
>---------
>On Sun, 11 May 2008 08:13:56 +0100, cherish k wrote:
>
>>Hi Peter,
>>
>>Thanks for pointing to the article.
>>
>>From all the articles, what I could gather is that it's almost impossible to rank variables if there are more than 10, because of the computational infeasibility.
>>
>>But I somehow want to do the following. I have complaints data with close to 300 complaints altogether. I want to establish a correlation between people attriting and the complaints (not necessarily that the person complaining needs to attrite). So I am trying to accumulate the data at region level, and also each complaint at region level.
>>
>>So I have, for every month: region, number of people attrited, and 300 variables (complaints), each holding the count of that complaint. I have data for a 1 year time period (which in turn means 12 records per region).
>>
>>From this available information I want to know which are the top reasons because of which many people attrite.
>>This in turn requires me to know the weight (importance) of each variable, which I will multiply by the count of complaints for every month to see how the complaints are varying from month to month.
>>
>>One strict no-no method is stepwise regression. Are there any substitutes?
>>
>>Can you please point to any approximate method for what I want to do?
>>
>>Regards,
>>Cherish
>>
>>Peter Flom wrote:
>>Cherish
>>
>>Item 167824 in the SAS-L archives at http://www.lexjansen.com/sugi/
>>
>>or do the following google search
>>
>>cassell kruskal katz sas-l "relative importance"
>>
>>Peter
>>
>>-----Original Message-----
>>>From: cherish k
>>>Sent: May 10, 2008 2:26 PM
>>>To: SAS-L@LISTSERV.UGA.EDU
>>>Subject: Re: Finding Effectiveness of each variable
>>>
>>>Can somebody please point me to the article written by David.
>>>
>>>Thanks
>>>Cherish
>>>
>>>Peter Flom wrote:
>>>cherish k wrote
>>>>
>>>>I have a Stats related question.
>>>>
>>>>I have a dataset with variables (assume 5 IVs) already defined, and the DV is the amount of usage at Region level (it is always >= 0). Information is collected monthly for each region (we have one year's data). So each region will have 12 entries in the data.
>>>>
>>>>Now, through some means, I want to know which is the most significant variable out of all the given variables, and also the weight of each variable contributing to the whole equation.
>>>>
>>>>To achieve this I have done the following.
>>>>
>>>>Since the variables are not scaled, I first Z transformed all the variables (including the DV), so that they are all on a comparable scale (but the Z transformation was done at each Region level). Then I ran a linear regression on all the variables (I have as of now run an intercept model, not sure if no intercept is better or not).
>>>>
>>>>Since the variables are all on a comparable scale, can I take the estimates as the weights of each variable?
>>>>
>>>>Y = intercept + sigma(a(i)*x(i)), where a(i) is the estimate and x(i) is the variable
>>>>
>>>>So in the above equation a(i) can be positive, negative or zero.
>>>>
>>>>So can I take the importance of each variable as abs(a(i)) and then rank order across the variables?
>>>>
>>>>If the method is wrong, can somebody please suggest a way to do it.
>>>>
>>>>One obvious flaw in the above method is that I am assuming independence (which is ok as my boss is perfectly fine with it :-) )
>>>>
>>>>Are there any other problems in the method?
>>>>
>>>>Please help me. (If the method is totally wrong, kindly tell me if there is an alternative way of doing it.) I am doing it in SAS (so the procs I use are proc standard and proc reg).
>>>>
>>>>If I am not clear with the problem, please let me know.
>>>>
>>>
>>>First, from what I can see, you have clustered data, so you need to be using PROC MIXED; or, since your DV is a positive number, maybe NLMIXED ... although MIXED may be OK
>>>
>>>Second, while many people do something similar to what you've suggested and call it 'effectiveness' or 'importance' of a variable, there are problems with doing so... David Cassell has written of these, more articulately than I can, citing a paper by Cochran (I think).
>>>
>>>The basic problem is that you are standardizing on the data you've got, and that probably isn't what you want to do
>>>
>>>HTH
>>>
>>>Peter
>>>
>>>Peter L. Flom, PhD
>>>Statistical Consultant
>>>www DOT peterflom DOT com

Peter L. Flom, PhD
Statistical Consultant
www DOT peterflom DOT com
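
For anyone following the thread, here is a small sketch of one way to read the caution about "standardizing on the data you've got". It is written in plain Python rather than SAS, purely for illustration, and the numbers and names (`x_narrow`, `x_wide`, `noise`) are made up, not taken from the poster's data. Both samples are built so that the raw effect of the predictor is the same (slope 2), yet the coefficient after Z-transforming differs, because it depends on how much the predictor happened to vary in each sample.

```python
from statistics import mean, stdev


def zscores(values):
    """Z-transform a list using its own sample mean and standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical noise, chosen to sum to zero and to be uncorrelated with
# both predictors below, so the raw slope comes out as exactly 2.
noise = [1, -2, 2, -2, 1]

# Same underlying relationship (y = 2*x + noise), but the predictor
# varies twice as much in the second sample.
x_narrow = [1, 2, 3, 4, 5]
x_wide = [2, 4, 6, 8, 10]
y_narrow = [2 * x + e for x, e in zip(x_narrow, noise)]
y_wide = [2 * x + e for x, e in zip(x_wide, noise)]

# Raw (unstandardized) slopes: identical in both samples.
raw_narrow = slope(x_narrow, y_narrow)
raw_wide = slope(x_wide, y_wide)

# Slopes after Z-transforming x and y within each sample: not identical.
std_narrow = slope(zscores(x_narrow), zscores(y_narrow))
std_wide = slope(zscores(x_wide), zscores(y_wide))

print("raw slopes:         ", raw_narrow, raw_wide)
print("standardized slopes:", round(std_narrow, 4), round(std_wide, 4))
```

With a single predictor the standardized slope is just the sample correlation, so ranking variables by abs(a(i)) partly ranks them by how widely each one happened to vary in the data at hand. The sketch ignores the clustering by region mentioned in the thread.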

