LISTSERV at the University of Georgia
Date:         Fri, 31 Mar 2000 03:05:43 GMT
Reply-To:     lim@recursive-partitioning.com
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "T.S. Lim" <nospam@NOSPAM.RECURSIVE-PARTITIONING.COM>
Organization: Recursive-Partitioning.com
Subject:      Re: discriminant analysis
Content-Type: Text/Plain; charset=US-ASCII

You're pushing your luck trying to classify 12 classes with only 180 cases. No matter what you do, it will be hard to get good prediction accuracy. Can you combine some of the classes? I'd recommend a classification tree algorithm combined with boosting or bagging.
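To make the bagging suggestion concrete, here is a minimal sketch in plain Python (standard library only). It is not a SAS procedure or the API of any particular tree package: the function names (`gini`, `best_split`, `grow_tree`, `bagged_trees`, `predict_bagged`) and the toy data are made up for illustration. It assumes numeric predictors and class labels that are strings or integers, and it shows bagging only, not boosting. Each tree is grown on a bootstrap resample of the cases and the trees vote on the class.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(y)
    n = len(y)
    for j in range(len(X[0])):
        # Skip the largest value so both sides of the split are non-empty.
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (j, t), score
    return best

def grow_tree(X, y, depth=0, max_depth=3):
    """Grow a small tree; internal nodes are tuples, leaves are class labels."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    j, t = split
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    return (j, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            grow_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))

def predict_tree(tree, row):
    """Walk down the tree until a leaf (a non-tuple label) is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def bagged_trees(X, y, n_trees=25, max_depth=3, seed=0):
    """Fit n_trees trees, each on a bootstrap resample of the cases."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n cases with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], 0, max_depth))
    return forest

def predict_bagged(forest, row):
    """Majority vote over the trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Toy data, invented for illustration: two well-separated classes.
X = [[x, x % 3] for x in range(10)] + [[x, x % 3] for x in range(20, 30)]
y = ["a"] * 10 + ["b"] * 10

forest = bagged_trees(X, y)
print(predict_bagged(forest, [0, 0]), predict_bagged(forest, [25, 1]))
```

With 12 classes and 180 cases, each bootstrap sample will hold only a handful of cases per class, which is one more reason to consider merging classes before fitting anything.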

In article <38E35D59.87468F8B@soc.kuleuven.ac.be>, Jan.Beyers@soc.kuleuven.ac.be says...
>
>Hi,
>
>I am trying to carry out a multiple discriminant analysis on a data set
>that has about 40 variables, 12 classes and 180 cases. Some cases (35%)
>are not classified because I do not have sufficient a priori knowledge
>concerning the class to which they belong. The idea is to find a
>discriminant function on basis of which I classify these unclassified
>observations and which helps me to re-classify cases that have been put
>into the wrong class (given the discriminant function).
>
>I am a bit uncertain about the procedure I intend to follow and
>therefore some advice might be welcome.
>
>First, in order to get a substantially and statistically interesting
>solution I first look under PROC CANDISC for the canonical structure of
>these 40 variables given the class variable (the correlations in the set
>of 40 variables are relatively high, from 0.20 to 0.60 and sometimes
>higher). Since the eigenvalue of the fifth canonical variable is only
>0.04 higher than the fourth I decide to retain 4 canonical
>discrimination functions. In a second step I use these four canonical
>variables as input for the PROC DISCRIM procedure using the within-group
>co-variance matrix.
>
>Here are my questions:
>1. is it appropriate to use the canonical variables which are a result
>of PROC CANDISC in PROC DISCRIM (PROC CANDISC does not produce a
>classification while the canonical variables are the result of a
>discrimination function)? The SAS-manual does not give an example that
>goes in this direction.
>
>2. the problem is that my 40 variables are not all of the same
>measurement level: 30 of these are dichotomous and 10 are ordinal (1, 2,
>3, 4, 5). Is it a good idea to re-scale the ordinal data via PROC
>PRINQUAL before using them in PROC CANDISC? What to do with dichotomous
>variables?
>
>3. in case of unequal within groups-variance-covariance matrices it is
>advisable to employ a quadratic discriminant function. The SAS-manual
>indicates that PROC DISCRIM computes this function but it remains
>unclear to me what this implies in terms of statements. Does it mean
>that the measure of squared distance is based on the pooled covariance
>matrix instead of the within-group covariance matrix?
>
>Jan
>----------------------------------------------------------
>Jan Beyers
>Katholieke Universiteit Leuven
>Faculteit Sociale Wetenschappen
>Departement Politieke Wetenschappen
>Instituut voor Europees Beleid/Afdeling Internationale Betrekkingen
>Van Evenstraat 2 B
>3000 Leuven
>België
>tel +32.16.32.31.02
>fax +32.16.32.31.44
>email Jan.Beyers@soc.kuleuven.ac.be
>URL http://www.kuleuven.ac.be/facdep/social/pol/ieb/ieb.htm
>----------------------------------------------------------

--
T.S. Lim
lim@recursive-partitioning.com
www.Recursive-Partitioning.com
______________________________________________________________________
Get paid to write a review! http://recursive-partitioning.epinions.com

