Date: Fri, 31 Mar 2000 03:05:43 GMT
Reply-To: lim@recursive-partitioning.com
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "T.S. Lim" <nospam@NOSPAM.RECURSIVE-PARTITIONING.COM>
Organization: Recursive-Partitioning.com
Subject: Re: discriminant analysis
Content-Type: Text/Plain; charset=US-ASCII
You're pushing your luck trying to classify 12 classes with only 180 cases. No
matter what you do, it will be quite hard to get good prediction accuracy. Can
you combine some of the classes? I'd recommend a classification tree algorithm
combined with boosting or bagging.
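
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is not part of the original analysis. The function names are made up for illustration, and it assumes that the 35% unclassified cases cannot be used for training and that the labeled cases are spread evenly over the 12 classes.

```python
def labeled_cases(n_cases, pct_unlabeled):
    """Cases with a known class (integer arithmetic avoids float rounding)."""
    return n_cases - n_cases * pct_unlabeled // 100


def cov_params(n_vars):
    """Free parameters in one symmetric covariance matrix of n_vars variables."""
    return n_vars * (n_vars + 1) // 2


n_labeled = labeled_cases(180, 35)   # figures quoted in the question
per_class = n_labeled / 12           # assumes equal class sizes (hypothetical)

print("labeled cases:", n_labeled)
print("labeled cases per class:", per_class)
print("covariance parameters, 40 variables:", cov_params(40))
print("covariance parameters, 4 canonical variables:", cov_params(4))
```

With fewer than ten labeled cases per class on average, a separate 40-variable covariance matrix per class would be singular, since a class needs more cases than variables for its sample covariance matrix to have full rank. That is one more reason to merge classes or reduce the dimension first.
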
In article <38E35D59.87468F8B@soc.kuleuven.ac.be>,
Jan.Beyers@soc.kuleuven.ac.be says...
>
>Hi,
>
>I am trying to carry out a multiple discriminant analysis on a data set
>that has about 40 variables, 12 classes and 180 cases. Some cases (35%)
>are not classified because I do not have sufficient a priori knowledge
>concerning the class to which they belong. The idea is to find a
>discriminant function on the basis of which I classify these unclassified
>observations and which helps me to re-classify cases that have been put
>into the wrong class (given the discriminant function).
>
>I am a bit uncertain about the procedure I intend to follow and
>therefore some advice might be welcome.
>
>First, in order to get a substantially and statistically interesting
>solution, I look under PROC CANDISC at the canonical structure of
>these 40 variables given the class variable (the correlations in the set
>of 40 variables are relatively high, from 0.20 to 0.60 and sometimes
>higher). Since the eigenvalue of the fifth canonical variable is only
>0.04 higher than the fourth I decide to retain 4 canonical
>discrimination functions. In a second step I use these four canonical
>variables as input for the PROC DISCRIM procedure using the within-group
>covariance matrix.
>
>Here are my questions:
>1. is it appropriate to use the canonical variables which are a result
>of PROC CANDISC in PROC DISCRIM (PROC CANDISC does not produce a
>classification while the canonical variables are the result of a
>discrimination function)? The SAS-manual does not give an example that
>goes in this direction.
>
>2. the problem is that my 40 variables are not all of the same
>measurement level: 30 of these are dichotomous and 10 are ordinal (1, 2,
>3, 4, 5). Is it a good idea to re-scale the ordinal data via PROC
>PRINQUAL before using them in PROC CANDISC? What to do with dichotomous
>variables?
>
>3. in case of unequal within-group variance-covariance matrices it is
>advisable to employ a quadratic discriminant function. The SAS-manual
>indicates that PROC DISCRIM computes this function but it remains
>unclear to me what this implies in terms of statements. Does it mean
>that the measure of squared distance is based on the pooled covariance
>matrix instead of the within-group covariance matrix?
>
>Jan
>----------------------------------------------------------
>Jan Beyers
>Katholieke Universiteit Leuven
>Faculteit Sociale Wetenschappen
>Departement Politieke Wetenschappen
>Instituut voor Europees Beleid/Afdeling Internationale Betrekkingen
>Van Evenstraat 2 B
>3000 Leuven
>België
>tel +32.16.32.31.02
>fax +32.16.32.31.44
>email Jan.Beyers@soc.kuleuven.ac.be
>URL http://www.kuleuven.ac.be/facdep/social/pol/ieb/ieb.htm
>----------------------------------------------------------
--
T.S. Lim
lim@recursive-partitioning.com
www.Recursive-Partitioning.com