LISTSERV at the University of Georgia
Date:         Thu, 10 Jul 2008 05:54:36 -0700
Reply-To:     Linda Zientek <lrzientek@yahoo.com>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Linda Zientek <lrzientek@yahoo.com>
Subject:      Re: insufficient N for factor analysis
Comments: To: Bob Schacht <schacht@hawaii.edu>
In-Reply-To:  <6.2.1.2.2.20080709103543.04052cf0@hawaii.edu>
Content-Type: text/plain; charset=iso-8859-1

In addition to the recommended ratios of 10 to 20 people per variable, the following has also been suggested:

"Some Monte Carlo simulation research (Guadagnoli & Velicer, 1988) suggest ... replicable factors tend to be estimated if:
1. factors are each defined by four or more measured variables with structure coefficients each greater than .6 [in absolute value], regardless of sample size; or
2. factors are each defined with 10 or more structure coefficients each around .4 [in absolute value], if sample size is greater than 150; or
3. sample size is at least 300." (Thompson, 2004, p. 24)

Linda

Thompson, B. (2004). Exploratory and confirmatory factor analysis: Understanding concepts and applications. Washington, DC: American Psychological Association.
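The three conditions quoted above can be written as a simple check. This is only a minimal sketch, not something taken from Thompson's book: the function name and the input layout (one list of structure coefficients per factor) are made up for illustration, and "around .4" is read here as "at least .4", which is an assumption.

```python
def meets_quoted_guideline(n, factors):
    """Return True if any of the three quoted conditions holds.

    n       -- sample size
    factors -- list of lists; each inner list holds the structure
               coefficients of the variables defining one factor
               (hypothetical input layout)
    """
    abs_coefs = [[abs(c) for c in f] for f in factors]
    has_factors = len(abs_coefs) > 0

    # 1. every factor has four or more coefficients above .6, any sample size
    rule1 = has_factors and all(
        sum(c > 0.6 for c in f) >= 4 for f in abs_coefs
    )
    # 2. every factor has ten or more coefficients of about .4, and n > 150
    #    (assumption: "around .4" is treated as >= .4)
    rule2 = has_factors and n > 150 and all(
        sum(c >= 0.4 for c in f) >= 10 for f in abs_coefs
    )
    # 3. sample size of at least 300
    rule3 = n >= 300

    return rule1 or rule2 or rule3
```

For example, `meets_quoted_guideline(35, [[0.7, 0.65, 0.8, 0.61]])` is True under condition 1, while the same sample with coefficients of .5 fails all three conditions.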

--- On Wed, 7/9/08, Bob Schacht <schacht@hawaii.edu> wrote:

From: Bob Schacht <schacht@hawaii.edu>
Subject: Re: insufficient N for factor analysis
To: SPSSX-L@LISTSERV.UGA.EDU
Date: Wednesday, July 9, 2008, 4:10 PM

At 08:48 AM 7/9/2008, Hector Maletta wrote:
>I do not remember a specific citation, but the general idea is that factor analysis is a derivation of regression, and regression rests on the normal distribution of estimation errors. This normal distribution of estimation errors is known as "the law of large numbers" and is a tendency shown by errors as N gets larger and larger. More exactly, as the "degrees of freedom" get larger. The degrees of freedom equal number of cases minus number of variables, N-k-1, which in your case is quite small. As the cases are few, the margin of error of your estimates will be very wide, and you could not be sure of their probable true value in the universe or population, especially for minor factors after the first or second one, where the coefficients or loadings will be close to zero (and it may therefore be difficult to tell whether they are not zero in the population).
>
>An old rule of thumb says you need at the very least 10 cases per variable, but this is "the very least". With fewer than 30-50 cases, experimental error distributions hardly (or very infrequently) resemble a normal curve.
>So my advice is that you try a model with fewer variables, possibly one underlying factor if your 40 variables are mostly explained by one overarching factor, or abandon factor analysis altogether and try some more modest approaches like a simple summated scale, simple regression, 2- or 3-way cross tabulations, and the like. Next time, go bigger in your sample design. And then again, do you really have a theory that is so complex that no less than 40 independent factors are required by it? Isaac Newton explained the universe with only two or three variables, and did very well indeed, thank you.
>Hector

I have been following this discussion with much interest, as I have a similar problem at hand. For years, we have been conducting a consumer satisfaction survey that consists of one page, about 10 questions, plus a single open-ended question. Although the questions were intended to probe consumer satisfaction in a number of different areas, basically the level of correlation is so high that it seems that we're really only tracking one factor: overall satisfaction.

So we conducted literature reviews and went back to the drawing board, formulating more than 100 questions in 6 broad areas of consumer satisfaction. Our intention was to pilot test these questions with participants, examine the results, throw out the redundant questions (discerned through factor analysis), and emerge with, say, 20 questions known to reflect different dimensions of consumer satisfaction. However, our sample size thus far is in the pitiful range: perhaps 35 respondents. Needless to say, we have a long way to go. With our response rates and consumer base, we would be lucky to get more than 100 respondents in a year.

In order to improve the subjects-to-variables ratio (STV), we need either to greatly increase the sample size (which is difficult for us to do), or reduce the number of variables, or both. Our questions are short, simple statements requesting responses on a 5-point Likert scale. Some of the questions are worded in almost identical language, and some of these are almost certainly redundant. Given our relatively small sample size thus far, what is the best way to proceed to remove redundant questions while retaining maximum diversity of responses?
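To put rough numbers on the STV problem, here is a small sketch of the arithmetic. It assumes the "at least 10 cases per variable" rule of thumb mentioned earlier in this thread; the function names are illustrative only.

```python
def stv_ratio(n_subjects, n_variables):
    """Subjects-to-variables ratio."""
    return n_subjects / n_variables


def max_variables(n_subjects, min_ratio=10):
    """Largest number of variables that still gives min_ratio subjects each."""
    return n_subjects // min_ratio


def required_subjects(n_variables, min_ratio=10):
    """Smallest sample giving min_ratio subjects per variable."""
    return n_variables * min_ratio


# Figures from the message above: about 35 respondents so far,
# roughly 100 questions, a target of 20 questions, and perhaps
# 100 respondents in a year.
print(stv_ratio(35, 100))        # 0.35
print(max_variables(35))         # 3
print(max_variables(100))        # 10
print(required_subjects(20))     # 200
```

Under that rule of thumb, 35 respondents support about 3 items, a year's worth of 100 respondents supports about 10, and a 20-item instrument would call for about 200 respondents.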

From one perspective, it would appear that rank correlations might be the preferred measure of association, but I wonder if Likert scales are, analytically speaking, equivalent to rank order variables? What other measures would be most appropriate? I hesitate to downgrade the measure of association to categorical, because that throws out the information on directionality and degree. Likewise, I hesitate to overgrade the measure of association to ratio, because clearly the intervals are arbitrary and not additive.
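For reference, a rank correlation between two Likert items can be computed by ranking each item (tied responses share the average rank, which matters a lot on a 5-point scale) and then taking the Pearson correlation of the ranks. A minimal pure-Python sketch follows; the helper names are illustrative, and whether Likert responses should be treated as ranks at all is exactly the open question above.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant item")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

An item on which every respondent gives the same answer has no defined correlation, so the sketch raises an error in that case rather than returning a number.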

Intuitively, I am seeking to extract, out of these 100 questions, 4-5 groups of 2-3 questions each, such that within-group correlations are high, but correlations with the other groups are low. The within-group redundancy reinforces degree of satisfaction with that particular factor, and the low between-group correlation assures that different aspects of satisfaction are represented.
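The grouping idea described above can be illustrated with a simple greedy pass over a correlation matrix. This is only a sketch of one possible heuristic, not a substitute for factor analysis: the function names, the 0.7 threshold, and the toy matrix are all assumptions for illustration, and absolute correlations are used so that reverse-worded items can still land in the same group.

```python
def greedy_groups(corr, threshold=0.7):
    """Assign each item to the first group whose members it all
    correlates with at |r| >= threshold; otherwise start a new group.

    corr -- square, symmetric list-of-lists of correlations
    """
    groups = []
    for item in range(len(corr)):
        for group in groups:
            if all(abs(corr[item][m]) >= threshold for m in group):
                group.append(item)
                break
        else:
            groups.append([item])
    return groups


def mean_between(corr, g1, g2):
    """Mean absolute correlation between the items of two groups."""
    vals = [abs(corr[a][b]) for a in g1 for b in g2]
    return sum(vals) / len(vals)


# Toy 4-item matrix (made-up values): items 0-1 and items 2-3 go together.
toy = [
    [1.00, 0.80, 0.10, 0.20],
    [0.80, 1.00, 0.15, 0.10],
    [0.10, 0.15, 1.00, 0.90],
    [0.20, 0.10, 0.90, 1.00],
]
groups = greedy_groups(toy)
print(groups)  # [[0, 1], [2, 3]]
```

The result depends on the order in which items are visited and on the threshold, so it is best treated as a quick way to spot candidate redundant items rather than as a final structure.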

Suggestions, please?

Bob Schacht

Robert M. Schacht, Ph.D. <schacht@hawaii.edu>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814

===================== To manage your subscription to SPSSX-L, send a message to LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
