Date: Wed, 30 May 2007 14:54:30 -0400
Reply-To: Richard Ristow <wrristow@mindspring.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Richard Ristow <wrristow@mindspring.com>
Subject: Re: Dummy variable omitted?
In-Reply-To: <06c201c7a2d3$8f39af70$0201a8c0@LIFEBOOK>
Content-Type: text/plain; charset="us-ascii"; format=flowed
At 11:59 AM 5/30/2007, Lisa Stickney wrote:
> I have a set of dummy variables with an unequal number of
> observations in each. I was wondering if anyone knows of a general
> rule on which category to use as the omitted category. Thanks in
> advance.
It's a matter of taste, not necessity, of course; the models are
equivalent (in the sense of defining the same space of possible
predicted values) no matter which category you omit.
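To illustrate that equivalence, here is a minimal sketch in Python with
NumPy (not SPSS syntax). The breed names, group sizes, and the simulated
'weight' outcome are made up for the example, and fitted_values is a
hypothetical helper, not part of any library. It fits the same
one-factor model three times, omitting a different category each time,
and checks that the predicted values agree.

```python
import numpy as np

def fitted_values(categories, y, omitted):
    """Fit y on an intercept plus dummies for every category except `omitted`."""
    levels = [c for c in sorted(set(categories)) if c != omitted]
    X = np.column_stack(
        [np.ones(len(y))] + [(categories == lev).astype(float) for lev in levels]
    )
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta, beta

# Made-up data: unequal group sizes, outcome drawn at random.
rng = np.random.default_rng(0)
breed = np.array(["golden"] * 15 + ["poodle"] * 3 + ["beagle"] * 2)
weight = rng.normal(loc=30.0, scale=5.0, size=len(breed))

# Refit with each category omitted in turn and compare predictions.
fits = {b: fitted_values(breed, weight, omitted=b)[0]
        for b in ("golden", "poodle", "beagle")}
same_fit = all(np.allclose(fits["golden"], f) for f in fits.values())

# With 'golden' omitted, the intercept is the golden-retriever mean.
intercept_golden = fitted_values(breed, weight, "golden")[1][0]
print(same_fit, intercept_golden)
```

Only the coefficients change: the intercept is the mean of the omitted
category, and each dummy's coefficient is that category's difference
from it. That is why the choice affects how you read the output rather
than the fit.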
I tend to omit the most 'normal' category.
That can be the most common category, especially if it's the most
common by a large margin: if I'm studying dogs by breed, and 75% of my
sample are golden retrievers, golden retrievers will be the omitted
category.
Or, 'normal' can be based on your judgement: If you think the golden
retriever is the prototypical dog (I've sometimes thought that), then
you drop the golden-retriever category regardless of how relatively
numerous it is.
There are other ways to make the judgement, but I'll have to post about
those later; I'm muzzy-headed today.
Good luck,
Richard