LISTSERV at the University of Georgia
Date:         Sun, 11 Feb 2007 22:25:57 -0800
Reply-To:     David L Cassell <davidlcassell@MSN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         David L Cassell <davidlcassell@MSN.COM>
Subject:      Re: Comparing variable distribution between different groups
In-Reply-To:  <20070209121209.44481.qmail@web50810.mail.yahoo.com>
Content-Type: text/plain; format=flowed

hema_dave15@YAHOO.COM wrote back:
>hema <hema_dave15@YAHOO.COM> wrote:
> Hi all,
>
>How do i compare a variable distribution using proc freq with the CMH
>option between separate datasets...I don't know which dataset to use.
>one dataset is training dataset from the universe. other one is universe.
>So how do i compare the same variable in two different datasets
>
>Thanks,
>Hema
>
>Basically i want to check whether my sample is representative of other
>sample using certain variables one at a time.
> So i was told that cmh option in freq will do that for categorical
>variables.
> I want to do this for continious variable also..
> can anyone suggest me wht can be done to achieve this
>
> Thanks in advance

Okay, I think you are doing the wrong thing here.

Unless you started out with a really lousy sampling plan, there is not much point to this! If you used a lousy sampling plan, then you will find lots of differences from your intended (target) population, and you will not be able to correct for *all* of them. So you would be better served by starting over with a proper sample.

If you start out with a really good sampling plan, but you try this with, say, 100 separate variables, then guess what? At alpha=0.05, assuming complete independence (which you wouldn't really get), you would expect about 5 variables to flag as different. So are they really different? No, you're just seeing random variation and the natural consequences of error rates. What if you get 3 variables significant? Or 8? How can we tell what the right cutoff would be, when the variables are not going to be truly independent, so that the 'assuming independence' number is not that helpful? Answer: you're stuck without a lot more math and stats.
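
To put rough numbers on that, here is a minimal Python sketch of the arithmetic under the (unrealistic) complete-independence assumption. The variable names are purely illustrative, and the binomial model is only the idealized case described above, not something that would hold for correlated variables.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k 'significant' results out of n independent tests."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n_tests = 100   # number of variables tested one at a time
alpha = 0.05    # per-test significance level

# Expected number of variables flagged even though nothing is really different.
expected_false_flags = n_tests * alpha

# Chance that at least one variable flags purely by chance.
p_at_least_one = 1 - (1 - alpha) ** n_tests

# Chance of seeing anywhere from 3 to 8 flags purely by chance.
p_between_3_and_8 = sum(binom_pmf(k, n_tests, alpha) for k in range(3, 9))

print(f"expected false flags: {expected_false_flags:.1f}")
print(f"P(at least one flag): {p_between_3_and_8 and p_at_least_one:.3f}")
print(f"P(3 to 8 flags):      {p_between_3_and_8:.3f}")
```

Under this idealized model, seeing 3 or 8 flagged variables is entirely ordinary, which is why a count of "significant" variables alone says little. With dependent variables the binomial model no longer applies, so even these numbers are only a rough guide.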

If you are trying to match *your* sample against someone else's sample, then you have a lot more problems. But univariate approaches are probably not the right way to go. In some cases, people pretend that the sample would be fine if they just jiggled the weights a lot, and they use what is called 'raking'. I don't like that, except in specifically designed cases.
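
For readers who have not met the term, raking is usually implemented as iterative proportional fitting: cell weights are rescaled in turn so that each margin matches a known target total. The Python sketch below is only an illustration of that idea on a made-up 2x2 table. The function name `rake`, the counts, and the target margins are all hypothetical, and it is not a substitute for a properly designed weighting procedure.

```python
def rake(table, row_targets, col_targets, iterations=50):
    """Iterative proportional fitting on a two-way table of positive weights.

    Alternately rescales rows and columns so their sums approach the
    target margins. The targets must share the same grand total.
    """
    cells = [row[:] for row in table]  # work on a copy
    for _ in range(iterations):
        # Scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(cells[i])
            cells[i] = [c * target / total for c in cells[i]]
        # Scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in cells)
            for row in cells:
                row[j] *= target / total
    return cells

# Hypothetical sample counts and hypothetical population margins.
sample = [[30.0, 20.0], [10.0, 40.0]]
raked = rake(sample, row_targets=[40.0, 60.0], col_targets=[50.0, 50.0])

for row in raked:
    print([round(c, 2) for c in row])
```

Note that the adjusted table matches the margins by construction, which is exactly why it can hide a sample that was poorly drawn in the first place.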

So it would really help if you would explain in *detail* what you are trying to do, and why, and what you mean by "representative of other sample" here.

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
