LISTSERV at the University of Georgia
Date:         Mon, 27 Oct 2008 10:09:09 -0700
Reply-To:     "sophe88@yahoo.com" <sophe88@YAHOO.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "sophe88@yahoo.com" <sophe88@YAHOO.COM>
Organization: http://groups.google.com
Subject:      Sampling or not, this is the question
Comments: To: sas-l@uga.edu
Content-Type: text/plain; charset=ISO-8859-1

Hi,

I hope to get some input on this question. It is more of a thought question. Here is a frequency table:

                          Variable 1
variable2              x              y              z      Row total
0            366,700,892    256,364,259    592,514,321  1,215,579,472
1                      0    265,412,326     69,512,786    334,925,112
2                      0     26,598,741    263,578,912    290,177,653
3                      0     61,037,890    689,478,021    750,515,911
total        366,700,892    609,413,216  1,615,084,040  2,591,198,148

This table comes from merging 2 data sets by ID, keeping only the IDs found in both (a simple "if A & B"). Data set A has variable 1 and data set B has variable 2.

This table used 2006 data. Now we need to update it with 2008 data. The problem is that 2008 has much larger counts.

So someone suggested that we take a random sample, say ranuni(92929) < 0.25, on both A and B.

I frowned: if so, each ID has a 25% chance of being picked in each data set. An ID could then be excluded because it belongs to the 'unlucky' 75%, not because it has no match in the other table.
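
To make the worry concrete, here is a small simulation sketch. It is written in Python rather than SAS, and the ID ranges and the helper name `bernoulli_sample` are made up for illustration; only the 0.25 rate and the seed come from the suggestion above. It compares sampling A and B separately with sampling the IDs once and applying the same ID list to both data sets. If the two samples are drawn independently, a true match survives only when it is picked on both sides, so one would expect roughly 0.25 × 0.25, or about 6%, of matches to remain rather than 25%.

```python
import random

def bernoulli_sample(ids, rate, rng):
    """Keep each ID with probability `rate`, independently of everything else."""
    return {i for i in sorted(ids) if rng.random() < rate}

rng = random.Random(92929)           # seed borrowed from the post; any seed works
ids_a = set(range(0, 150_000))       # hypothetical IDs in data set A
ids_b = set(range(50_000, 200_000))  # hypothetical IDs in data set B
true_matches = ids_a & ids_b         # IDs present in both: the "if A & B" rows

rate = 0.25

# Option 1: sample A and B separately, then merge the two samples.
sample_a = bernoulli_sample(ids_a, rate, rng)
sample_b = bernoulli_sample(ids_b, rate, rng)
matches_independent = sample_a & sample_b

# Option 2: sample the IDs once, then keep those IDs in both data sets.
kept_ids = bernoulli_sample(ids_a | ids_b, rate, rng)
matches_by_id = (ids_a & kept_ids) & (ids_b & kept_ids)

frac_independent = len(matches_independent) / len(true_matches)
frac_by_id = len(matches_by_id) / len(true_matches)
print(f"separate samples keep {frac_independent:.3f} of the true matches")
print(f"sampling by ID keeps  {frac_by_id:.3f} of the true matches")
```

Under the second option, a sampled ID that fails to merge really is unmatched, so the merged counts can be scaled back up by 1/0.25. Under the first option, unmatched and 'unlucky' IDs cannot be told apart.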

So, to sample or not to sample? Thanks.

PD

