Date: Mon, 27 Oct 2008 10:09:09 -0700
Reply-To: "sophe88@yahoo.com" <sophe88@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "sophe88@yahoo.com" <sophe88@YAHOO.COM>
Organization: http://groups.google.com
Subject: Sampling or not, this is the question
Content-Type: text/plain; charset=ISO-8859-1
Hi,
I hope to get some input on this question. It is more of a thought
question. Here is a frequency table:
                               variable2
Variable 1              x              y              z      Row total
0             366,700,892    256,364,259    592,514,321  1,215,579,472
1                       0    265,412,326     69,512,786    334,925,112
2                       0     26,598,741    263,578,912    290,177,653
3                       0     61,037,890    689,478,021    750,515,911
total         366,700,892    609,413,216  1,615,084,040  2,591,198,148
This table comes from merging 2 data sets by ID, keeping only the
matches with a simple "if A & B".
Data set A has variable 1 and B has variable 2.
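
For readers who do not use SAS, the merge-and-tabulate step can be sketched in Python as below. This is only an illustration: the IDs and values are made-up toy data, and the names a, b, matched and freq are hypothetical, not taken from the actual job.

```python
from collections import Counter

# Hypothetical toy data, keyed by ID.
a = {101: 0, 102: 1, 103: 3, 104: 2}           # data set A carries variable 1
b = {101: "x", 102: "y", 103: "z", 105: "y"}   # data set B carries variable 2

# Keep only IDs present in both data sets, as "if A & B" does after a merge by ID.
matched = {i: (a[i], b[i]) for i in a.keys() & b.keys()}

# Cross-tabulate variable 1 by variable 2 over the matched IDs.
freq = Counter(matched.values())

for (v1, v2), count in sorted(freq.items()):
    print(v1, v2, count)
```

IDs 104 and 105 drop out because each appears in only one of the two data sets.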
This table used 2006 data. Now we need to update it with 2008.
The problem is that 2008 has much larger counts.
So someone suggested that we take a random sample, say
ranuni(92929) < 0.25, on both A and B.
I frowned: if so, each ID has a 25% chance of being picked. One ID could
be excluded because it belongs to the 'unlucky' 75%, not because it is
not matched to the other table.
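
To put a number on that worry, here is a small Python simulation (a sketch, not SAS, and not the production job). It assumes the draws on A and B are independent, in which case a matched ID survives the merge only if it is picked on both sides, roughly 0.25 * 0.25 = 6.25% of the time. For comparison it also samples by a hash of the ID, so the same IDs are kept in both data sets. The function names, the seeds, and the choice of SHA-256 are illustrative assumptions.

```python
import hashlib
import random

def independent_sample(ids, rate, seed):
    """Keep each row with probability `rate`, one fresh draw per row."""
    rng = random.Random(seed)
    return {i for i in ids if rng.random() < rate}

def keyed_sample(ids, rate):
    """Keep an ID when a deterministic hash of the ID falls below `rate`."""
    kept = set()
    for i in ids:
        digest = hashlib.sha256(str(i).encode()).digest()
        u = int.from_bytes(digest[:8], "big") / 2**64   # pseudo-uniform in [0, 1)
        if u < rate:
            kept.add(i)
    return kept

n = 100_000
a_ids = range(n)   # hypothetical data set A
b_ids = range(n)   # hypothetical data set B; here every ID has a match
rate = 0.25

# Sample A and B separately, then keep IDs found in both (the "if A & B" step).
indep = independent_sample(a_ids, rate, seed=1) & independent_sample(b_ids, rate, seed=2)
keyed = keyed_sample(a_ids, rate) & keyed_sample(b_ids, rate)

indep_share = len(indep) / n
keyed_share = len(keyed) / n
print(f"independent draws: {indep_share:.4f} of matched IDs survive")
print(f"keyed by ID:       {keyed_share:.4f} of matched IDs survive")
```

Under these assumptions the independent version keeps about one matched ID in sixteen, while the ID-keyed version keeps about one in four, and an ID missing from the keyed result is missing either because it was not sampled in both tables at once or because it truly has no match.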
So, to sample or not to sample? Thanks.
PD