Date: Thu, 27 Feb 2003 08:18:07 -0500
Reply-To: Art@DrKendall.org
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: "Arthur J. Kendall" <Art@DrKendall.org>
Organization: Social Research Consultants
Subject: Re: Randomize
If you run something like
sample 300 from 1000.
on exactly the same file, starting from the same seed, you should get exactly the same cases.
Any call to sample or to any of the rv.* functions involves calls to the
pseudo-random number generator behind the scenes.
Randomizing the case order first is a good idea if you are going to draw multiple samples
and you want them to be non-overlapping. It is not necessary if you are
just drawing one simple random sample.
show seed.
set seed 20030227.
get file . . .
compute randval = uniform(1).
sort cases by randval.
compute ranorder = $casenum.
* designate cases for non-overlapping samples of 300, 400, and 50.
numeric mysample (f2).
recode ranorder (1 thru 300 =1) (301 thru 700 =2) (701 thru 750 =3)
  (else=4) into mysample.
value labels mysample
1 'first sample'
2 'second sample'
3 'third sample'
4 'still avail to sample from'.
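Once the cases are flagged, something along these lines could be used to check the sample sizes and to work with one sample at a time. This is only a sketch, not run against your data: it assumes mysample holds the codes 1 to 4 as labelled above, and the range of 10 cases in the list command is arbitrary.

```spss
* check how many cases landed in each designated sample.
frequencies variables = mysample.
* look at the first sample only; temporary leaves the full file intact.
temporary.
select if (mysample = 1).
list variables = ranorder mysample /cases = from 1 to 10.
```

Because the select if follows temporary, it applies only to the list procedure, so the cases coded 4 remain available for any later sample.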
Hope this helps.
Art
Art@DrKendall.org
Social Research Consultants
University Park, MD USA
(301) 864-5570
Rachel Tong wrote:
> I've experienced problems in the past with the random select function. I pulled two random samples from the same list, and both lists ended up being extremely similar. Also, I'm pulling about 300 names out of 1 million plus
> records, I thought randomizing it first might be a good idea. Am I approaching the problem incorrectly? Any suggestions?
>
> Rachel
> ---------------------------------------------------
> Rachel Tong
> Strategic Research Group, Inc.
> Columbus, Ohio
> ---------------------------------------------------
>