LISTSERV at the University of Georgia
Date:         Tue, 8 Nov 2005 21:52:01 -0500
Reply-To:     Paul Walker <walker.627@OSU.EDU>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Paul Walker <walker.627@OSU.EDU>
Subject:      Random Sample on Dataset Subject to Where Statement

For purposes of creating summary statistics about every variable in a particular 'large' dataset, I want to first take a simple random sample of records to use in the calculations. My usual method for generating such a sample is through the use of direct access to rows using the point= option in the set statement. However, this method falls apart when I want to AT THE SAME TIME allow the user of my application to specify a where statement.

The problem is: take a random sample of 5,000 records from dataset A, which contains 500,000 records, restricted to the records that satisfy some where statement, without knowing in advance whether that restricted subset contains more or fewer than 5,000 records (the chosen sample size).

My current way of dealing with this is to (1) create dataset B, which is dataset A subject to the where clause, (2) check whether B contains more or fewer than 5,000 records, and (3) if B contains more than 5,000 records, use my usual simple random sample program to sample B down to 5,000 records. This is extremely inefficient, but I don't know a better way...
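
To make the three steps concrete, here is a rough language-neutral sketch of the logic, written in Python rather than SAS purely for illustration. The function name `sample_with_filter`, the toy records, and the `group == 1` condition are all made up for this example and are not part of my actual application; the in-memory list stands in for the intermediate dataset B.

```python
import random

def sample_with_filter(rows, predicate, sample_size, seed=None):
    """Filter first, then sample without replacement if the subset is too large."""
    rng = random.Random(seed)
    # Step 1: build "dataset B" by applying the where condition to "dataset A".
    subset = [row for row in rows if predicate(row)]
    # Step 2: if B is no larger than the sample size, keep all of it.
    if len(subset) <= sample_size:
        return subset
    # Step 3: otherwise draw a simple random sample WITHOUT replacement.
    return rng.sample(subset, sample_size)

# Toy stand-in for dataset A: 500,000 records with one hypothetical field.
dataset_a = [{"id": i, "group": i % 4} for i in range(500_000)]

# Hypothetical where condition supplied by the user.
sample = sample_with_filter(dataset_a, lambda r: r["group"] == 1, 5_000, seed=42)
print(len(sample))
```

The cost I am complaining about is visible in step 1: the whole filtered subset has to be materialized before I even know whether sampling is needed.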

So, does anyone know a better way??? Note that sampling WITHOUT replacement must be used.

Final note: I tested proc SURVEYSELECT based on other SAS-L postings and found it to be very slow.

