LISTSERV at the University of Georgia
Date:         Thu, 27 Jun 1996 01:52:01 +0100
Reply-To:     John Whittington <johnw@MAG-NET.CO.UK>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         John Whittington <johnw@MAG-NET.CO.UK>
Subject:      Re: Subsetting OBS from a large dataset
Comments: To: W HU <whu@UVIC.CA>

On Wed, 26 Jun 1996, W HU <whu@UVIC.CA> wrote:

>I have a large SAS dataset containing about 4 million records. I want to
>subset some records from it in the way that every the one fifth (or
>other proportions) record will be extracted. To illustrate, suppose there
>are 21 obs, I want to extract the 5th, 10th, 15th, and the 20th obs into
>the sub-dataset.
>
>What I do now to solve this problem is that I get the total number of
>OBS first, then use this total number divided by 5 to get the ranking of
>those obs to be extracted. It works well. However, this is not an efficient
>way if the dataset is too large. I am looking for a solution which can do
>the same job with no need for pre-defined total number of obs.

Weimin, I'm not sure that I completely understand what you want to achieve, and am by no means sure whether either of the solutions I have seen posted actually corresponds to what you want! In the example you give, 21 obs divided by 5 gives 4.2, which you presumably round down to 4, but then I'm not sure how you translate that into the need for the 5th, 10th, 15th and 20th obs to be selected.

My initial interpretation (which I suspect is also wrong!) is that (using your example of 5) you wanted to select the observations which were one fifth, two fifths etc. of the way through the dataset - so that you would actually always end up with 5 observations being selected, with the last one being between 0 and 4 observations from the end of the dataset. On that basis, the following code would work:

data minitest ;
   do x = 1 to 59 ;
      output ;
   end ;
run ;

data subset (drop = num increm) ;
   retain num increm ;
   if _n_ = 1 then do ;
      num = total ;                       /* total is set by NOBS= at compile time */
      increm = floor( num / 5 ) ;         /* change '5' as desired, or use macrovar */
   end ;
   do i = increm to num by increm ;
      set minitest nobs=total point=i ;   /* read observation number i directly */
      output ;
   end ;
   stop ;                                 /* needed with POINT= to end the step */
run ;

proc print data=subset ; run ;

... which gives output:


OBS     X

  1    11
  2    22
  3    33
  4    44
  5    55

On the other hand, if you removed the rounding FLOOR function, you would get as close as possible to those one fifth, two fifths etc. points, with the last observation selected being the final one in the dataset:


OBS     X

  1    11
  2    23
  3    35
  4    47
  5    59

I suspect that neither of these is what you want. If you can clarify your requirement, I suspect that the above code can be adapted to suit.
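As one possible adaptation, here is a minimal sketch for the literal reading of the question (keep the 5th, 10th, 15th ... observations, with no separate pass to count them). It reuses the MINITEST dataset from above; the output dataset name EVERY5TH is purely illustrative, and the step of 5 is an assumption about what is wanted.

```sas
/* Sketch only: assumes "every 5th observation" is the requirement. */
/* EVERY5TH is a hypothetical name chosen for this example.         */
data every5th ;
   do i = 5 to total by 5 ;               /* total is filled in by NOBS= at compile time */
      set minitest nobs=total point=i ;   /* direct access: only every 5th obs is read   */
      output ;
   end ;
   stop ;                                 /* needed with POINT= to end the step          */
run ;

proc print data=every5th ; run ;
```

With the 59-observation MINITEST this should select x = 5, 10, ... 55 (11 observations). A subsetting IF such as "if mod(_n_, 5) = 0 ;" after a plain SET would pick the same observations, but it reads every observation sequentially, which is likely to matter with 4 million records.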


-----------------------------------------------------------
Dr John Whittington,       Voice:      +44 1296 730225
Mediscience Services       Fax:        +44 1296 738893
Twyford Manor, Twyford,    E-mail:
Buckingham  MK18 4EL, UK   CompuServe: 100517,3677
-----------------------------------------------------------
