Date: Wed, 26 Jun 1996 16:16:16 PDT
Reply-To: Melvin Klassen <KLASSEN@UVVM.UVIC.CA>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Melvin Klassen <KLASSEN@UVVM.UVIC.CA>
Subject: Re: Subsetting OBS from a large dataset
Weimin Hu <whu@UVIC.CA> writes:
>Hello Everyone;
Given your E-mail addresses, I'm surprised that you didn't pose
your question to either the HelpDesk in the same building as the
Geography Department, or the BCSC Help Desk, if you needed an
*immediate* answer, but anyway, since you asked ...
>I have a large SAS dataset containing about 4 million records. I want to
>subset some records from it in the way that every the one fifth (or other
>proportions) record will be extracted. To illustrate, supposed there are 21
>obs, I want to extract the 5th, 10th, 15th, and the 20th obs into the
>sub-dataset.
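The "automatic" SAS variable named '_N_' counts the iterations of the
DATA step, so with a single SET statement it is the number of the
observation just read. It can be tested inside the DATA step, e.g.,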
DATA new.file; SET old.file; IF MOD(_N_,5) = 0 THEN OUTPUT; RUN;
will output each fifth record.
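To check this against your 21-observation illustration, here is a small
sketch. The dataset names 'test' and 'every5th' and the variable 'ID' are
made up for the example and are not part of your data:

```sas
/* Build a 21-observation test dataset: ID = 1, 2, ..., 21. */
DATA test;
  DO ID = 1 TO 21;
    OUTPUT;
  END;
RUN;

/* Keep only the observations whose position is a multiple of 5. */
DATA every5th;
  SET test;
  IF MOD(_N_,5) = 0 THEN OUTPUT;
RUN;

/* List the result: the observations with ID 5, 10, 15 and 20. */
PROC PRINT DATA=every5th;
RUN;
```

The 21st observation is read but never written, since MOD(21,5) is 1.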
>What I do now to solve this problem is that I get the total number of OBS
>first, then use this total number divided by 5 to get the ranking of those
>obs to be extracted. It works well. However, this is not a efficient way
>if the dataset is too large.
Why isn't it efficient?
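Given that you are starting with a SAS dataset, the number-of-observations
is *immediately* and "efficiently" available: the NOBS= option of the SET
statement picks it up from the dataset's header when the DATA step is
compiled, without reading through the data. For example: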
DATA new.file;
  RETAIN FRACTION 5;       /* or any other positive integer value. */
  DROP FRACTION;           /* Don't write this variable to the output file. */
  RETAIN COUNT 0;          /* Count the number of records we've written. */
  DROP COUNT;              /* Don't write this variable, either. */
  SET old.file NOBS=NOBS;  /* Get the number-of-observations and 1 OBS. */
  IF MOD(_N_,FRACTION)=0   /* Is this observation to be selected? */
  THEN DO;
    COUNT=COUNT+1;         /* Count it. */
    OUTPUT;                /* Write it. */
    IF COUNT = FLOOR(NOBS/FRACTION) THEN STOP;  /* Time to quit? */
  END;
RUN;
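Another possibility, offered only as a sketch: if 'old.file' is an ordinary
disk dataset that supports direct access (an assumption; it will not work
for a view or a tape dataset), the POINT= option of the SET statement can
fetch just the wanted observations, so the other four-fifths are never
read. The variable name 'I' is arbitrary:

```sas
DATA new.file;
  /* NOBS is filled in at compile time, so it can be used as the DO limit. */
  DO I = 5 TO NOBS BY 5;
    SET old.file POINT=I NOBS=NOBS;  /* Read observation number I directly. */
    OUTPUT;                          /* Write it. */
  END;
  STOP;  /* Needed: with POINT= the step never reaches end-of-file itself. */
RUN;
```

Neither 'I' nor 'NOBS' is written to the output file, because the variables
named in POINT= and NOBS= are temporary. Replace both 5's with another
positive integer for other proportions.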
>I am looking for a solution which can do the same job
>with no need for pre-defined total number of obs.
>--------------------------------
>Weimin Hu
>Department of Geography
>University of Victoria
>Victoria, B. C.
>CANADA, V8W 2Y2
>Email: whu@uvic.ca
> weiminhu@bcsc02.gov.bc.ca
>--------------------------------