LISTSERV at the University of Georgia
Date:         Fri, 5 Sep 2008 10:01:33 -0700
Reply-To:     "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject:      Re: Conditionally iterate data step until condition is met
Comments: To: iw1junk@COMCAST.NET
In-Reply-To:  A<090520080357.10451.48C0AE1700011250000028D3220700095305029A06CE9907@comcast.net>
Content-Type: text/plain; charset="us-ascii"

Thanks Ian. I find myself using this sort of "elimination until convergence" a couple of times a year or so when winnowing down data based on interdependent criteria.

In this case I actually used a conditional statement and a single pass - you were right that the criteria didn't require the iteration:

if (last.FY and (Ending_Eff_Date<mdy(6,30,2000+input(FY,2.)-1)))
or (first.FY and (Beginning_Eff_Date>mdy(6,30,2000+input(FY,2.))))
or ((Beginning_Eff_Date<mdy(7,1,2000+input(FY,2.))) and (Ending_Eff_Date>mdy(6,30,2000+input(FY,2.)-1)));
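For readers who want to see the single-pass subsetting IF in context, here is a sketch that computes the fiscal-year boundary dates once per record instead of inline. It assumes RATES is sorted by vendor, svscd, sub and FY, and that FY holds a two-digit character year such as '08'; the sample rows, RATES_FY, and the helper variables FY_END and PRIOR_FY_END are made up for illustration. Because SAS dates are day counts, mdy(7,1,Y) equals mdy(6,30,Y)+1, which is how the third clause is rewritten.

```sas
/* Made-up sample data (hypothetical): one vendor/service/sub-code group */
/* for fiscal year '08', i.e. 07/01/2007 through 06/30/2008.            */
data rates;
  input vendor $ svscd $ sub $ FY $
        Beginning_Eff_Date :mmddyy10. Ending_Eff_Date :mmddyy10.;
  format Beginning_Eff_Date Ending_Eff_Date mmddyy10.;
  datalines;
V1 S1 A 08 07/01/2006 06/30/2007
V1 S1 A 08 07/01/2007 06/30/2008
V1 S1 A 08 07/01/2008 06/30/2009
;

data rates_fy;
  set rates;
  by vendor svscd sub FY;
  /* Boundary dates: end of this FY and end of the prior FY. */
  fy_end       = mdy(6, 30, 2000 + input(FY, 2.));
  prior_fy_end = mdy(6, 30, 2000 + input(FY, 2.) - 1);
  /* Same three clauses as the single-pass IF; mdy(7,1,Y) = fy_end + 1. */
  if (last.FY  and Ending_Eff_Date < prior_fy_end)
  or (first.FY and Beginning_Eff_Date > fy_end)
  or (Beginning_Eff_Date < fy_end + 1 and Ending_Eff_Date > prior_fy_end);
  drop fy_end prior_fy_end;
run;
```

With these three sample rows only the 07/01/2007 to 06/30/2008 record is kept.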

This had the same effect as the iterative method. I just saw it as a chance to explore the other method a bit, and I'm glad I did. I hadn't yet written an iterative macro like the ones Data _Null_, Toby and you suggested, and it corrected a misunderstanding I had about the macro processor. I don't use macros much, so it was a chance to play and add something to the toolkit.

My gratitude as always!

Paul Choate DDS Data Extraction (916) 654-2160

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Ian Whitlock
Sent: Thursday, September 04, 2008 8:57 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Conditionally iterate data step until condition is met

Summary: Use conditional CALL SYMPUT at end-of-file #iw-value=1

Paul,

It would have been better to give an example with data, but since you didn't, I made up a silly little problem.

data w ;
  do obs = 1 to 100 ;
    x = ranuni ( 234765 ) ;
    output ;
  end ;
run ;

The problem is to eliminate observations with X > .5. The catch is that the test is applied only when a random number falls below .4, so on any one pass some of those observations slip through. This means the step must be repeated until there are no more to eliminate or the allowed number of iterations has been reached.

Here is the macro.

%macro q ( data= , out=, var=, lim= .5 , maxtimes= 7) ;
  %local stop loopcnt ;
  %let stop = 0 ;
  %let loopcnt = 0 ;
  %do %until ( &stop ) ;
    %let loopcnt = %eval(&loopcnt + 1) ;
    data &out ( drop = count ) ;
      if eof then do ;
        put "loopcnt=&loopcnt" / count= nobs= ;
        if count = nobs then call symput ( "stop" , "1" ) ;
      end ;
      set &data end = eof nobs = nobs ;
      if ranuni(1203987) < .4 then do ;
        if &var > &lim then delete ;
      end ;
      count + 1 ;
    run ;
    %if &loopcnt >= &maxtimes %then %let stop = 1 ;
    %let data = &out ;
  %end ;
%mend q ;
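The heart of Q is the IF EOF block placed before the SET statement. A minimal illustration of why that placement works (the DEMO and KEPT data sets and the ALL_KEPT macro variable are made up for the example, and CALL SYMPUTX is used instead of CALL SYMPUT only for brevity): the DATA step keeps iterating until SET tries to read past the last record, so the statements ahead of SET run one extra time, when EOF is already 1 and the retained COUNT holds the number of records kept.

```sas
/* Five records; the step below deletes one of them. */
data demo;
  do i = 1 to 5;
    output;
  end;
run;

data kept (drop = count);
  /* Runs at the top of the iteration after the last read: EOF is 1 and  */
  /* COUNT holds the records kept. SET then finds no more data and the   */
  /* step ends, so this block executes exactly once.                     */
  if eof then do;
    put count= nobs=;
    call symputx('all_kept', (count = nobs));
  end;
  set demo end = eof nobs = nobs;
  if i = 3 then delete;   /* remove one record so COUNT ends below NOBS */
  count + 1;              /* sum statement: COUNT is retained           */
run;

%put all_kept=&all_kept;  /* 0 here, because one record was deleted */
```

If the block were placed after SET instead, it would run on the last record before that record's own DELETE test, and COUNT could be off by one.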

Here is the test.

options mprint ; %q(data=w, out=q, var=x, lim=.5, maxtimes=20)

proc summary data = q ; var x ; output out = chk max= / autoname ; run ;

data _null_ ; set chk ; put _all_ ; run ;

The process stopped at LOOPCNT = 6. The final result shows

_TYPE_=0 _FREQ_=46 x_Max=0.7377042792 _ERROR_=0 _N_=1

Note that not all observations with X > .5 are eliminated. However, a stable point has been reached: no observations were eliminated on the last iteration, and since the random number condition simply repeats from this point on, any further execution with the values as given would be useless.
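The claim that the random number condition simply repeats rests on how RANUNI seeds work: a seed passed to RANUNI initializes the stream only on the first call in a DATA step, so every execution of a step with the same seed draws the same sequence. A quick check, assuming nothing beyond two made-up data sets PASS1 and PASS2 built with the seed from Q:

```sas
/* Two separate DATA steps using the same RANUNI seed as Q. */
data pass1;
  do obs = 1 to 5;
    u = ranuni(1203987);
    output;
  end;
run;

data pass2;
  do obs = 1 to 5;
    u = ranuni(1203987);
    output;
  end;
run;

/* Expect no differences: the stream restarts in each step. */
proc compare base = pass1 compare = pass2;
run;
```

In Q, once a pass deletes nothing, the next pass sees the same records paired with the same draws, so it cannot delete anything either.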

I usually see this sort of problem when some process converges and one wants to stop iterating once the difference between successive results gets sufficiently close to 0. I have also used it in PROC SQL, where some CREATE TABLE ... SELECT is repeated until the result stabilizes.
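A sketch of the SQL form, using a made-up transitive-closure problem (the EDGES table, its SRC/DST columns, REACH, REACH_NEXT and the CLOSURE macro are all hypothetical): each pass adds the pairs reachable through one more edge, and the loop stops when a pass leaves the row count unchanged or MAXTIMES is reached.

```sas
/* Hypothetical edge list: A -> B -> C -> D. */
data edges;
  input src $ dst $;
  datalines;
A B
B C
C D
;

%macro closure(maxtimes = 20);
  %local loopcnt prev curr;
  %let loopcnt = 0;

  proc sql noprint;
    create table reach as select distinct src, dst from edges;
    select count(*) into :curr from reach;
  quit;
  %let curr = &curr;   /* %LET strips the leading blanks from INTO */

  %do %until (&curr = &prev or &loopcnt >= &maxtimes);
    %let loopcnt = %eval(&loopcnt + 1);
    %let prev = &curr;
    proc sql noprint;
      /* Existing pairs plus pairs reachable through one more edge. */
      create table reach_next as
        select src, dst from reach
        union
        select r.src, e.dst
          from reach as r inner join edges as e
            on r.dst = e.src;
      select count(*) into :curr from reach_next;
      create table reach as select * from reach_next;
    quit;
    %let curr = &curr;
  %end;

  %put NOTE: REACH stable at &curr rows after &loopcnt passes.;
%mend closure;

%closure()
```

With the three sample edges, REACH should settle at six pairs (A-B, A-C, A-D, B-C, B-D, C-D). Comparing row counts is enough here only because the UNION keeps every existing row, so an unchanged count means an unchanged table.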

You could, of course, package the DATA step as a macro, say MYMAC, and pass the complete call to Q through a MAC= parameter. Then Q would reduce to

%macro q ( mac= , maxtimes= 7) ;
  %local stop loopcnt ;
  %let stop = 0 ;
  %let loopcnt = 0 ;
  %do %until ( &stop ) ;
    %let loopcnt = %eval(&loopcnt + 1) ;
    /* unquote in case there is macro quoting in the parm */
    %UNQUOTE(%&MAC)
    %if &loopcnt >= &maxtimes %then %let stop = 1 ;
  %end ;
%mend q ;

with the call

data q ; set w ; run ;
%q( mac=mymac(data=q, out=q, var=x, lim=.5) , maxtimes=20)

Because Q now issues the same call on every pass, MYMAC must read and write the same data set for each pass to build on the previous one; hence the initial copy of W to Q.
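MYMAC itself is not shown in the thread. A hypothetical sketch, built from the DATA step in the first version of Q: it declares only its own parameters, so &LOOPCNT resolves from the calling Q, and CALL SYMPUT, finding STOP in Q's local symbol table, updates that variable rather than creating a new one.

```sas
/* Hypothetical MYMAC: the DATA step from Q packaged as its own macro. */
/* STOP and LOOPCNT are assumed to exist in the caller (Q's %LOCALs).  */
%macro mymac ( data= , out=, var=, lim= .5 ) ;
  data &out ( drop = count ) ;
    if eof then do ;
      put "loopcnt=&loopcnt" / count= nobs= ;
      /* nothing deleted on this pass: tell the caller to stop */
      if count = nobs then call symput ( "stop" , "1" ) ;
    end ;
    set &data end = eof nobs = nobs ;
    if ranuni(1203987) < .4 then do ;
      if &var > &lim then delete ;
    end ;
    count + 1 ;
  run ;
%mend mymac ;
```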

If this does not capture what you wanted, then I think you will have to construct a specific simplified example.

What you gave did not make much sense to me. I cannot envision how repeating the DATA step in %REPEAT can change anything. As I see it, you are deleting based on an unchanging condition:

if last.FY and not first.FY and (Ending_Eff_Date>mdy(6,30,2000+input(FY,2.))) then delete;

So why will repeating the step make any difference? What changes? The step in front, which calculates the maximum number of records in an FY group, leaves me wondering what it has to do with the subsetting. Consequently, if you use this example, please give data and explain.

Ian Whitlock
==============
Date: Thu, 4 Sep 2008 09:50:58 -0700
Reply-To: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender: "SAS(r) Discussion"
From: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject: Conditionally iterate data step until condition is met
Content-Type: text/plain; charset="us-ascii"

Every once in a while I need to repeatedly run a data set through a data step with a set of conditions until the set stabilizes at a previously unknown record count. This usually comes up when winnowing a pool of messy data down to the best available set using date criteria.

Typically I wrap the data step in a macro and set a %do loop counter based on a previously generated nobs or other count, such as the maximum record count across certain by-groups in the data. For example:

data _null_;
  set rates nobs=nobs;
  by vendor svscd sub FY;
  retain Num 0;   /* without RETAIN, Num is reset to missing on every record */
  if first.FY then count=0;
  Count+1;
  if last.FY then Num=max(count,Num);
  if _n_=nobs then call symput('Num',put(Num,8.));
run;

%macro repeat;
  %do i=1 %to %eval(&Num);
    data rates;
      set rates;
      by vendor svscd sub FY Date_Received Beginning_Eff_Date Ending_Eff_Date;
      if last.FY and not first.FY and (Ending_Eff_Date> mdy(6,30,2000+input(FY,2.))) then delete;
    run;
  %end;
%mend repeat;

%repeat

But the criteria may be met before the last loop, so it chews through the data an unnecessary number of times.

What I would like to do is not use a previously determined cutoff, but instead check the number of observations after each iteration and exit the loop if it hasn't changed. Maybe with CALL EXECUTE, maybe macro loops, maybe DoW loops. I have come up with one idea for a kludge, but it's ugly and I would like something succinct.
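A sketch of that idea, applied to the REPEAT macro above under stated assumptions: the hypothetical %OBSCOUNT helper reads the observation count through %SYSFUNC with the OPEN, ATTRN (NLOBS) and CLOSE functions, and the loop exits as soon as a pass leaves the count unchanged, with MAXTIMES as a safety cap. The RATES rows are made-up sample data, already sorted by the BY variables.

```sas
/* Made-up sample RATES (hypothetical), sorted by the BY variables. */
data rates;
  input vendor $ svscd $ sub $ FY $
        Date_Received :mmddyy10. Beginning_Eff_Date :mmddyy10.
        Ending_Eff_Date :mmddyy10.;
  format Date_Received Beginning_Eff_Date Ending_Eff_Date mmddyy10.;
  datalines;
V1 S1 A 08 01/01/2007 07/01/2007 06/30/2008
V1 S1 A 08 02/01/2007 07/01/2007 12/31/2008
V1 S1 A 08 03/01/2007 07/01/2008 06/30/2009
;

/* Hypothetical helper: returns the number of logical observations. */
%macro obscount(ds);
  %local dsid n rc;
  %let dsid = %sysfunc(open(&ds));
  %let n    = %sysfunc(attrn(&dsid, nlobs));
  %let rc   = %sysfunc(close(&dsid));
  &n
%mend obscount;

%macro repeat(maxtimes = 50);
  %local i before after;
  %let i = 0;
  %do %until (&after = &before or &i >= &maxtimes);
    %let i = %eval(&i + 1);
    %let before = %obscount(rates);
    data rates;
      set rates;
      by vendor svscd sub FY Date_Received Beginning_Eff_Date Ending_Eff_Date;
      if last.FY and not first.FY
         and (Ending_Eff_Date > mdy(6, 30, 2000 + input(FY, 2.))) then delete;
    run;
    /* RUN has executed the step, so the new count is available here. */
    %let after = %obscount(rates);
  %end;
  %put NOTE: RATES stable at &after observations after &i passes.;
%mend repeat;

%repeat()
```

With the sample rows, the first two passes each remove the current last record of the FY group and the third pass removes nothing, so the loop stops with one observation left.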

Any thoughts?

Paul Choate DDS Data Extraction (916) 654-2160

