Date: Fri, 5 Sep 2008 10:01:33 -0700
Reply-To: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject: Re: Conditionally iterate data step until condition is met
In-Reply-To: A<090520080357.10451.48C0AE1700011250000028D3220700095305029A06CE9907@comcast.net>
Content-Type: text/plain; charset="us-ascii"
Thanks Ian. I find myself using this sort of "elimination until
convergence" a couple times a year or so when winnowing down data based
on interdependent criteria.
In this case I actually used a conditional statement and a single pass -
you were right that the criteria didn't require the iteration:
   if (last.FY and (Ending_Eff_Date<mdy(6,30,2000+input(FY,2.)-1)))
      or (first.FY and (Beginning_Eff_Date>mdy(6,30,2000+input(FY,2.))))
      or ((Beginning_Eff_Date<mdy(7,1,2000+input(FY,2.))) and
          (Ending_Eff_Date>mdy(6,30,2000+input(FY,2.)-1)));
This had the same effect as the iterative method. I just saw it as a
chance to explore the other method a bit, which I'm glad I did. I
hadn't yet written an iterative macro like the ones Data _Null_, Toby
and you suggested, and doing so corrected a misunderstanding I had about
the macro processor. I don't use macros much, so it was a chance to
play and get something to add to the toolkit.
My gratitude as always!
Paul Choate
DDS Data Extraction
(916) 654-2160
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Ian
Whitlock
Sent: Thursday, September 04, 2008 8:57 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Conditionally iterate data step until condition is met
Summary: Use conditional CALL SYMPUT at end-of-file
#iw-value=1
Paul,
It would have been better to give an example with data, but since you
didn't I made up a silly little problem.
data w ;
   do obs = 1 to 100 ;
      x = ranuni ( 234765 ) ;
      output ;
   end ;
run ;
The problem is to eliminate obs when X > .5. The catch is that the
test is conditional on a random number. Hence some obs slip through.
This means the step must be repeated until there are no more to be
eliminated or you have reached the allowed number of iterations.
Here is the macro.
%macro q ( data= , out=, var=, lim= .5 , maxtimes= 7) ;
   %local stop loopcnt ;
   %let stop = 0 ;
   %let loopcnt = 0 ;
   %do %until ( &stop ) ;
      %let loopcnt = %eval(&loopcnt + 1) ;
      data &out ( drop = count ) ;
         /* EOF was set when the previous iteration read the last obs, */
         /* so this block runs once, just before SET hits end of file. */
         if eof then
         do ;
            put "loopcnt=&loopcnt" / count= nobs= ;
            /* nothing was deleted on this pass: signal the macro loop */
            if count = nobs then
               call symput ( "stop" , "1" ) ;
         end ;
         set &data end = eof nobs=nobs ;
         if ranuni(1203987) < .4 then
         do ;
            if &var > &lim then delete ;
         end ;
         count + 1 ;   /* counts the obs that survive */
      run ;
      %if &loopcnt >= &maxtimes %then
         %let stop = 1 ;
      %let data = &out ;
   %end ;
%mend q ;
Here is the test.
options mprint ;
%q(data=w, out=q, var=x, lim=.5, maxtimes=20)
proc summary data = q ;
var x ;
output out = chk max= / autoname ;
run ;
data _null_ ;
set chk ;
put _all_ ;
run ;
The process stopped at LOOPCNT = 6. The final result shows
_TYPE_=0 _FREQ_=46 x_Max=0.7377042792 _ERROR_=0 _N_=1
Note that not all X > .5 are eliminated. However a stable point
has been reached. Any further execution with values as given
would be useless, since no observations were eliminated on the last
iteration and the random number condition will simply repeat from this
point on.
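For comparison, the same stopping test can be made from the macro side
by reading the observation count before and after each pass, instead of
using CALL SYMPUT inside the step. What follows is only a sketch under
assumptions: the macro name WINNOW and the variables BEFORE and AFTER
are made up for illustration, the deletion rule is copied from Q above,
and there is no handling for a data set that fails to open.

```sas
%macro winnow ( data= , out= , var= , lim= .5 , maxtimes= 7 ) ;
   /* hypothetical macro: not part of the original post */
   %local stop loopcnt dsid rc before after ;
   %let stop = 0 ;
   %let loopcnt = 0 ;
   %do %until ( &stop ) ;
      %let loopcnt = %eval(&loopcnt + 1) ;
      /* observation count going into this pass */
      %let dsid   = %sysfunc(open(&data)) ;
      %let before = %sysfunc(attrn(&dsid, nlobs)) ;
      %let rc     = %sysfunc(close(&dsid)) ;
      data &out ;
         set &data ;
         /* same deletion rule as in Q; replace with the real criteria */
         if ranuni(1203987) < .4 then
            if &var > &lim then delete ;
      run ;
      /* observation count coming out of this pass */
      %let dsid  = %sysfunc(open(&out)) ;
      %let after = %sysfunc(attrn(&dsid, nlobs)) ;
      %let rc    = %sysfunc(close(&dsid)) ;
      /* stop when nothing was deleted or the pass limit is reached */
      %if &after = &before or &loopcnt >= &maxtimes %then
         %let stop = 1 ;
      %let data = &out ;
   %end ;
%mend winnow ;

%winnow(data=w, out=q, var=x, lim=.5, maxtimes=20)
```

The trade-off is two extra OPEN/CLOSE calls per pass in exchange for a
DATA step that carries no bookkeeping of its own.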
I have usually seen this sort of problem when there was some sort of
convergence and one wanted to stop iterating the process when the
difference between results gets sufficiently close to 0. I have also
used it in SQL where some CREATE/SELECT is to be repeated until it
stabilizes.
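A rough sketch of the SQL variant, under assumptions: the macro name
SQLQ, the scratch table WORK._NEXT and the HAVING rule are all invented
for illustration and are not from any real job. The rule keeps rows no
larger than 1.5 times the current mean, so the cutoff moves on each
pass and the query has to be repeated until the row count stops
changing. SAS will note that the query requires remerging summary
statistics.

```sas
%macro sqlq ( data= , out= , var= , maxtimes= 20 ) ;
   /* hypothetical macro: not part of the original post */
   %local stop loopcnt before after ;
   %let stop = 0 ;
   %let loopcnt = 0 ;
   %do %until ( &stop ) ;
      %let loopcnt = %eval(&loopcnt + 1) ;
      proc sql noprint ;
         select count(*) into :before from &data ;
         /* placeholder rule: drop rows far above the current mean */
         create table work._next as
            select * from &data
            having &var <= 1.5 * mean(&var) ;
         select count(*) into :after from work._next ;
         /* copy through a scratch table so that &out is never */
         /* read and written in the same statement             */
         create table &out as select * from work._next ;
         drop table work._next ;
      quit ;
      /* the implicit %EVAL compares the counts as integers */
      %if &after = &before or &loopcnt >= &maxtimes %then
         %let stop = 1 ;
      %let data = &out ;
   %end ;
%mend sqlq ;

%sqlq(data=w, out=q, var=x)
```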
You could, of course, package the DATA step as a macro say MAC. Then
Q would reduce to
%macro q ( mac= , data= , out=, var=, lim= .5 , maxtimes= 7) ;
   %local stop loopcnt ;
   %let stop = 0 ;
   %let loopcnt = 0 ;
   %do %until ( &stop ) ;
      %let loopcnt = %eval(&loopcnt + 1) ;
      /* unquote in case there is macro quoting in the parm */
      %UNQUOTE(%&MAC)
      %if &loopcnt >= &maxtimes %then
         %let stop = 1 ;
      %let data = &out ;
   %end ;
%mend q ;
with the call
%q( mac=mymac(data=w, out=q, var=x, lim=.5)
, maxtimes=20)
If this does not capture what you wanted, then I think you will have
to construct a specific simplified example.
What you gave did not make much sense to me. I cannot envision how
repeating the DATA step in %REPEAT can change anything. As I see it,
you are deleting based on an unchanging condition:
   if last.FY
      and not first.FY
      and (Ending_Eff_Date>mdy(6,30,2000+input(FY,2.))) then
      delete;
So why will repeating the step make any difference? What changes?
The step in front, calculating the maximum number of records in the FY
group, leaves me wondering why it has anything to do with the
subsetting. Consequently, if you use this example then please give
data and explain.
Ian Whitlock
==============
Date: Thu, 4 Sep 2008 09:50:58 -0700
Reply-To: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender: "SAS(r) Discussion"
From: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject: Conditionally iterate data step until condition is met
Content-Type: text/plain; charset="us-ascii"
Every once in a while I need to repeatedly run a data set through a
data step with a set of conditions until the set stabilizes at a
certain previously unknown record count. This usually comes up when
winnowing down a pool of messy data to the best available set using
date criteria.
Typically I wrap the data in a macro and set a %do %until counter
based on a previously generated nobs or other count, such as a maximum
record count across certain by-groups in the data. For example:
data _null_;
   set rates nobs=nobs;
   by vendor svscd sub FY;
   retain Num;   /* keep the running maximum across observations */
   if first.FY then count=0;
   Count+1;
   if last.FY then Num=max(count,Num);
   if _n_=nobs then call symput('Num',put(Num,8.));
run;
%macro repeat;
   %do i=1 %to %eval(&Num);
      data rates;
         set rates;
         by vendor svscd sub FY Date_Received
            Beginning_Eff_Date Ending_Eff_Date;
         if last.FY and not first.FY and (Ending_Eff_Date>
            mdy(6,30,2000+input(FY,2.))) then delete;
      run;
   %end;
%mend repeat;
%repeat
But the criteria may be met before the last loop and so it chews
through the data an unnecessary number of times.
What I would like to do is not use a previously determined cut off,
but to instead check the number of observations after each iteration
and exit the loop if it hasn't changed. Maybe with call execute, maybe
macro loops, maybe DoW loops. I have come up with one idea for a
kludge but it's ugly and I would like something succinct.
Any thoughts?
Paul Choate
DDS Data Extraction
(916) 654-2160