LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (October 2010, week 5)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 29 Oct 2010 15:22:26 -0700
Reply-To:     Sterling Paramore <gnilrets@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sterling Paramore <gnilrets@GMAIL.COM>
Subject:      Collapsing date and coverage records with no gaps
Content-Type: text/plain; charset=ISO-8859-1

Nice work, Dan! I ran my enrollment data through your method and it was in perfect agreement with mine, but an order of magnitude faster! Too bad the OP hasn't responded, but I'm learning a lot. Anyway, I think I came up with my bloated solution because I knew it would have to be modified and expanded to handle a bit more complicated case. I have enrollment records with three different coverage types (M - Medical, D - Dental, X - Prescription). I want to collapse overlapping spans and indicate all the coverage types that were available at that time. For example:

data have; length cov $1; input id $ cov $ start_date :mmddyy10. stop_date :mmddyy10.; format start_date stop_date yymmdd10.; cards; 1 M 1/1/10 6/30/10 1 M 7/1/10 12/31/10 1 X 4/1/10 6/30/10 1 X 7/1/10 12/31/10 2 M 1/1/10 8/30/10 2 X 7/1/10 12/30/10 3 M 1/1/10 12/30/10 3 D 2/1/10 6/30/10 3 X 4/1/10 12/30/10 ; run;

WANT: id start_date stop_date multicov

1 2010-01-01 2010-03-31 M 1 2010-04-01 2010-12-31 MX 2 2010-01-01 2010-06-30 M 2 2010-07-01 2010-08-30 MX 2 2010-08-31 2010-12-30 X 3 2010-01-01 2010-01-31 M 3 2010-02-01 2010-03-31 DM 3 2010-04-01 2010-06-30 DMX 3 2010-07-01 2010-12-30 MX

After seeing your elegant solution to the previous problem, I wonder if my solution to this one is just as bloated:

%macro Continuous_Spans_MultiCov( in_libdsn = , /* INPUT dataset to process */ out_libdsn = , /* OUTPUT dataset with continuous spans */ group_vars = , /* INPUT space-delimited list of variables that identify a group for which to process continuous spans */ cov_var = , /* INPUT name of coverage var - must NOT be part of group_vars*/ multicov_var = Span_MultiCov_Cd, /* OUTPUT name of multicoverage code - multicov cd always in the order DMVX */ start_var = , /* INPUT the name of the variable in &in_libdsn that starts a span */ end_var = , /* INPUT the name of the variable in &in_libdsn that ends a span */ worklib = WORK ) / store;

*** Create a format to convert a binary code into a text string ; * that represents multiple coverage types;

data &worklib.._multiCov_fmt;

length FMTNAME $20.; length Start 8; length End 8; length Type $1.; length Label $4.;

do i = 0 to 15;

FMTNAME = "multiCov"; Type = "N"; Label = "____"; Start = i; End = i;

if band(i,1) ^= 0 then substr(Label,4,1) = "X"; if band(i,2) ^= 0 then substr(Label,3,1) = "V"; if band(i,4) ^= 0 then substr(Label,2,1) = "M"; if band(i,8) ^= 0 then substr(Label,1,1) = "D";

Label = compress(Label,"_");

output;

end;

run;

proc format CNTLIN = &worklib.._multiCov_fmt library = WORK; run;

** The complementary format converts a single coverage code into a binary number; proc format; invalue $binCov "X" = 1 "V" = 2 "M" = 4 "D" = 8 ; run;

data &out_libdsn (keep = &group_vars &start_var &end_var &multicov_var); set &in_libdsn; by &group_vars;

if _N_ = 1 then do;

length _order _cov $1; length _date _start _end 8; format _date _start _end DATE9.;

* Hash table to store each date per group; * _order = S for start, E for end; declare hash h_date(ordered:'a'); h_date.definekey('_date','_order','_cov'); h_date.definedone();

* Hash table to store individual spans to be grouped; declare hash h_pair(ordered:'a'); h_pair.definekey('_cov','_start','_end'); h_pair.definedone();

* Iterator objects needed for traversing hash tables; declare hiter hi_date('h_date'); declare hiter hi_pair('h_pair');

end;

if first.%scan(&group_vars,-1,%str( )) then do; h_date.clear(); h_pair.clear(); end;

* Clear hash tables at the beginning of a new group; h_date.ref(key:&start_var,key:"S",key:&cov_var, data:&start_var,data:"S",data:&cov_var); h_date.ref(key:&end_var,key:"E",key:&cov_var, data:&end_var,data:"E",data:&cov_var); h_pair.ref(key:&cov_var,key:&start_var,key:&end_var, data:&cov_var,data:&start_var,data:&end_var);

length Span_Start Span_End endplus1 8; format Span_Start Span_End endplus1 DATE9.;

length &multicov_var $4; length start_multicov_bin end_multicov_bin endp1_multicov_bin 8; format start_multicov_bin end_multicov_bin endp1_multicov_bin MultiCov.;

if last.%scan(&group_vars,-1,%str( )) then do;

** Read the first start date; * Note: if this is a zero-day span, this will actually be an E, but treat it like an S; rc_date = hi_date.first();

Span_Start = _date;

* Compute multicov code for start dates; rc_pair = hi_pair.first(); start_multicov_bin = 0; do while (rc_pair = 0); if (Span_Start >= _start and Span_Start <= _end) then start_multicov_bin = bor(start_multicov_bin,input(_cov,$binCov.)); rc_pair = hi_pair.next(); end;

* Move to next date; if _order = "S" then rc_date = hi_date.next();

do while (rc_date = 0);

** Start dates; if _order = "S" then do;

* See if this start date introduces a new coverage code; * If so, then terminate previous multicov code and start a new one; * If not, then ignore start date; if band(start_multicov_bin,input(_cov,$binCov.)) = 0 then do;

* Terminate previous multicov code on the previous day; Span_End = intnx('days',_date,-1); &multicov_var = put(start_multicov_bin,MultiCov.);

&start_var = Span_Start; &end_var = Span_End; output &out_libdsn;

* Start new span and calculate multicov code on this new start date; Span_Start = _date; rc_pair = hi_pair.first(); start_multicov_bin = 0; do while (rc_pair = 0); if (Span_Start >= _start and Span_Start <= _end) then start_multicov_bin = bor(start_multicov_bin,input(_cov,$binCov.)); rc_pair = hi_pair.next(); end;

end;

* Move to next date point; rc_date = hi_date.next();

end;

** End dates; else if _order = "E" then do;

Span_End = _date; endplus1 = intnx('days',_date,1);

* Determine multicov code for this end date and the day after; rc_pair = hi_pair.first(); end_multicov_bin = 0; endp1_multicov_bin = 0; do while (rc_pair = 0); if (Span_End >= _start and Span_End <= _end) then end_multicov_bin = bor(end_multicov_bin,input(_cov,$binCov.)); if (endplus1 >= _start and endplus1 <= _end) then endp1_multicov_bin = bor(endp1_multicov_bin,input(_cov,$binCov.)); rc_pair = hi_pair.next(); end;

* If coverage code is unchanged, then just move to next date; if start_multicov_bin = end_multicov_bin and start_multicov_bin = endp1_multicov_bin then do; rc_date = hi_date.next(); end;

* If coverage changes on the end date; * This can happen when new coverage starts the day old coverage terms; * Occurance of this may result in single-day coverage spans; if start_multicov_bin ^= end_multicov_bin then do;

* Terminate current multicov code; Span_End = intnx('days',_date,-1); &multicov_var = put(start_multicov_bin,MultiCov.);

&start_var = Span_Start; &end_var = Span_End; output &out_libdsn;

* Start new multicov span; Span_Start = _date; start_multicov_bin = end_multicov_bin;

* Move to next date on or after new Span Start if multicov does not immediately change; if end_multicov_bin = endp1_multicov_bin then do; rc_date = hi_date.next(); do while (rc_date = 0 and _date < Span_Start); rc_date = hi_date.next(); end; end;

end;

* If coverage changes after the end date then output this span and move to next date; if end_multicov_bin ^= endp1_multicov_bin then do;

* Terminate current multicov code; Span_End = _date; &multicov_var = put(start_multicov_bin,MultiCov.);

&start_var = Span_Start; &end_var = Span_End; output &out_libdsn;

* Start new multicov span, if applicable; if endp1_multicov_bin ^= 0 then do; Span_Start = intnx('days',_date,1); start_multicov_bin = endp1_multicov_bin;

* Move to next date on or after new Start Date; rc_date = hi_date.next(); do while (rc_date = 0 and _date < Span_Start); rc_date = hi_date.next(); end;

end;

* If there is no new multicov span, find next start date and calc new multicov; else do;

* Move to next start date; do until (_order = "S" or rc_date ^= 0); rc_date = hi_date.next(); end; Span_Start = _date;

rc_pair = hi_pair.first(); start_multicov_bin = 0; do while (rc_date = 0 and rc_pair = 0); if (Span_Start >= _start and Span_Start <= _end) then start_multicov_bin = bor(start_multicov_bin,input(_cov,$binCov.)); rc_pair = hi_pair.next(); end;

end;

end;

end; * End _order = E;

end; * End loop over enrollment spans;

* Remove pointers; hi_date.last(); rc_d = hi_date.next();

hi_pair.last(); rc_e = hi_pair.next();

end; * End last-dot;

run;

%mend Continuous_Spans_MultiCov;

On Fri, Oct 29, 2010 at 1:33 PM, Nordlund, Dan (DSHS/RDA) < NordlDJ@dshs.wa.gov> wrote:

> > -----Original Message----- > > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of > > Søren Lassen > > Sent: Friday, October 29, 2010 1:31 AM > > To: SAS-L@LISTSERV.UGA.EDU > > Subject: Re: Collapsing date records with no gaps > > > > David, > > A large number of solutions have been proposed. Most of them > > are variations on a DOW loop. I did not study the other solutions > > in detail, but it seems to me that all the DOW solutions proposed > > will output nothing if there is only one record for an ID. > > > > My suggested solution is this: > > > > proc sort data=have; > > by id start_date stop_date; > > /* Get rid of zero/negative length intervals */ > > where stop_date ge start_date or stop_date is null; > > run; > > > > data want (keep=id start_date stop_date); > > set have(rename=(start_date=_start stop_date=_stop)); > > by id; > > format start_date stop_date mmddyy8.; > > retain start_date stop_date; > > select; > > when(first.id) do; > > start_date=_start; > > stop_date=_stop; > > end; > > when(missing(stop_date)); /* never stops, "infinite" stop date */ > > when(stop_date ge _stop); /* nested intervals */ > > when (stop_date ge _start-1) do; /* adjacent or overlappping */ > > if missing(_stop) then > > stop_date=.; > > else > > stop_date = max(_stop,stop_date); > > end; > > otherwise do; > > output; > > start_date=_start; > > stop_date=_stop; > > end; > > end; > > if last.id then output; > > drop _:; > > run; > > > > As you can see, I left out the DOW loop and put in a retain statement > > instead. This is mostly a matter of personal taste, likewise the > > use of SELECT instead of nested IF..THEN..ELSE statements. > > > > I added an extra twist: if stop_date is missing, it probably means > > that the record is still valid (supposedly into eternity). > > > > Regards, > > Søren > > > > Sören, > > You are correct that my original solution did not handle the case of a > single record for an id. I offer a corrected version which also handles > overlapping and nested spans in addition to the case where a span begins the > following day. It could easily be modified to handle the case where a stop > _date was missing, but at first blush I would probably handle it differently > than you if there were a following record for the id with a non-missing > stop_date. But that brings me back to a comment that I made in response to > Sterling. And that is, what to do with spans that overlap in various ways > (or have missing values) depends on local business rules. It is unfortunate > that the OP hasn't responded with clarification or told us how the problem > was solved (if I missed that response, I apologize). But anyway, here is my > modification. > > data have; > input id start_date :mmddyy10. stop_date :mmddyy10.; > format start_date stop_date yymmdd10.; > cards; > 1 1/1/10 1/30/10 > 1 1/30/10 2/15/10 > 1 2/13/10 3/15/10 > 1 3/16/10 3/30/10 > 2 5/1/10 5/30/10 > 2 7/1/10 7/15/10 > 3 7/15/10 7/30/10 > 4 5/1/10 7/30/10 > 4 7/1/10 7/15/10 > ; > run; > proc print; > run; > > data want; > do until(last.id); > set have(rename=(start_date=_start stop_date=_stop)); > by id; > > if first.id then do; > start_date=_start; > stop_date=_stop; > end; > else if _start<=stop_date+1 then do; > if _stop > stop_date then stop_date = _stop; > end; > else do; > output; > start_date=_start; > stop_date=_stop; > end; > if last.id then output; > end; /*end do until*/ > > format start_date stop_date yymmdd10.; > drop _: ; > run; > > proc print; > run; > > Hope this is helpful, > > Dan > > Daniel J. Nordlund > Washington State Department of Social and Health Services > Planning, Performance, and Accountability > Research and Data Analysis Division > Olympia, WA 98504-5204 >


Back to: Top of message | Previous page | Main SAS-L page