| Date: | Fri, 29 Oct 2010 04:30:54 -0400 |
| Reply-To: | Søren Lassen <s.lassen@POST.TELE.DK> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Søren Lassen <s.lassen@POST.TELE.DK> |
| Subject: | Re: Collapsing date records with no gaps |
|
| Content-Type: | text/plain; charset=ISO-8859-1 |
David,
A large number of solutions have been proposed. Most of them
are variations on a DOW loop. I did not study the other solutions
in detail, but it seems to me that all the DOW solutions proposed
will output nothing if there is only one record for an ID.
My suggested solution is this:
proc sort data=have;
by id start_date stop_date;
/* Get rid of zero/negative length intervals */
where stop_date ge start_date or stop_date is null;
run;
data want (keep=id start_date stop_date);
set have(rename=(start_date=_start stop_date=_stop));
by id;
format start_date stop_date mmddyy8.;
retain start_date stop_date;
select;
when(first.id) do;
start_date=_start;
stop_date=_stop;
end;
when(missing(stop_date)); /* never stops, "infinite" stop date */
when(stop_date ge _stop); /* nested intervals */
when (stop_date ge _start-1) do; /* adjacent or overlappping */
if missing(_stop) then
stop_date=.;
else
stop_date = max(_stop,stop_date);
end;
otherwise do;
output;
start_date=_start;
stop_date=_stop;
end;
end;
if last.id then output;
drop _:;
run;
As you can see, I left out the DOW loop and put in a retain statement
instead. This is mostly a matter of personal taste, likewise the
use of SELECT instead of nested IF..THEN..ELSE statements.
I added an extra twist: if stop_date is missing, it probably means
that the record is still valid (supposedly into eternity).
Regards,
Søren
On Wed, 27 Oct 2010 19:10:04 -0400, David Friedman
<harrypotterdhf@EARTHLINK.NET> wrote:
>I have a set of records and I want to collapse them by id into one record
if
>the stop date of the preceding record = the start date of the next record.
>So in the case below only id=1 would be collapsed since the preceding
record
>has a stop date of 1/30/10 and the start date of the next record is also
>1/30/10. id #2 would not collapse since this condition is not met, i.e.
>5/30/10 not equal to 7/1/10.
>
>Input:
>
>id start date stop date
>-- ---------- ---------
>1 1/1/10 1/30/10
>1 1/30/10 2/15/10
>2 5/1/10 5/30/10
>2 7/1/10 7/15/10
>
>Desired output:
>
>id start date stop date
>-- ---------- ---------
>1 1/1/10 2/15/10
>2 5/1/10 5/30/10
>2 7/1/10 7/15/10
|