Date: Thu, 23 Dec 2010 14:47:02 -0500
Reply-To: Jerry <i89rt5@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jerry <i89rt5@GMAIL.COM>
Subject: Re: Collapsing date records with no gaps
Content-Type: text/plain; charset=ISO-8859-1
Gee, this weird thing is not going away. As shown in the log of EG 4.2,
"want" data still have 9 records, which should not occur. Anyone has run
into this problem before?
1 The SAS System 14:33 Thursday,
December 23, 2010
1 ;*';*";*/;quit;run;
2 OPTIONS PAGENO=MIN;
3 %LET _CLIENTTASKLABEL='coverage_DOW-LookAhead';
4 %LET _CLIENTPROJECTPATH='Z:\mmr_retroactivity\mmr_retro.egp';
5 %LET _CLIENTPROJECTNAME='mmr_retro.egp';
6 %LET _SASPROGRAMFILE=;
7
8 ODS _ALL_ CLOSE;
9 OPTIONS DEV=ACTXIMG;
10 GOPTIONS XPIXELS=0 YPIXELS=0;
11 FILENAME EGHTML TEMP;
12 ODS HTML(ID=EGHTML) FILE=EGHTML ENCODING='utf-8' STYLE=Analysis
12 !
STYLESHEET=(URL="file:///C:/Program%20Files/SAS/SharedFiles/BIClientStyles/4.2/
12 ! Analysis.css") NOGTITLE NOGFOOTNOTE GPATH=&sasworklocation;
NOTE: Writing HTML(EGHTML) Body file: EGHTML
13 FILENAME EGSR TEMP;
14 ODS tagsets.sasreport12(ID=EGSR) FILE=EGSR STYLE=Statistical
14 !
STYLESHEET=(URL="file:///C:/Program%20Files/SAS/SharedFiles/BIClientStyles/4.2/
14 ! Statistical.css") NOGTITLE NOGFOOTNOTE GPATH=&sasworklocation
ENCODING=UTF8
14 ! options(rolap="on");
NOTE: Writing TAGSETS.SASREPORT12(EGSR) Body file: EGSR
15
16 GOPTIONS ACCESSIBLE;
17 data want;
18 do until(last.id);
19 set have(rename=(start_date=_start stop_date=_stop));
20 by id;
21
22 if first.id then do;
23 start_date=_start;
24 stop_date=_stop;
25 end;
26
27 else if _start<=stop_date+1 then do;
28 if _stop > stop_date then stop_date = _stop;
29 end;
30
31 else do;
32 output;
33 start_date=_start;
34 stop_date=_stop;
35 end;
36
37 if last.id then output;
38
39 end; /*end do until*/
40
41 format start_date stop_date yymmdd10.;
42 drop _: ;
43
44 run;
NOTE: There were 9 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 9 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
45
46
47 GOPTIONS NOACCESSIBLE;
48 %LET _CLIENTTASKLABEL=;
49 %LET _CLIENTPROJECTPATH=;
50 %LET _CLIENTPROJECTNAME=;
51 %LET _SASPROGRAMFILE=;
52
53 ;*';*";*/;quit;run;
54 ODS _ALL_ CLOSE;
55
56
57 QUIT; RUN;
58
On Thu, 23 Dec 2010 13:58:08 -0500, Jerry <i89rt5@GMAIL.COM> wrote:
>By "marked", do you mean the code which is selected and thus highlighted in
>dark blue?
>
>I did highlighted Dan's code in my EG project and choose "Run Selection"
>from the menu to run it. I think this is the way to selectively run his
>code, right?
>
>On Thu, 23 Dec 2010 12:50:34 -0500, Gerhard Hellriegel
><gerhard.hellriegel@T-ONLINE.DE> wrote:
>
>>do you have something marked in a editor in EG? In that case only the
>>marked code might be executed.
>>
>>Gerhard
>>
>>
>>
>>
>>On Thu, 23 Dec 2010 12:40:25 -0500, Jerry <i89rt5@GMAIL.COM> wrote:
>>
>>>Art,
>>>
>>>Thanks for testing it.
>>>
>>>I don't think I was hallucinating. But when I run Dan's code below on
>>>Enterprise Miner 4.2, sometimes the output data "want" is exactly same as
>>>the input data "have", as if the code was not run, But sometime this weird
>>>thing did not occur when I ran the code. I could not consistently
>>replicate
>>>this weird thing.
>>>
>>>I have never seen this happen, but I only started using EG a bit over one
>>>month ago. In any event, this makes a me a bit wary about EG.
>>>
>>>Maybe I need to start a new project file or a new program file for testing
>>>the code?
>>>
>>>/******Dan's code**********/
>>>data have;
>>> input id start_date :mmddyy10. stop_date :mmddyy10.;
>>> format start_date stop_date yymmdd10.;
>>>cards;
>>>1 1/1/10 1/30/10
>>>1 1/30/10 2/15/10
>>>1 2/13/10 3/15/10
>>>1 3/16/10 3/30/10
>>>2 5/1/10 5/30/10
>>>2 7/1/10 7/15/10
>>>3 7/15/10 7/30/10
>>>4 5/1/10 7/30/10
>>>4 7/1/10 7/15/10
>>>;
>>>run;
>>>proc print;
>>>run;
>>>
>>>data want1;
>>> do until(last.id);
>>> set have(rename=(start_date=_start stop_date=_stop));
>>> by id;
>>>
>>> if first.id then do;
>>> start_date=_start;
>>> stop_date=_stop;
>>> end;
>>> else if _start<=stop_date+1 then do;
>>> if _stop > stop_date then stop_date = _stop;
>>> end;
>>> else do;
>>> output;
>>> start_date=_start;
>>> stop_date=_stop;
>>> end;
>>> if last.id then output;
>>> end; /*end do until*/
>>>
>>>format start_date stop_date yymmdd10.;
>>>drop _: ;
>>>run;
>>>
>>>
>>>
>>>
>>>On Thu, 23 Dec 2010 11:30:24 -0500, Arthur Tabachneck <art297@ROGERS.COM>
>>wrote:
>>>
>>>>Jerry,
>>>>
>>>>I haven't looked back to see what problem the code is trying to solve,
>>but
>>>>Dan's code takes 9 records and outputs 5 records. That is, file want is
>>>>definitely different than file have.
>>>>
>>>>Art
>>>>--------
>>>>On Thu, 23 Dec 2010 10:38:07 -0500, Jerry <i89rt5@GMAIL.COM> wrote:
>>>>
>>>>>Dan,
>>>>>
>>>>>I'm not sure what I am missing. But running your code of "corrected
>>>>version"
>>>>>(see bottom part). The input data set "have" is the exactly same
>>as "want".
>>>>>I'm lost.....Can you/anyone help?
>>>>>
>>>>>Sorry for digging out this old thread , but it's very valuable! Sterling
>>>>>Paramore pointed me to this thread. Sterling, thank you!
>>>>>
>>>>>Thanks.
>>>>>
>>>>>On Fri, 29 Oct 2010 13:33:57 -0700, Nordlund, Dan (DSHS/RDA)
>>>>><NordlDJ@DSHS.WA.GOV> wrote:
>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
>>>>>>> Søren Lassen
>>>>>>> Sent: Friday, October 29, 2010 1:31 AM
>>>>>>> To: SAS-L@LISTSERV.UGA.EDU
>>>>>>> Subject: Re: Collapsing date records with no gaps
>>>>>>>
>>>>>>> David,
>>>>>>> A large number of solutions have been proposed. Most of them
>>>>>>> are variations on a DOW loop. I did not study the other solutions
>>>>>>> in detail, but it seems to me that all the DOW solutions proposed
>>>>>>> will output nothing if there is only one record for an ID.
>>>>>>>
>>>>>>> My suggested solution is this:
>>>>>>>
>>>>>>> proc sort data=have;
>>>>>>> by id start_date stop_date;
>>>>>>> /* Get rid of zero/negative length intervals */
>>>>>>> where stop_date ge start_date or stop_date is null;
>>>>>>> run;
>>>>>>>
>>>>>>> data want (keep=id start_date stop_date);
>>>>>>> set have(rename=(start_date=_start stop_date=_stop));
>>>>>>> by id;
>>>>>>> format start_date stop_date mmddyy8.;
>>>>>>> retain start_date stop_date;
>>>>>>> select;
>>>>>>> when(first.id) do;
>>>>>>> start_date=_start;
>>>>>>> stop_date=_stop;
>>>>>>> end;
>>>>>>> when(missing(stop_date)); /* never stops, "infinite" stop date */
>>>>>>> when(stop_date ge _stop); /* nested intervals */
>>>>>>> when (stop_date ge _start-1) do; /* adjacent or overlappping */
>>>>>>> if missing(_stop) then
>>>>>>> stop_date=.;
>>>>>>> else
>>>>>>> stop_date = max(_stop,stop_date);
>>>>>>> end;
>>>>>>> otherwise do;
>>>>>>> output;
>>>>>>> start_date=_start;
>>>>>>> stop_date=_stop;
>>>>>>> end;
>>>>>>> end;
>>>>>>> if last.id then output;
>>>>>>> drop _:;
>>>>>>> run;
>>>>>>>
>>>>>>> As you can see, I left out the DOW loop and put in a retain statement
>>>>>>> instead. This is mostly a matter of personal taste, likewise the
>>>>>>> use of SELECT instead of nested IF..THEN..ELSE statements.
>>>>>>>
>>>>>>> I added an extra twist: if stop_date is missing, it probably means
>>>>>>> that the record is still valid (supposedly into eternity).
>>>>>>>
>>>>>>> Regards,
>>>>>>> Søren
>>>>>>>
>>>>>>
>>>>>>Sören,
>>>>>>
>>>>>>You are correct that my original solution did not handle the case of a
>>>>>single record for an id. I offer a corrected version which also handles
>>>>>overlapping and nested spans in addition to the case where a span begins
>>>>the
>>>>>following day. It could easily be modified to handle the case where a
>>stop
>>>>>_date was missing, but at first blush I would probably handle it
>>>>differently
>>>>>than you if there were a following record for the id with a non-missing
>>>>>stop_date. But that brings me back to a comment that I made in
>>response to
>>>>>Sterling. And that is, what to do with spans that overlap in various
>>ways
>>>>>(or have missing values) depends on local business rules. It is
>>>>unfortunate
>>>>>that the OP hasn't responded with clarification or told us how the
>>problem
>>>>>was solved (if I missed that response, I apologize). But anyway, here
>>is
>>>>my
>>>>>modification.
>>>>>>
>>>>>>data have;
>>>>>> input id start_date :mmddyy10. stop_date :mmddyy10.;
>>>>>> format start_date stop_date yymmdd10.;
>>>>>>cards;
>>>>>>1 1/1/10 1/30/10
>>>>>>1 1/30/10 2/15/10
>>>>>>1 2/13/10 3/15/10
>>>>>>1 3/16/10 3/30/10
>>>>>>2 5/1/10 5/30/10
>>>>>>2 7/1/10 7/15/10
>>>>>>3 7/15/10 7/30/10
>>>>>>4 5/1/10 7/30/10
>>>>>>4 7/1/10 7/15/10
>>>>>>;
>>>>>>run;
>>>>>>proc print;
>>>>>>run;
>>>>>>
>>>>>>data want;
>>>>>> do until(last.id);
>>>>>> set have(rename=(start_date=_start stop_date=_stop));
>>>>>> by id;
>>>>>>
>>>>>> if first.id then do;
>>>>>> start_date=_start;
>>>>>> stop_date=_stop;
>>>>>> end;
>>>>>> else if _start<=stop_date+1 then do;
>>>>>> if _stop > stop_date then stop_date = _stop;
>>>>>> end;
>>>>>> else do;
>>>>>> output;
>>>>>> start_date=_start;
>>>>>> stop_date=_stop;
>>>>>> end;
>>>>>> if last.id then output;
>>>>>> end; /*end do until*/
>>>>>>
>>>>>>format start_date stop_date yymmdd10.;
>>>>>>drop _: ;
>>>>>>run;
>>>>>>
>>>>>>proc print;
>>>>>>run;
>>>>>>
>>>>>>Hope this is helpful,
>>>>>>
>>>>>>Dan
>>>>>>
>>>>>>Daniel J. Nordlund
>>>>>>Washington State Department of Social and Health Services
>>>>>>Planning, Performance, and Accountability
>>>>>>Research and Data Analysis Division
>>>>>>Olympia, WA 98504-5204
|