LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (October 2010, week 5)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 29 Oct 2010 17:21:37 +0000
Reply-To:     "Keintz, H. Mark" <mkeintz@WHARTON.UPENN.EDU>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Keintz, H. Mark" <mkeintz@WHARTON.UPENN.EDU>
Subject:      Re: Collapsing date records with no gaps
Comments: To: Arthur Tabachneck <art297@NETSCAPE.NET>
In-Reply-To:  <201010282251.o9SMHR1T005071@willow.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"

Art, thanks.

I reproduced your results, but unbeliever that I am, I figured it couldn't actually be the lookahead method that slowed things down, but rather my use of the MIN function that degraded performance. Only partly true.

I replaced the MINs by more resource-effective code and reduced it to 50% slower than yours. For those who might be interested, here's the look-ahead method, with improved efficiency. But it's not as fast as DOW. Hmm.

Regards, Mark

data want (keep=id startseries stop_date rename=(startseries=start_date)); retain startseries; format startseries mmddyy8.; set have; by id;

** Now look ahead, with wraparound to avoid premature end-of-input signal **; set have (firstobs=2 keep=start_date rename=(start_date=nextstart)) have (obs=1 keep=start_date rename=(start_date=nextstart));

startseries=coalesce(startseries,start_date); if last.id=1 or nextstart ^= stop_date then do; output; startseries=nextstart; end; run;

> -----Original Message----- > From: Arthur Tabachneck [mailto:art297@NETSCAPE.NET] > Sent: Thursday, October 28, 2010 6:52 PM > To: SAS-L@LISTSERV.UGA.EDU; Keintz, H. Mark > Subject: Re: Collapsing date records with no gaps > > Mark, > > I had compared the two before submitting the post. My comparison test > was: > > data have (drop=i); > infile cards ; > input id (start_date stop_date) (mmddyy8.); > format start_date stop_date mmddyy8.; > output; > do i=1 to 10000; > id=id*2+i; > output; > end; > cards; > 1 1/1/10 1/30/10 > 1 1/30/10 2/15/10 > 1 2/15/10 3/15/10 > 1 3/16/10 4/15/10 > 2 5/1/10 5/30/10 > 2 7/1/10 7/15/10 > ; > run ; > > proc sort data=have; > by id start_date; > run; > > data want (keep=id start_date stop_date); > do until(last.id); > set have(rename=(start_date=_start stop_date=_stop)); > by id; > format start_date stop_date mmddyy8.; > if first.id then do; > start_date=_start; > stop_date=_stop; > end; > else if stop_date eq _start then do; > stop_date = _stop; > if last.id then output; > end; > else if stop_date gt _stop then do; > _stop=stop_date; > if last.id then output; > end; > else do; > output; > start_date=_start; > stop_date=_stop; > if last.id then output; > end; > end; > run; > > data want2 (drop=startseries nextstart); > retain startseries; > set have; > by id; > > ** Now look ahead, with wraparound to avoid premature end-of-input > signal **; > set have (firstobs=2 keep=start_date rename=(start_date=nextstart)) > have (obs=1 keep=start_date rename=(start_date=nextstart)); > > if last.id=1 or nextstart ^= stop_date then do; > start_date=min(start_date,startseries); > output; > startseries=.; > end; > > else startseries=min(startseries,start_date); > run; > > My results, given numerous tests in various orders, were around 4 > seconds > for the first method, and around 9 seconds for the second method. I > was > running 9.1.3 m3 on a 4-processor windows 2003 server. > > Art > ------- > On Thu, 28 Oct 2010 22:21:57 +0000, Keintz, H. Mark > <mkeintz@WHARTON.UPENN.EDU> wrote: > > >Art: > > > >I agree that for many people, the DOW approach is as easy to > understand > as the lookahead method, and I can see why someone would prefer it. > For > one thing, it probably will be more robust if the OP's specifications > are > expanded. > > > >But using "half the IO time" is, in this situation, is very unlikely. > Yes, in my offering, dataset HAVE is being read twice, but the two read > streams are essentially in parallel (one stream reads record I, the > other > record I+1). So I believe one SET statement will merely read from the > operating system's I/O cache generated by servicing the other SET > statement. > > > >Regards, > >Mark > > > > > >> -----Original Message----- > >> From: Arthur Tabachneck [mailto:art297@NETSCAPE.NET] > >> Sent: Thursday, October 28, 2010 5:43 PM > >> To: SAS-L@LISTSERV.UGA.EDU; Keintz, H. Mark > >> Subject: Re: Collapsing date records with no gaps > >> > >> Mark, > >> > >> Of course the problem doesn't "require" the use of a DOW loop, but I > >> think > >> it is the most efficient of the approaches that were offered. > >> > >> I was originally going to suggest the look-ahead method, but then > never > >> sent it after seeing Dan's code. > >> > >> However, to address Sterling's comment/question, I think we all > accept > >> something that gets the job done and, even better, code that someone > >> else > >> can understand. I'm sure that much of my own code would be > considered > >> fluff by many, but as long as it does that which was intended, does > it > >> correctly, and allows me to me a deadline, I'm quite happy with it. > >> > >> Why I prefer Dan's code, in this case (over the look ahead method), > is > >> that it only requires half the IO time and is no harder to follow > than > >> any > >> of the other methods suggested. > >> > >> Art > >> ------- > >> On Thu, 28 Oct 2010 13:09:42 +0000, Keintz, H. Mark > >> <mkeintz@WHARTON.UPENN.EDU> wrote: > >> > >> >Folks: > >> > > >> >I don't think this problem as stated requires use of a DOW loop > >> approach. It looks to me like a simple look-ahead task, in which a > >> record > >> is output if: > >> > 1. The current record is the last for an ID, or > >> > 2. the next record does not have start_date = this record's > >> stop_date. > >> > > >> >This plus a way to RETAIN the start_date for the current record > series > >> should be enough: > >> > > >> > > >> >data want (drop=startseries nextstart); > >> > retain startseries; > >> > set have; > >> > by id; > >> > > >> > ** Now look ahead, with wraparound to avoid premature end-of- > input > >> signal **; > >> > set have (firstobs=2 keep=start_date > rename=(start_date=nextstart)) > >> > have (obs=1 keep=start_date > rename=(start_date=nextstart)); > >> > > >> > if last.id=1 or nextstart ^= stop_date then do; > >> > start_date=min(start_date,startseries); > >> > output; > >> > startseries=.; > >> > end; > >> > > >> > else startseries=min(startseries,start_date); > >> >run; > >> > > >> > > >> >Regards, > >> >Mark > >> > > >> > > >> >> -----Original Message----- > >> >> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf > Of > >> >> Arthur Tabachneck > >> >> Sent: Wednesday, October 27, 2010 11:52 PM > >> >> To: SAS-L@LISTSERV.UGA.EDU > >> >> Subject: Re: Collapsing date records with no gaps > >> >> > >> >> Shyam, > >> >> > >> >> If that is what the OP wants, a minor modification to one line of > >> Dan's > >> >> code will produce the same result. That is: > >> >> > >> >> data have ; > >> >> infile cards ; > >> >> input id (start_date stop_date) (mmddyy8.); > >> >> format start_date stop_date mmddyy8.; > >> >> cards; > >> >> 1 1/1/10 1/30/10 > >> >> 1 1/30/10 2/15/10 > >> >> 1 2/15/10 3/15/10 > >> >> 1 3/16/10 4/15/10 > >> >> 2 5/1/10 5/30/10 > >> >> 2 7/1/10 7/15/10 > >> >> ; > >> >> run ; > >> >> > >> >> data want (keep=id start_date stop_date); > >> >> do until(last.id); > >> >> set have(rename=(start_date=_start stop_date=_stop)); > >> >> by id; > >> >> format start_date stop_date mmddyy8.; > >> >> if first.id then do; > >> >> start_date=_start; > >> >> stop_date=_stop; > >> >> end; > >> >> else if stop_date eq _start or stop_date+1 eq _start then do; > >> >> stop_date = _stop; > >> >> if last.id then output; > >> >> end; > >> >> else do; > >> >> output; > >> >> start_date=_start; > >> >> stop_date=_stop; > >> >> if last.id then output; > >> >> end; > >> >> end; > >> >> run; > >> >> > >> >> HTH, > >> >> Art > >> >> -------- > >> >> On Wed, 27 Oct 2010 22:50:01 -0400, shyamprasad samisetti > >> >> <shyam.cbz@GMAIL.COM> wrote: > >> >> > >> >> >How about, if there is a record with a day after the stop date > >> >> too........? > >> >> >I am adding an extra record in addition to the record added by > >> Daniel > >> >> >Nordlund. The next start date is a day after stop date. > >> >> > > >> >> >Data Rec; > >> >> > Input Id Startdt :mmddyy10. Stopdt :mmddyy10.; > >> >> > Datalines; > >> >> >1 1/1/10 1/30/10 > >> >> >1 1/30/10 2/15/10 > >> >> >1 2/15/10 3/15/10 > >> >> >1 3/16/10 4/15/10 > >> >> >2 5/1/10 5/30/10 > >> >> >2 7/1/10 7/15/10 > >> >> >; > >> >> >run; > >> >> > > >> >> > Proc Sort; > >> >> > By Id; > >> >> > Format Startdt Stopdt Date9.; > >> >> >Run; > >> >> > > >> >> >Data Rec1; > >> >> > SEt Rec; > >> >> > by Id; > >> >> > Lagdate1 = Lag(Startdt); > >> >> > Lagdate2 = Lag(stopdt); > >> >> > Format Startdt Stopdt Newstart newend Lag: date9.; > >> >> > Retain newstart newend; > >> >> > if first.Id then > >> >> >Do; > >> >> > Lagdate1 = .; > >> >> > Lagdate2 = .; > >> >> > Newstart = Startdt; > >> >> > Newend = Stopdt; > >> >> >End; > >> >> > > >> >> >If lagdate2 = Startdt Then Newend = Stopdt; > >> >> >ELSE IF LAGDATE2 + 1 = STARTDT THEN NEWEND = STOPDT; /*looks > for > >> the > >> >> record > >> >> >with startdt > by 1 day for previous stopdt*/ > >> >> >Else do; > >> >> > Newstart = Startdt; > >> >> > Newend = Stopdt; > >> >> >End; > >> >> >Run;** > >> >> >Proc Sort Data = Rec1 > >> >> > out = Rec2; > >> >> > > >> >> > By id Newstart; > >> >> >Run; > >> >> >Data Final(Keep = id newstart newend); > >> >> > Set Rec2; > >> >> > by id newstart; > >> >> > If Last.Newstart; > >> >> >Run; > >> >> > > >> >> >Hope this helps. > >> >> >-Shyam. > >> >> > > >> >> >On Wed, Oct 27, 2010 at 8:14 PM, Nordlund, Dan (DSHS/RDA) < > >> >> >NordlDJ@dshs.wa.gov> wrote: > >> >> > > >> >> >> > -----Original Message----- > >> >> >> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On > >> Behalf > >> >> Of > >> >> >> > David Friedman > >> >> >> > Sent: Wednesday, October 27, 2010 4:10 PM > >> >> >> > To: SAS-L@LISTSERV.UGA.EDU > >> >> >> > Subject: Collapsing date records with no gaps > >> >> >> > > >> >> >> > I have a set of records and I want to collapse them by id > into > >> one > >> >> >> > record if > >> >> >> > the stop date of the preceding record = the start date of > the > >> next > >> >> >> > record. > >> >> >> > So in the case below only id=1 would be collapsed since the > >> >> preceding > >> >> >> > record > >> >> >> > has a stop date of 1/30/10 and the start date of the next > >> record > >> >> is > >> >> >> > also > >> >> >> > 1/30/10. id #2 would not collapse since this condition is > not > >> met, > >> >> i.e. > >> >> >> > 5/30/10 not equal to 7/1/10. > >> >> >> > > >> >> >> > Input: > >> >> >> > > >> >> >> > id start date stop date > >> >> >> > -- ---------- --------- > >> >> >> > 1 1/1/10 1/30/10 > >> >> >> > 1 1/30/10 2/15/10 > >> >> >> > 2 5/1/10 5/30/10 > >> >> >> > 2 7/1/10 7/15/10 > >> >> >> > > >> >> >> > Desired output: > >> >> >> > > >> >> >> > id start date stop date > >> >> >> > -- ---------- --------- > >> >> >> > 1 1/1/10 2/15/10 > >> >> >> > 2 5/1/10 5/30/10 > >> >> >> > 2 7/1/10 7/15/10 > >> >> >> > >> >> >> David, > >> >> >> > >> >> >> I added an additional record for id=1 in case you had more > than 2 > >> >> records > >> >> >> that needed to be collapsed. Something like the following > should > >> >> work. > >> >> >> > >> >> >> data have; > >> >> >> input id start_date :mmddyy10. stop_date :mmddyy10.; > >> >> >> cards; > >> >> >> 1 1/1/10 1/30/10 > >> >> >> 1 1/30/10 2/15/10 > >> >> >> 1 2/15/10 3/15/10 > >> >> >> 2 5/1/10 5/30/10 > >> >> >> 2 7/1/10 7/15/10 > >> >> >> ; > >> >> >> run; > >> >> >> > >> >> >> data want; > >> >> >> do until(last.id); > >> >> >> set have(rename=(start_date=_start stop_date=_stop)); > >> >> >> by id; > >> >> >> > >> >> >> if first.id then do; > >> >> >> start_date=_start; > >> >> >> stop_date=_stop; > >> >> >> end; > >> >> >> else if _start=stop_date then do; > >> >> >> stop_date = _stop; > >> >> >> if last.id then output; > >> >> >> end; > >> >> >> else do; > >> >> >> output; > >> >> >> start_date=_start; > >> >> >> stop_date=_stop; > >> >> >> if last.id then output; > >> >> >> end; > >> >> >> end; > >> >> >> run; > >> >> >> > >> >> >> Hope this is helpful, > >> >> >> > >> >> >> Dan > >> >> >> > >> >> >> Daniel J. Nordlund > >> >> >> Washington State Department of Social and Health Services > >> >> >> Planning, Performance, and Accountability > >> >> >> Research and Data Analysis Division > >> >> >> Olympia, WA 98504-5204 > >> >> >>


Back to: Top of message | Previous page | Main SAS-L page