LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (October 2010, week 5)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Sat, 30 Oct 2010 09:26:00 -0700
Reply-To:     Sterling Paramore <gnilrets@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sterling Paramore <gnilrets@GMAIL.COM>
Subject:      Re: Collapsing date and coverage records with no gaps
In-Reply-To:  <201010301610.o9UAn43e021680@willow.cc.uga.edu>
Content-Type: text/plain; charset=UTF-8

I thought about converting coverage spans into distinct days of coverage like these solutions, but my enrollment data is already pretty large. I just estimated that if I did that, I'd end up having to process 2.2 billion records, rather than the 1.5 million that I have (500,000 members X 3 coverage types X 4 years). I look forward to trying your solution when I get back to work Monday.

-Sterling

On Sat, Oct 30, 2010 at 9:10 AM, Arthur Tabachneck <art297@netscape.net>wrote:

> Sterling, > > I definitely like Dan's approach better than the one that I suggested. > However, I'd preface it with a short proc sql seqment to get all of the > necessary pre-information into macro variables and change the code > accordingly. I.e.: > > data have; > length cov $1; > input id $ cov $ start_date :mmddyy10. stop_date :mmddyy10.; > format start_date stop_date yymmdd10.; > cards; > 1 M 1/1/10 6/30/10 > 1 M 7/1/10 12/31/10 > 1 X 4/1/10 6/30/10 > 1 X 7/1/10 12/31/10 > 2 M 1/1/10 8/30/10 > 2 X 7/1/10 12/30/10 > 3 M 1/1/10 12/30/10 > 3 D 2/1/10 6/30/10 > 3 X 4/1/10 12/30/10 > ; > > **----get all dates and values needed-----**; > proc sql noprint; > select distinct cov > into :cat > separated by '' > from have > ; > > select min(start_date), max(stop_date) > into :begin, :end > from have > ; > quit; > > data want(keep=id start_date stop_date multcov); > set have; > by id; > > **----temporary array is automatically retained----**; > array cov_a[&begin:&end] $%sysfunc(length(&cat.)) _temporary_ ; > > if first.id then do; > do _n_ = &begin to &end; > cov_a[_n_]=''; > do _i_=1 to %sysfunc(length(&cat.)); > cov_a[_n_]=cov_a[_n_]||' '; > end; > end; > end; > > do _n_ = start_date to stop_date; > substr(cov_a[_n_],index("&cat.",cov),1) = cov; > end; > > if last.id then do; > start_date = &begin ; > multcov = cov_a[start_date]; > do _n_ = &begin. to &end.; > if cov_a[_n_] NE multcov then do; > stop_date = _n_-1; > multcov=compress(multcov); > if multcov NE '' then output; > multcov = cov_a[_n_]; > start_date = _n_; > _n_ = stop_date; > end; > end; > if multcov NE '' then do; > multcov = compress(multcov); > stop_date = &end; > output; > end; > end; > format start_date stop_date mmddyy10.; > run; > > HTH, > Art > -------- > On Fri, 29 Oct 2010 23:23:15 -0700, Nordlund, Dan (DSHS/RDA) > <NordlDJ@DSHS.WA.GOV> wrote: > > >> -----Original Message----- > >> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of > >> Arthur Tabachneck > >> Sent: Friday, October 29, 2010 8:48 PM > >> To: SAS-L@LISTSERV.UGA.EDU > >> Subject: Re: Collapsing date and coverage records with no gaps > >> > >> Sterling, > >> > >> I would NEVER suggest that anyone's code is bloated! If it does what > >> you > >> want, and you can easily maintain it, and you can't think of a better > >> or > >> more efficient solution, go with it! > >> > >> My suggested solution, below, may well be bloated by some standards. > >> It > >> is simply a variant of the code that Sören proposed for the previous > >> problem. > >> > >> If you provide the statement to call your macro, I'd be glad to provide > >> a > >> comparison of the two methods: > >> > >> data have; > >> length cov $1; > >> input id $ cov $ start_date :mmddyy10. stop_date :mmddyy10.; > >> format start_date stop_date yymmdd10.; > >> cards; > >> 1 M 1/1/10 6/30/10 > >> 1 M 7/1/10 12/31/10 > >> 1 X 4/1/10 6/30/10 > >> 1 X 7/1/10 12/31/10 > >> 2 M 1/1/10 8/30/10 > >> 2 X 7/1/10 12/30/10 > >> 3 M 1/1/10 12/30/10 > >> 3 D 2/1/10 6/30/10 > >> 3 X 4/1/10 12/30/10 > >> ; > >> > >> proc sort data=have; > >> by id start_date cov; > >> run; > >> > >> data temp (keep=id date cov); > >> set have; > >> format date yymmdd10.; > >> do date=start_date to stop_date; > >> output; > >> end; > >> run; > >> > >> proc sort data=temp; > >> by id date cov; > >> run; > >> > >> data temp2 (drop=cov); > >> set temp; > >> length coverage $3; > >> retain coverage; > >> by id date; > >> if first.date then do; > >> coverage=cov; > >> if last.date then output; > >> end; > >> else do; > >> coverage=catt(coverage,cov); > >> if last.date then output; > >> end; > >> run; > >> > >> proc sort data=temp2; > >> by id coverage date; > >> run; > >> > >> data want; > >> set temp2 (rename=(date=_start)); > >> by id coverage; > >> format start_date stop_date mmddyy8.; > >> retain start_date stop_date; > >> select; > >> when(first.coverage) do; > >> start_date=_start; > >> stop_date=_start; > >> end; > >> when (stop_date ge _start-1) do; > >> stop_date = max(_start,stop_date); > >> end; > >> otherwise do; > >> output; > >> start_date=_start; > >> stop_date=_start; > >> end; > >> end; > >> if last.coverage then output; > >> drop _:; > >> run; > >> > >> I think it would be easier to both run and maintain, but that is only > >> my > >> opinion. > >> > >> Art > > > >Sterling, > > > >I see that Art has provided one possible solution. I will offer a > different approach. It requires knowing the earliest and latest dates > that you are interested in (or at least setting &begin to a date earlier > than any possible start_date and &end after any possible stop_date). It > doesn't require that your data be sorted by date or coverage, only that > the data be grouped by id. If the records are grouped by id but not > sorted by id, you could add the NOTSORTED option to the by statement. It > uses an array to hold the coverages that the person (id) is eligible for > on each day. I moved the fourth record for id 1 to be the first record to > demonstrate that the data don't need to be sorted by date or coverage > within id. > > > >**----beginning and ending dates of interest----**; > >%let begin = %sysfunc(intnx(day,'01jan2010'd,0)); > >%let end = %sysfunc(intnx(day,'31dec2010'd,0)); > > > >data have; > > length cov $1; > > input id $ cov $ start_date :mmddyy10. stop_date :mmddyy10.; > > format start_date stop_date yymmdd10.; > >cards; > >1 X 7/1/10 12/31/10 > >1 M 1/1/10 6/30/10 > >1 M 7/1/10 12/31/10 > >1 X 4/1/10 6/30/10 > >2 M 1/1/10 8/30/10 > >2 X 7/1/10 12/30/10 > >3 M 1/1/10 12/30/10 > >3 D 2/1/10 6/30/10 > >3 X 4/1/10 12/30/10 > >; > >run; > >proc print; > >run; > > > >data want(keep=id start_date stop_date multcov); > > set have; > > by id; > > > > **----temporary array is automatically retained----**; > > array cov_a[&begin:&end] $3 _temporary_ ; > > > > if first.id then do; > > do _n_ = &begin to &end; > > cov_a[_n_]=' '; > > end; > > end; > > > > select(cov); > > when('D') do _n_ = start_date to stop_date; substr(cov_a[_n_],1,1) > = 'D'; end; > > when('M') do _n_ = start_date to stop_date; substr(cov_a[_n_],2,1) > = 'M'; end; > > when('X') do _n_ = start_date to stop_date; substr(cov_a[_n_],3,1) > = 'X'; end; > > otherwise; > > end; > > > > if last.id then do; > > start_date = &begin ; > > multcov = cov_a[start_date]; > > do _n_ = '01jan2010'd to '31dec2010'd; > > if cov_a[_n_] NE multcov then do; > > stop_date = _n_-1; > > multcov=compress(multcov); > > if multcov NE '' then output; > > multcov = cov_a[_n_]; > > start_date = _n_; > > _n_ = stop_date; > > end; > > end; > > if multcov NE '' then do; > > multcov = compress(multcov); > > stop_date = &end; > > output; > > end; > > end; > >format start_date stop_date mmddyy10.; > >run; > >proc print; > >run; > > > >Hope this is helpful, > > > >Dan > > > >Daniel J. Nordlund > >Washington State Department of Social and Health Services > >Planning, Performance, and Accountability > >Research and Data Analysis Division > >Olympia, WA 98504-5204 >


Back to: Top of message | Previous page | Main SAS-L page