Date: Sat, 30 Oct 2010 09:26:00 -0700
Reply-To: Sterling Paramore <gnilrets@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sterling Paramore <gnilrets@GMAIL.COM>
Subject: Re: Collapsing date and coverage records with no gaps
In-Reply-To: <201010301610.o9UAn43e021680@willow.cc.uga.edu>
Content-Type: text/plain; charset=UTF-8
I thought about converting coverage spans into distinct days of coverage
like these solutions, but my enrollment data is already pretty large. I
just estimated that if I did that, I'd end up having to process 2.2 billion
records, rather than the 1.5 million that I have (500,000 members X 3
coverage types X 4 years). I look forward to trying your solution when I
get back to work Monday.
-Sterling
On Sat, Oct 30, 2010 at 9:10 AM, Arthur Tabachneck <art297@netscape.net>wrote:
> Sterling,
>
> I definitely like Dan's approach better than the one that I suggested.
> However, I'd preface it with a short proc sql seqment to get all of the
> necessary pre-information into macro variables and change the code
> accordingly. I.e.:
>
> data have;
> length cov $1;
> input id $ cov $ start_date :mmddyy10. stop_date :mmddyy10.;
> format start_date stop_date yymmdd10.;
> cards;
> 1 M 1/1/10 6/30/10
> 1 M 7/1/10 12/31/10
> 1 X 4/1/10 6/30/10
> 1 X 7/1/10 12/31/10
> 2 M 1/1/10 8/30/10
> 2 X 7/1/10 12/30/10
> 3 M 1/1/10 12/30/10
> 3 D 2/1/10 6/30/10
> 3 X 4/1/10 12/30/10
> ;
>
> **----get all dates and values needed-----**;
> proc sql noprint;
> select distinct cov
> into :cat
> separated by ''
> from have
> ;
>
> select min(start_date), max(stop_date)
> into :begin, :end
> from have
> ;
> quit;
>
> data want(keep=id start_date stop_date multcov);
> set have;
> by id;
>
> **----temporary array is automatically retained----**;
> array cov_a[&begin:&end] $%sysfunc(length(&cat.)) _temporary_ ;
>
> if first.id then do;
> do _n_ = &begin to &end;
> cov_a[_n_]='';
> do _i_=1 to %sysfunc(length(&cat.));
> cov_a[_n_]=cov_a[_n_]||' ';
> end;
> end;
> end;
>
> do _n_ = start_date to stop_date;
> substr(cov_a[_n_],index("&cat.",cov),1) = cov;
> end;
>
> if last.id then do;
> start_date = &begin ;
> multcov = cov_a[start_date];
> do _n_ = &begin. to &end.;
> if cov_a[_n_] NE multcov then do;
> stop_date = _n_-1;
> multcov=compress(multcov);
> if multcov NE '' then output;
> multcov = cov_a[_n_];
> start_date = _n_;
> _n_ = stop_date;
> end;
> end;
> if multcov NE '' then do;
> multcov = compress(multcov);
> stop_date = &end;
> output;
> end;
> end;
> format start_date stop_date mmddyy10.;
> run;
>
> HTH,
> Art
> --------
> On Fri, 29 Oct 2010 23:23:15 -0700, Nordlund, Dan (DSHS/RDA)
> <NordlDJ@DSHS.WA.GOV> wrote:
>
> >> -----Original Message-----
> >> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> >> Arthur Tabachneck
> >> Sent: Friday, October 29, 2010 8:48 PM
> >> To: SAS-L@LISTSERV.UGA.EDU
> >> Subject: Re: Collapsing date and coverage records with no gaps
> >>
> >> Sterling,
> >>
> >> I would NEVER suggest that anyone's code is bloated! If it does what
> >> you
> >> want, and you can easily maintain it, and you can't think of a better
> >> or
> >> more efficient solution, go with it!
> >>
> >> My suggested solution, below, may well be bloated by some standards.
> >> It
> >> is simply a variant of the code that Sören proposed for the previous
> >> problem.
> >>
> >> If you provide the statement to call your macro, I'd be glad to provide
> >> a
> >> comparison of the two methods:
> >>
> >> data have;
> >> length cov $1;
> >> input id $ cov $ start_date :mmddyy10. stop_date :mmddyy10.;
> >> format start_date stop_date yymmdd10.;
> >> cards;
> >> 1 M 1/1/10 6/30/10
> >> 1 M 7/1/10 12/31/10
> >> 1 X 4/1/10 6/30/10
> >> 1 X 7/1/10 12/31/10
> >> 2 M 1/1/10 8/30/10
> >> 2 X 7/1/10 12/30/10
> >> 3 M 1/1/10 12/30/10
> >> 3 D 2/1/10 6/30/10
> >> 3 X 4/1/10 12/30/10
> >> ;
> >>
> >> proc sort data=have;
> >> by id start_date cov;
> >> run;
> >>
> >> data temp (keep=id date cov);
> >> set have;
> >> format date yymmdd10.;
> >> do date=start_date to stop_date;
> >> output;
> >> end;
> >> run;
> >>
> >> proc sort data=temp;
> >> by id date cov;
> >> run;
> >>
> >> data temp2 (drop=cov);
> >> set temp;
> >> length coverage $3;
> >> retain coverage;
> >> by id date;
> >> if first.date then do;
> >> coverage=cov;
> >> if last.date then output;
> >> end;
> >> else do;
> >> coverage=catt(coverage,cov);
> >> if last.date then output;
> >> end;
> >> run;
> >>
> >> proc sort data=temp2;
> >> by id coverage date;
> >> run;
> >>
> >> data want;
> >> set temp2 (rename=(date=_start));
> >> by id coverage;
> >> format start_date stop_date mmddyy8.;
> >> retain start_date stop_date;
> >> select;
> >> when(first.coverage) do;
> >> start_date=_start;
> >> stop_date=_start;
> >> end;
> >> when (stop_date ge _start-1) do;
> >> stop_date = max(_start,stop_date);
> >> end;
> >> otherwise do;
> >> output;
> >> start_date=_start;
> >> stop_date=_start;
> >> end;
> >> end;
> >> if last.coverage then output;
> >> drop _:;
> >> run;
> >>
> >> I think it would be easier to both run and maintain, but that is only
> >> my
> >> opinion.
> >>
> >> Art
> >
> >Sterling,
> >
> >I see that Art has provided one possible solution. I will offer a
> different approach. It requires knowing the earliest and latest dates
> that you are interested in (or at least setting &begin to a date earlier
> than any possible start_date and &end after any possible stop_date). It
> doesn't require that your data be sorted by date or coverage, only that
> the data be grouped by id. If the records are grouped by id but not
> sorted by id, you could add the NOTSORTED option to the by statement. It
> uses an array to hold the coverages that the person (id) is eligible for
> on each day. I moved the fourth record for id 1 to be the first record to
> demonstrate that the data don't need to be sorted by date or coverage
> within id.
> >
> >**----beginning and ending dates of interest----**;
> >%let begin = %sysfunc(intnx(day,'01jan2010'd,0));
> >%let end = %sysfunc(intnx(day,'31dec2010'd,0));
> >
> >data have;
> > length cov $1;
> > input id $ cov $ start_date :mmddyy10. stop_date :mmddyy10.;
> > format start_date stop_date yymmdd10.;
> >cards;
> >1 X 7/1/10 12/31/10
> >1 M 1/1/10 6/30/10
> >1 M 7/1/10 12/31/10
> >1 X 4/1/10 6/30/10
> >2 M 1/1/10 8/30/10
> >2 X 7/1/10 12/30/10
> >3 M 1/1/10 12/30/10
> >3 D 2/1/10 6/30/10
> >3 X 4/1/10 12/30/10
> >;
> >run;
> >proc print;
> >run;
> >
> >data want(keep=id start_date stop_date multcov);
> > set have;
> > by id;
> >
> > **----temporary array is automatically retained----**;
> > array cov_a[&begin:&end] $3 _temporary_ ;
> >
> > if first.id then do;
> > do _n_ = &begin to &end;
> > cov_a[_n_]=' ';
> > end;
> > end;
> >
> > select(cov);
> > when('D') do _n_ = start_date to stop_date; substr(cov_a[_n_],1,1)
> = 'D'; end;
> > when('M') do _n_ = start_date to stop_date; substr(cov_a[_n_],2,1)
> = 'M'; end;
> > when('X') do _n_ = start_date to stop_date; substr(cov_a[_n_],3,1)
> = 'X'; end;
> > otherwise;
> > end;
> >
> > if last.id then do;
> > start_date = &begin ;
> > multcov = cov_a[start_date];
> > do _n_ = '01jan2010'd to '31dec2010'd;
> > if cov_a[_n_] NE multcov then do;
> > stop_date = _n_-1;
> > multcov=compress(multcov);
> > if multcov NE '' then output;
> > multcov = cov_a[_n_];
> > start_date = _n_;
> > _n_ = stop_date;
> > end;
> > end;
> > if multcov NE '' then do;
> > multcov = compress(multcov);
> > stop_date = &end;
> > output;
> > end;
> > end;
> >format start_date stop_date mmddyy10.;
> >run;
> >proc print;
> >run;
> >
> >Hope this is helpful,
> >
> >Dan
> >
> >Daniel J. Nordlund
> >Washington State Department of Social and Health Services
> >Planning, Performance, and Accountability
> >Research and Data Analysis Division
> >Olympia, WA 98504-5204
>
|