On Tue, 10 Jul 2007 22:15:30 +1000, David Johnson <d@DKVJ.BIZ> wrote:
>With a set of dated records from another industry, we found the only way to
>reliably adjust the dates and model the period (given we had a lot of
>missing information) was to build an array. Each record dealt with one
>individual and the array elements held the dates. Multiple arrays are
>needed for the start and end dates, an identifier and a flag to record when
>a record had been adjusted. By stepping across the array, it was possible
>then to impute the missing or incorrect values.
>Once that was done, the data was restored to a record-wise structure by
>deconstructing the array. I haven't got the code, it belongs to the
>but can give you some indicators.
>Firstly you need to know the maximum size of the array, so run some SQL to
>get the Max( Count( *) ) for any individual. That value stored in a
>macro symbol is used to define the maximum dimension of your array.
>Then in a data step you By-process the data, writing each record to the
>array positions, and when you get the end of the by group, you write the
>With the new table, apply your rules across the array elements and set the
>"fix" flag when you have adjusted a date so you don't chase the same thing
>again. You should be able to walk across the array and verify your date
>sequences are correct before stepping onto the next individual. If you
>issues with the final data, you can also write the record to a "PROBLEMS"
>table to review manually.
>From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU]On Behalf Of Ole
>Sent: Tuesday, 10 July 2007 7:47 PM
>Subject: "Aligning" (comparing and altering) information across
>Does anyone know of a good way to identify overlaps between time periods
>recorded in different lines of data, and manipulating these dates according
>to a set of rules?
>I have a large dataset containing work records (more than one million in
>total) for a group of individuals, and one individual may have several
>of data (different work records). Each work record has a start date and an
>end-date, but these are not always set accurately. As a result, they may
>overlap, in which case I want to apply a set of rules to change the
>dates. For instance, a date in the middle of a year (e.g., 1 june) is more
>credible than beginning or end of year (1 january og 31 december).
>The problems, as I see them, are for instance that
>-information in a later line may require an earlier line for the same
>individual to be recorded. This makes it difficult to use a
>single-run-through using an "if first.idnr" construction
>-The period in record 2 for an individual may overlap with record 3 even if
>record 1 does not overlap with either of them. This means that it is
>insufficient to compare the first period with all the later periods.
>-The period in a record may overlap with more than one later record, and
>overlap need not be in the same place.
>The immediate ideas I had for solving the problem were rather messy, so I
>would be interested in hearing if anyone else has come across a similar
>question and have found simple ways of dealing with it (specialized SAS
>functions or procedures, clever loops etc.).
Hei Ole, Hi david
I was thinking the same thing, see if this is a useful illustration:
length fnr $11;
input fnr $ start :date9. end :date9. blablabla $;
20037621021 01JAN2006 31MAR2006 blablabl
20037621021 02APR2006 13AUG2006 blablabl
20037621021 01AUG2006 31DEC2006 blablabl
20038021021 01AUG2006 31JAN2007 blablabl
20038021021 01JAN2007 31DEC2007 blablabl
array startdate ;
array enddate ;
array job ;
array notes  $100;
do until (last.fnr); *get dates on one line;
set workers (keep=fnr start end);
if first.fnr then jobnr=0;
jobnr+1; *adds a jobnumber;
do ii=1 to 10;
if startdate[ii]=. then leave;
/* rule 1*/
if 1<ii and startdate[ii] <= enddate[ii-1] then do; *
Sample rule: If startdate <= previous job's enddate, change it..;
notes[ii] = 'Startdate changed from '||put(startdate
[ii], date9.)|| ' to '||put(enddate[ii-1]+1, date9.);
startdate[ii] = enddate[ii-1]+1;
/* rule 2*/
do i=1 to 10; *bring it back;
if startdate[i]=. then leave;
start = startdate[i];
end = enddate[i];
note = notes[i];
proc print data=temp;
var fnr jobnr start end note;
format start end date9.;
*merge back dates on fnr og jobnr.