LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (July 2007, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Tue, 10 Jul 2007 08:41:26 -0400
Reply-To:   Arild S <sko@KLP.NO>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Arild S <sko@KLP.NO>
Subject:   Re: "Aligning" (comparing and altering) information across different lines

On Tue, 10 Jul 2007 22:15:30 +1000, David Johnson <d@DKVJ.BIZ> wrote:

>With a set of dated records from another industry, we found the only way to >reliably adjust the dates and model the period (given we had a lot of >missing information) was to build an array. Each record dealt with one >individual and the array elements held the dates. Multiple arrays are >needed for the start and end dates, an identifier and a flag to record when >a record had been adjusted. By stepping across the array, it was possible >then to impute the missing or incorrect values. > >Once that was done, the data was restored to a record-wise structure by >deconstructing the array. I haven't got the code, it belongs to the client, >but can give you some indicators. > > >Firstly you need to know the maximum size of the array, so run some SQL to >get the Max( Count( *) ) for any individual. That value stored in a >macro symbol is used to define the maximum dimension of your array. > >Then in a data step you By-process the data, writing each record to the next >array positions, and when you get the end of the by group, you write the >record out. > >With the new table, apply your rules across the array elements and set the >"fix" flag when you have adjusted a date so you don't chase the same thing >again. You should be able to walk across the array and verify your date >sequences are correct before stepping onto the next individual. If you have >issues with the final data, you can also write the record to a "PROBLEMS" >table to review manually. > >Kind regards > >David > >-----Original Message----- >From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU]On Behalf Of Ole >Rogeberg >Sent: Tuesday, 10 July 2007 7:47 PM >To: SAS-L@LISTSERV.UGA.EDU >Subject: "Aligning" (comparing and altering) information across >different lines > > >Does anyone know of a good way to identify overlaps between time periods >recorded in different lines of data, and manipulating these dates according >to a set of rules? > >I have a large dataset containing work records (more than one million in >total) for a group of individuals, and one individual may have several lines >of data (different work records). Each work record has a start date and an >end-date, but these are not always set accurately. As a result, they may >overlap, in which case I want to apply a set of rules to change the recorded >dates. For instance, a date in the middle of a year (e.g., 1 june) is more >credible than beginning or end of year (1 january og 31 december). > >The problems, as I see them, are for instance that >-information in a later line may require an earlier line for the same >individual to be recorded. This makes it difficult to use a >single-run-through using an "if first.idnr" construction >-The period in record 2 for an individual may overlap with record 3 even if >record 1 does not overlap with either of them. This means that it is >insufficient to compare the first period with all the later periods. >-The period in a record may overlap with more than one later record, and the >overlap need not be in the same place. > > >The immediate ideas I had for solving the problem were rather messy, so I >would be interested in hearing if anyone else has come across a similar >question and have found simple ways of dealing with it (specialized SAS >functions or procedures, clever loops etc.). > >Sincerely, > >Ole Rogeberg

Hei Ole, Hi david

I was thinking the same thing, see if this is a useful illustration:

data workers; length fnr $11; input fnr $ start :date9. end :date9. blablabla $; cards; 20037621021 01JAN2006 31MAR2006 blablabl 20037621021 02APR2006 13AUG2006 blablabl 20037621021 01AUG2006 31DEC2006 blablabl 20038021021 01AUG2006 31JAN2007 blablabl 20038021021 01JAN2007 31DEC2007 blablabl ; run;

data temp; i=0; array startdate [10]; array enddate [10]; array job [10]; array notes [10] $100; do until (last.fnr); *get dates on one line; i+1; set workers (keep=fnr start end); by fnr; if first.fnr then jobnr=0; jobnr+1; *adds a jobnumber; job[i]=jobnr; startdate[i]=start; enddate[i]=end; end; do ii=1 to 10; if startdate[ii]=. then leave; /* rule 1*/

if 1<ii and startdate[ii] <= enddate[ii-1] then do; * Sample rule: If startdate <= previous job's enddate, change it..; notes[ii] = 'Startdate changed from '||put(startdate [ii], date9.)|| ' to '||put(enddate[ii-1]+1, date9.); startdate[ii] = enddate[ii-1]+1; end; /* rule 2*/ /* blablabla...*/ end; do i=1 to 10; *bring it back; if startdate[i]=. then leave; start = startdate[i]; end = enddate[i]; note = notes[i]; jobnr=job[i]; output; end; run;

proc print data=temp; var fnr jobnr start end note; format start end date9.; run;

*merge back dates on fnr og jobnr.


Back to: Top of message | Previous page | Main SAS-L page