Date: Tue, 30 May 2006 16:38:25 -0700
Reply-To: bluej
Sender: "SAS(r) Discussion"
From: bluej
Organization: http://groups.google.com
Subject: help on my lengthy codes
To: sas-l@uga.edu
Content-Type: text/plain; charset="iso-8859-1"

I run a de-duplication procedure on a regular basis. The idea is to exclude a newly received record for a patient if the time interval between it and the most recent record from the same patient (if one is present in the database) is less than a certain value, say 30 days, and to include it in the database otherwise. To this end I presort the dataset by patient name and date received (called ResultDate in the code below). Below is an extract of the code for the case where the maximum number of duplicates is 4, that is, since a certain predefined starting date, the maximum number of records received from the same patient is 4:

data afterundup;
    set beforeundup;
    by Name1 ResultDate;
    if first.Name1 then num1=0;
    num1 +1;
    if num1=2 and dif(ResultDate)>=30 then num2=1;
    if num1=3 and lag(num2)=1 and dif(ResultDate)>=30 then num3=1;
    else if num1=3 and lag(num2)^=1 and dif2(ResultDate)>=30 then num3=1;
    if num1=4 and lag(num3)=1 and dif(ResultDate)>=30 then num4=1;
    else if num1=4 and lag(num3)^=1 and lag2(num2)=1 and dif2(ResultDate)>=30 then num4=1;
    else if num1=4 and lag(num3)^=1 and lag2(num2)^=1 and dif3(ResultDate)>=30 then num4=1;

This goes on depending on the maximum number of duplicates, and I find it rather time-consuming and clumsy. Could any SAS expert give some pointers on how to transform the above code into something more efficient? Thanks a lot in advance!
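
For reference, the rule described above could perhaps be sketched with a retained variable instead of chained LAG/DIF calls, so the code no longer grows with the number of records per patient. This is only a sketch under some assumptions: ResultDate is taken to be a non-missing numeric SAS date, the comparison is taken to be against the most recently kept record for the patient, and lastkept and keep_it are hypothetical helper names that do not appear in the extract above. It also avoids calling LAG and DIF inside conditions, where their queues are updated only when the call actually executes.

```sas
/* Sketch: keep a patient's first record, then keep each later record only
   if it falls at least 30 days after the last record that was kept.
   Assumes beforeundup is already sorted by Name1 and ResultDate. */
data afterundup;
    set beforeundup;
    by Name1 ResultDate;
    retain lastkept;              /* hypothetical helper: date of the last kept record */

    if first.Name1 then keep_it = 1;                /* first record is always kept */
    else keep_it = (ResultDate - lastkept >= 30);   /* compare with last kept date */

    if keep_it then do;
        lastkept = ResultDate;    /* remember this date for later comparisons */
        output;                   /* write only the kept records */
    end;

    drop lastkept keep_it;
run;
```

Because lastkept changes only when a record is kept, the same step should handle any number of records per patient. The 30-day threshold is hard-coded here and could be moved into a macro variable if it varies.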
