Here is what the beforeundup dataset looks like:

Name1  ResultDate  (More columns)
aaa    1/1/2005    ...
aaa    1/8/2005    ...
aaa    3/1/2005    ...
aaa    3/5/2005    ...
bbb    2/2/2005    ...
bbb    3/3/2005    ...
ccc    ...         ...

After the unduplication, when more than one record was received from
the same person, the data should only include records that are at
least 30 days apart. For example, if the first record for patient aaa
is included, then the second should be excluded, and the third is
included because it is >=30 days after the first. Each comparison is
made with the most recent record already selected from the same
person. For patient bbb the first two records should both be
included, and so forth. I left out a few lines of code in my previous
post. Here is the full step for the case of at most 4 records per
patient, as in the example above:

data afterundup;
  set beforeundup;
  by Name1 ResultDate;
  if first.Name1 then num1=0;
  num1 +1;
  if num1=2 and dif(ResultDate)>=30 then num2=1;
  if num1=3 and lag(num2)=1 and dif(ResultDate)>=30 then num3=1;
  else if num1=3 and lag(num2)^=1 and dif2(ResultDate)>=30 then num3=1;
  if num1=4 and lag(num3)=1 and dif(ResultDate)>=30 then num4=1;
  else if num1=4 and lag(num3)^=1 and lag2(num2)=1 and
    dif2(ResultDate)>=30 then num4=1;
  else if num1=4 and lag(num3)^=1 and lag2(num2)^=1 and
    dif3(ResultDate)>=30 then num4=1;
  if num1=1 or num2=1 or num3=1 or num4=1; /* this line picks out the
                                              desired records */
run;

I hope this explains what I want to achieve. When the maximum number of
duplicates is much larger than 4, say 100, I need a more efficient way
to do the unduplication than expanding the above code, maybe some kind
of loop, which I am struggling with... kicking, biting... Help is very
much appreciated!
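
The rule described above (compare each record with the most recently kept
record from the same person) might be written without LAG/DIF chains by
carrying the last kept date forward with RETAIN. This is only a sketch
under assumptions: ResultDate is taken to be a numeric SAS date value,
beforeundup is taken to be already sorted by Name1 and ResultDate, and
the variables lastkept and keepit are hypothetical helper names that are
not part of the original code.

```sas
/* Sketch only. Assumes ResultDate is a numeric SAS date and that   */
/* beforeundup is already sorted by Name1 ResultDate.               */
data afterundup;
  set beforeundup;
  by Name1 ResultDate;
  retain lastkept;              /* date of the most recently kept record (hypothetical name) */
  if first.Name1 then keepit = 1;                 /* always keep a person's first record */
  else keepit = (ResultDate - lastkept >= 30);    /* compare with the last kept record   */
  if keepit then do;
    lastkept = ResultDate;      /* this record becomes the new reference date */
    output;
  end;
  drop lastkept keepit;
run;
```

For patient aaa in the sample above, this keeps the 1/1/2005 and 3/1/2005
records and drops 1/8/2005 and 3/5/2005. The step does not depend on the
maximum number of records per person, so nothing has to be expanded when
that maximum grows.
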
J S Huang wrote:
> Bluej:
>
> If you provide a small sample and the desired result, it will be easier for those who would like to help.
>
> J S Huang
>
>
>
> -----Original Message-----
> From: bluej <fjing11@GMAIL.COM>
> To: SAS-L@LISTSERV.UGA.EDU
> Sent: Tue, 30 May 2006 16:38:25 -0700
> Subject: help on my lengthy codes
>
>
> I have an unduplication procedure that I run on a regular basis. The
> idea is to exclude a record from a patient if the time interval between
> the newly received record and the most recent one from the same patient
> (if present in the database) is less than a certain value, say 30 days,
> and to include the newly received record in the database otherwise. To
> this end I presort the dataset by patient name and date received
> (called ResultDate in the code below). Below is an extract of the
> code for the case where the maximum number of duplicates is 4, that is,
> since a certain predefined starting date, the maximum number of records
> received from the same patient is 4:
>
> data afterundup;
> set beforeundup;
> by Name1 ResultDate;
> if first.Name1 then
> num1=0;
> num1 +1;
> if num1=2 and dif(ResultDate)>=30 then num2=1;
> if num1=3 and lag(num2)=1 and dif(ResultDate)>=30 then num3=1;
> else if num1=3 and lag(num2)^=1 and dif2(ResultDate)>=30 then num3=1;
> if num1=4 and lag(num3)=1 and dif(ResultDate)>=30 then num4=1;
> else if num1=4 and lag(num3)^=1 and lag2(num2)=1 and
> dif2(ResultDate)>=30 then num4=1;
> else if num1=4 and lag(num3)^=1 and lag2(num2)^=1 and
> dif3(ResultDate)>=30 then num4=1;
>
> It will go on depending on the maximum number of duplicates, and I
> find it rather time-consuming and clumsy. Could any SAS expert give
> some pointers on how to transform the above code into something
> more efficient? Thanks a lot in advance!