Date: Thu, 4 Mar 2004 12:46:39 -0500
Reply-To: "Droogendyk, Harry" <Harry.Droogendyk@CIBC.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Droogendyk, Harry" <Harry.Droogendyk@CIBC.COM>
Subject: Re: detecting changes in data
Content-Type: text/plain; charset="iso-8859-1"
Sorry, details.... :-)
%macro comp(prev=,curr=);
proc sql;
create table diffs&curr as
select id, race from month&prev
except
select id, race from month&curr
union
select id, race
from month&curr
where race not in ( select p.race
from month&prev p,
month&curr c
where p.id = c.id
)
;
quit;
%if &sqlobs = 0 %then
%put All is cool;
%else %do;
%put Something bad changed;
proc print data=diffs&curr;
run;
%end;
%mend comp;
%comp(prev=1,curr=2);
%comp(prev=2,curr=3);
-----Original Message-----
From: Quentin McMullen [mailto:quentin_mcmullen@BROWN.EDU]
Sent: March 4, 2004 12:23 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: detecting changes in data
Thanks Harry, that's nifty.
Unfortunately it doesn't do as much as I need to do. (and I don't think a
simple proc compare can do it either).
As I read below, it basically grabs stuff from from month2 and then
subsets out any records that don't match to records in month3. The tricky
issue is that it is okay for a new patient to appear in month3. But it is
NOT okay for a patient which existed in month2 to have an additional race
code added in month3.
That is my patient #3 who goes from:
>3 W
>3 B
To:
>3 W
>3 B
>3 H
So my comparison needs to allow new patients to be added, but not new
races for existing patients.
But thanks for steering me toward EXCEPT. I think it could be an
important piece of a solution. I think it's actually pretty close, maybe
just an issue of nesting the EXCEPT as a subquery....
Thanks,
--Quentin
On Thu, 4 Mar 2004 12:07:20 -0500, Droogendyk, Harry
<Harry.Droogendyk@CIBC.COM> wrote:
>The EXCEPT operator in SQL is helpful for these types of comparisons:
>
>%macro comp(prev=,curr=);
>
> proc sql;
> create table diffs&curr as
> select id, race from month&prev
> except
> select id, race from month&curr
> ;
> quit;
> %if &sqlobs = 0 %then
> %put All is cool;
> %else
> %put Something bad changed;
>%mend comp;
>
>%comp(prev=1,curr=2);
>%comp(prev=2,curr=3);
>
>-----Original Message-----
>From: Quentin McMullen [mailto:quentin_mcmullen@BROWN.EDU]
>Sent: March 4, 2004 11:58 AM
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: detecting changes in data
>
>
>Hi All,
>
>I've got a situation where I receive demographic data every month as a
>study progresses and patients are accrued. The file has one record per
>patient-race (e.g. a biracial patient will have 2 records). It is
>acceptable for the study to add patients over time. However, it is
>unacceptable for a patient to be deleted from the dataset or a patient's
>race to change. Note that a change in patient race could mean a patient
>goes from being white to being black, or goes from being white to being
>white and black, or goes from being white and black to being white.
>
>Below is play data:
>
>data month1;
> input id race $;
> cards;
>1 W
>2 A
>3 W
>3 B
>;
>run;
>
>data month2;
> input id race $;
> cards;
>1 W
>2 A
>3 W
>3 B
>4 B
>;
>run;
>
>data month3;
> input id race $;
> cards;
>1 W
>2 B
>3 W
>3 B
>3 H
>5 A
>;
>run;
>
>From month 1 to month 2, the study accrued a new case, and other data
>remained the same. No problems.
>
>From month 2 to month 3, there were 3 suspicious changes I need to
>identify:
>1) Patient 2 went from Asian to Black
>2) Patient 3 went from white/black to white/black/hispanic
>3) Patient 4 disappeared from the data completely
>
>So the tricky case is when the patient has multiple races (Patient #3). I
>was wondering if folks had interesting ways to approach such a problem?
>Feels like there might be a good SQL solution?
>
>My current thought is to just collapse the data to one record per patient
>and have a single race variable containing all the race codes, and do a
>PROC COMPARE from that (code below). But I'd be interested in other
ideas.
>
>data month2r (keep=id racelist);
> set month2;
> retain racelist;
> by id;
> length racelist $10;
> if first.id then racelist=race;
> else racelist=trim(left(racelist))||','||race;
> if last.id;
>run;
>
>data month3r (keep=id racelist);
> set month3;
> retain racelist;
> by id;
> length racelist $10;
> if first.id then racelist=race;
> else racelist=trim(left(racelist))||','||race;
> if last.id;
>run;
>
>proc compare base=month2r compare=month3r listbaseobs;
> id id;
>run;
>
>Kind Regards,
>--Quentin
|