Date: Sun, 7 Mar 2004 18:49:09 -0500
Reply-To: Quentin McMullen <quentin_mcmullen@BROWN.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Quentin McMullen <quentin_mcmullen@BROWN.EDU>
Subject: Re: detecting changes in data
On Sun, 7 Mar 2004 21:31:31 +0000, Ian Whitlock <iw1junk@COMCAST.NET>
wrote:
>Quentin,
>
>If I understand the problem, you have ID and RACE at two times. You want
to
>report:
>
> Lost ID
> Lost Race within ID
> Gained or changed Race within ID
>
>Here is one place that I agree with Sigurd that the NOT IN construct is
to be
>preferred over EXCEPT. Sometimes I argue for EXCEPT since it is more
efficient
>in SAS. But here I want the flexibility unless data quantity becomes an
issue.
>
>Here is my code.
>
>data mon2 ;
> input id $ race $ ;
>cards ;
>1 W
>2 A
>3 W
>3 B
>4 B
>;
>data mon3 ;
> input id $ race $ ;
>cards ;
>1 W
>2 B
>3 W
>3 B
>3 H
>5 A
>;
>proc sql ;
> select "lost id " as prob
> , 2 as month
> , id
> , race
> from mon2
> where id not in ( select id from mon3 )
> union corr all
> select "lost race " as prob
> , 2 as month
> , id
> , race
> from mon2
> where id||race not in ( select id||race from mon3 )
> union corr all
> select "gained or changed race" as prob
> , 3 as month
> , id
> , race
> from mon3
> where id||race not in ( select id||race from mon2 )
> and id in ( select id from mon2 )
>
> order by month, id, race
> ;
>quit ;
>
>
>There is some duplication in that a lost ID always implies a lost race.
>However, a little tweaking to remove that should be easy if required. I
did
>not distinguish changes from gains. This might be done by asking whether
the
>total number of records within an ID increased. Yes, then something must
be
>gained; no something must have changed. This style is more extravagant in
CPU
>time but it pays in clarity of what you are getting, and can be easily
modified.
>
>What is the problem with PROC COMPARE after reducing to one record per
ID? It is efficient, but leaves the reader the task of identifying the
problem. So the real question is - how much work do you want to leave to
the reader in sorting out race problems. The classic programmer response -
leave anything that makes my code shorter and easier, and that I can get
away with.
>
>Ian_Whitlock@comcast.net
Thanks much Ian!
Looks nifty. Harry Droogendyk was also kind enough to send me a private
note with a SQL exclude solution. As I mentioned to him, despite my
affinity for the data step, I'm always impressed by how clearly SQL code
reads. This is a good example of how a non SQL-head, and perhaps even a
non-programmer, would be able to quickly understand the logic.
Thanks again,
--Quentin
|