Date: Fri, 3 Oct 2008 15:54:07 -0500
Reply-To: Mary <mlhoward@avalon.net>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Mary <mlhoward@AVALON.NET>
Subject: Re: Comparing 2 data sets (easy one again!)
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
reply-type=original
Yes, clearly if one has too many results then my code could have also been
modified to put the results into a data set:
proc sql noprint;
create table ids as
select ID
from set1
where ID not in
(select ID from set2);
quit;
But yours is another way to do it; I wonder if it is faster since it needs
to do a left join and that's generally slower, but it is certainly an
alternative. The user was just asking for ID's but you could select all the
data with the *.
Thanks.
-Mary
----- Original Message -----
From: Toby Dunn
To: SAS-L@LISTSERV.UGA.EDU
Sent: Friday, October 03, 2008 3:32 PM
Subject: Re: Comparing 2 data sets (easy one again!)
Mary,
The method you posted will work only for small data sets where the unique
number of ID's will fit into the maximum length of a macro variable.
Proc Compare comes to mind as a solution.
If one wanted to do this via SQL then try:
Proc SQL ;
Create Table Need As
Select One.*
From One As One
Left Join
Two As Two
On One.ID = Two.ID
Where Missing( Two.ID ) ;
Quit ;
The above is called an Anti-Join and yes the Where clause is correct.
Actually one could have used a Having but in some testing I did the other
day it performed slighty worse than the Where clause. Either way they
both produce the same results.
On Fri, 3 Oct 2008 10:16:14 -0500, Mary <mlhoward@AVALON.NET> wrote:
>Using the Not in with a select on on the other data set can do this:
>
>proc sql noprint;
>select ID into: VarList
>separated by " "
>from set1
>where ID not in
>(select ID from set2);
>quit;
>%put &varlist;
>
>-Mary
>
>
>----- Original Message -----
>From: Mike Smith
>To: SAS-L@LISTSERV.UGA.EDU
>Sent: Friday, October 03, 2008 9:58 AM
>Subject: Comparing 2 data sets (easy one again!)
>
>
>Hi all SAS experts,
>
>I am trying to manage two big data sets and trying to figure out which
>caseids of one dataset could be found /not found in the other dataset.
>
>Is there like a procedure aside from just sorting both datasets and
>figuring out which ids arent there?
>
>
>Thanks a lot!