Date: Wed, 25 Oct 2006 15:46:41 -0700
Reply-To: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject: Re: Difference between nodup and nodupkey in sort procedure
Libin-
In v9 there is a DUPOUT= option on the PROC SORT statement: "Specifies the
output data set to which duplicate observations are written."
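As a minimal sketch of how DUPOUT= can isolate the removed records, the steps below use small made-up data. The data set names OLD, NEW and DROPPED and the variable AMOUNT are hypothetical, and the code assumes SAS 9 or later, where DUPOUT= is available. NODUP drops an observation only when every variable matches the observation immediately before it in the sorted output. DUPOUT= writes each dropped observation to a separate data set so it can be compared with what was kept.

```sas
/* Hypothetical sample data: ID 1 has an exact duplicate record,
   while the two ID 2 records differ in AMOUNT. */
data old;
  input id amount;
  datalines;
1 10
1 10
2 5
2 7
;
run;

/* NODUP removes an observation only if it is identical in all
   variables to the previous observation in the output.
   DUPOUT= (SAS 9 and later) saves the removed observations. */
proc sort data=old out=new nodup dupout=dropped;
  by id;
run;

/* DROPPED should hold the single repeated (1, 10) record;
   NEW keeps the other three observations. */
proc print data=dropped;
run;
```

Running the same step on the real data with DUPOUT= added would show exactly which of the 760 records were dropped, so they can be checked against the records that were kept.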
hth
Paul Choate
DDS Data Extraction
(916) 654-2160
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Xu Libin
Sent: Wednesday, October 25, 2006 1:43 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Difference between nodup and nodupkey in sort procedure
I tried to find a way to isolate the deleted cases so that I could compare
them, but I don't know how to do it.
Libin
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Arthur Tabachneck
Sent: Wednesday, October 25, 2006 4:05 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Difference between nodup and nodupkey in sort procedure
Libin,
Can you post some sample data where this happens? If the file isn't
sorted by all variables, I can see nodup NOT getting rid of duplicates,
but I've never seen it delete records that don't match on all variables.
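The sort-order point can be sketched with small made-up data (OLD2, BYID and BYALL are hypothetical names). Because NODUP only compares each observation with the one just before it, an exact duplicate that is separated from its twin by a different record in the same BY group survives when the data are sorted BY ID alone. Sorting BY _ALL_ makes identical records adjacent, so NODUP can catch them.

```sas
/* Hypothetical data: the two (1, 10) records are not adjacent. */
data old2;
  input id amount;
  datalines;
1 10
1 20
1 10
;
run;

/* BY ID only: with the default EQUALS option the input order within
   ID 1 is kept (10, 20, 10), so no two neighbors match and all
   three observations remain. */
proc sort data=old2 out=byid nodup;
  by id;
run;

/* BY _ALL_: the order becomes (10, 10, 20), the duplicates are
   adjacent, and NODUP removes one of them, leaving two. */
proc sort data=old2 out=byall nodup;
  by _all_;
run;
```

In other words, an incomplete sort can make NODUP keep too much, but it has no way to remove a record that differs from its neighbor in any variable.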
Art
--------
On Wed, 25 Oct 2006 15:29:05 -0400, Xu Libin <Libin.Xu@IRS.GOV> wrote:
>I thought that the nodup option in proc sort gets rid of duplicate records
>and nodupkey gets rid of records with duplicate values of the BY variable.
>When I ran the syntax below,
>
>Proc sort data=old out=new nodup;
> By id;
>Run;
>
>About 760 cases were deleted, but I was told that they are not duplicate
>records: at least one variable has different values. Can anyone on the
>list explain this? Thanks.
>
>Libin
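For the question in the subject line, a sketch comparing NODUPKEY with NODUP on the same made-up data may help. SAMPLE, KEYDEDUP, KEYDROPPED, RECDEDUP and RECDROPPED are hypothetical names, and DUPOUT= assumes SAS 9 or later. NODUPKEY keeps only the first observation for each BY value regardless of the other variables, so it removes records that NODUP would keep.

```sas
/* Hypothetical data: ID 1 has an exact repeat plus a record with a
   different AMOUNT; ID 2 has a single record. */
data sample;
  input id amount;
  datalines;
1 10
1 10
1 20
2 5
;
run;

/* NODUPKEY: keeps the first observation for each ID, so the second
   (1, 10) record and the (1, 20) record both go to KEYDROPPED. */
proc sort data=sample out=keydedup nodupkey dupout=keydropped;
  by id;
run;

/* NODUP: only the exact (1, 10) repeat is removed; (1, 20) is kept
   because AMOUNT differs from the previous observation. */
proc sort data=sample out=recdedup nodup dupout=recdropped;
  by id;
run;
```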