```Date: Fri, 9 Dec 2005 14:46:34 +0000 Reply-To: toby dunn Sender: "SAS(r) Discussion" From: toby dunn Subject: Re: How to find duplicate records Comments: To: Number_42@CARAMAIL.COM In-Reply-To: <1134126512.559511.265050@z14g2000cwz.googlegroups.com> Content-Type: text/plain; format=flowed Anonymous, Method #1) data WithDups ; input a \$ b \$ c \$ ; datalines ; a1 b1 cat a1 b2 dog a1 b1 . a2 b1 dog ; run ; proc sort data = WithDups ; by A B ; run ; data Dups ; set WithDups ; by A B ; if not (first.B and Last.B) then output ; run ; proc print data = Dups ; run ; Method #2) data WithDups ; input a \$ b \$ c \$ ; datalines ; a1 b1 cat a1 b2 dog a1 b1 . a2 b1 dog ; run ; proc sort data = WithDups dupout = DupsB nodupkey ; by A B ; run ; proc print data = DupsB ; run ; This will give you a data set with only one record but it is the dup, the first occurance of the will be kept in the original sorted data set, Toby Dunn From: Anonymous user Reply-To: Anonymous user To: SAS-L@LISTSERV.UGA.EDU Subject: How to find duplicate records Date: Fri, 9 Dec 2005 03:08:32 -0800 Hello, i have no question this time. I found out myself a solution, but since the problem of finding duplicates is so basic and i have not found how to do it on this group, i post my source code for the next one: option mprint; /* ** tested on SAS 8.02 ** Description: ** - set_to_search: the set where to search for duplicates ** - key: the key to search duplicates for, can be a list of column names!!!! ** - duplicates_set: the result set, to put only the duplicates ** ** "key" can be composed of several columns (no first/last) ** "set_to_search" does not need a numeric column (like for proc means) */ %macro find_duplicates(set_to_search, key, duplicates_set); %local sorted_set filter_set; %local i key_column; %let sorted_set = __sorted_set; %let filter_set = __filter_set; proc sort data = &set_to_search out = &sorted_set; by &key; run; data &filter_set (keep=&key); set &sorted_set; if %let i = 1; %let key_column = %scan(&key, &i); %do %while ("&key_column" ne ""); /* first test ? */ %if &i > 1 %then %do; and %end; (&key_column = lag(&key_column)) %let i = %eval(&i + 1); %let key_column = %scan(&key, &i); %end; then output; run; /* remove duplicates "key of duplicates" */ proc sort data = &filter_set nodup; by &key; run; /* get the data with keys */ data &duplicates_set; merge &sorted_set (in=in1) &filter_set (in=in2); by &key; if in1 and in2; run; proc datasets nolist; delete &sorted_set; delete &filter_set; run; %mend find_duplicates; /* what i have */ data data; input a \$ b \$ c \$; datalines; a1 b1 cat a1 b2 dog a1 b1 . a2 b1 dog ; /* what i want */ /* a b c a1 b1 cat a1 b1 . */ %find_duplicates(data, a b, dups); HTH, regards ```

Back to: Top of message | Previous page | Main SAS-L page