LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2005, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 9 Dec 2005 14:46:34 +0000
Reply-To:     toby dunn <tobydunn@HOTMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         toby dunn <tobydunn@HOTMAIL.COM>
Subject:      Re: How to find duplicate records
Comments: To: Number_42@CARAMAIL.COM
In-Reply-To:  <1134126512.559511.265050@z14g2000cwz.googlegroups.com>
Content-Type: text/plain; format=flowed

Anonymous,

Method #1)

data WithDups ; input a $ b $ c $ ; datalines ; a1 b1 cat a1 b2 dog a1 b1 . a2 b1 dog ; run ;

proc sort data = WithDups ; by A B ; run ;

data Dups ; set WithDups ; by A B ;

if not (first.B and Last.B) then output ; run ;

proc print data = Dups ; run ;

Method #2) <Me Being Lazy and really liking V9 tricks>

data WithDups ; input a $ b $ c $ ; datalines ; a1 b1 cat a1 b2 dog a1 b1 . a2 b1 dog ; run ;

proc sort data = WithDups dupout = DupsB nodupkey ; by A B ; run ;

proc print data = DupsB ; run ;

This will give you a data set with only one record but it is the dup, the first occurance of the will be kept in the original sorted data set,

Toby Dunn

From: Anonymous user <Number_42@CARAMAIL.COM> Reply-To: Anonymous user <Number_42@CARAMAIL.COM> To: SAS-L@LISTSERV.UGA.EDU Subject: How to find duplicate records Date: Fri, 9 Dec 2005 03:08:32 -0800 Hello,

i have no question this time. I found out myself a solution, but since the problem of finding duplicates is so basic and i have not found how to do it on this group, i post my source code for the next one:

option mprint;

/* ** tested on SAS 8.02 ** Description: ** - set_to_search: the set where to search for duplicates ** - key: the key to search duplicates for, can be a list of column names!!!! ** - duplicates_set: the result set, to put only the duplicates ** ** "key" can be composed of several columns (no first/last) ** "set_to_search" does not need a numeric column (like for proc means) */ %macro find_duplicates(set_to_search, key, duplicates_set); %local sorted_set filter_set; %local i key_column;

%let sorted_set = __sorted_set; %let filter_set = __filter_set;

proc sort data = &set_to_search out = &sorted_set; by &key; run;

data &filter_set (keep=&key); set &sorted_set;

if %let i = 1; %let key_column = %scan(&key, &i); %do %while ("&key_column" ne "");

/* first test ? */ %if &i > 1 %then %do; and %end;

(&key_column = lag(&key_column))

%let i = %eval(&i + 1); %let key_column = %scan(&key, &i); %end; then output; run;

/* remove duplicates "key of duplicates" */ proc sort data = &filter_set nodup; by &key; run;

/* get the data with keys */ data &duplicates_set; merge &sorted_set (in=in1) &filter_set (in=in2); by &key;

if in1 and in2; run;

proc datasets nolist; delete &sorted_set; delete &filter_set; run;

%mend find_duplicates;

/* what i have */ data data; input a $ b $ c $; datalines; a1 b1 cat a1 b2 dog a1 b1 . a2 b1 dog ;

/* what i want */ /* a b c

a1 b1 cat a1 b1 . */

%find_duplicates(data, a b, dups);

HTH, regards


Back to: Top of message | Previous page | Main SAS-L page