LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous (more recent) messageNext (less recent) messagePrevious (more recent) in topicNext (less recent) in topicPrevious (more recent) by same authorNext (less recent) by same authorPrevious page (February 2010, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Tue, 2 Feb 2010 04:55:05 -0800
Reply-To:   Arthur Tabachneck <art297@NETSCAPE.NET>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Arthur Tabachneck <art297@NETSCAPE.NET>
Subject:   Re: duplicates
Comments:   To: Jeli <jeli0703@hotmail.co.uk>
In-Reply-To:   <a0166dd6-4200-47a4-a588-2c0c7f612e11@c4g2000yqa.googlegroups.com>
Content-Type:   text/plain; charset=ISO-8859-1

Jeli,

I'm not sure how this will help with your data problem, but you can accomplish what you asked with something like:

data want (drop=count); set have; by id; if first.id then do; count=0; new_id=id; end; else do; count+1; new_id=catx('-',id,count); end; run;

HTH, Art ----------- On Feb 2, 5:23 am, Jeli <jeli0...@hotmail.co.uk> wrote: > I have a dataset with thousands of records and variables. > > I have tried to select only destinct records, however some of then > might have a different industry or different spelling of the name. > This is clearly a data quality issue but what I need to do is have an > intrim solution which will fix this until I can sort out the source > data. > > i.e. > > ID NAME PD Industry Country > 123456 Record 1 0.25 Medical USA > 123456 Record 1 0.25 Cable USA > 999403 Record 99 0.25 Medical USA > 999403 Record 97 0.25 Medical USA > 999403 Record 0.25 Medical USA > > What I would like to do is if a record is duplicated keep the first > record as it is, but change the duplicate ID so that I know this is > the first duplicate or second etc so the new data would look like: > > NEW ID ID NAME PD Industry Country > 123456 123456 Record 1 0.25 Medical USA > 123456-1 123456 Record 1 0.25 Cable USA > 999403 999403 Record 99 0.25 Medical USA > 999403-1 999403 Record 97 0.25 Medical USA > 999403-2 999403 Record 0.25 Medical USA > > can this be done?


Back to: Top of message | Previous page | Main SAS-L page