| Date: | Tue, 2 Feb 2010 04:55:05 -0800 |
| Reply-To: | Arthur Tabachneck <art297@NETSCAPE.NET> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Arthur Tabachneck <art297@NETSCAPE.NET> |
| Subject: | Re: duplicates |
|
| In-Reply-To: | <a0166dd6-4200-47a4-a588-2c0c7f612e11@c4g2000yqa.googlegroups.com> |
| Content-Type: | text/plain; charset=ISO-8859-1 |
Jeli,
I'm not sure how this will help with your data problem, but you can
accomplish what you asked with something like:
data want (drop=count);
set have;
by id;
if first.id then do;
count=0;
new_id=id;
end;
else do;
count+1;
new_id=catx('-',id,count);
end;
run;
HTH,
Art
-----------
On Feb 2, 5:23 am, Jeli <jeli0...@hotmail.co.uk> wrote:
> I have a dataset with thousands of records and variables.
>
> I have tried to select only destinct records, however some of then
> might have a different industry or different spelling of the name.
> This is clearly a data quality issue but what I need to do is have an
> intrim solution which will fix this until I can sort out the source
> data.
>
> i.e.
>
> ID NAME PD Industry Country
> 123456 Record 1 0.25 Medical USA
> 123456 Record 1 0.25 Cable USA
> 999403 Record 99 0.25 Medical USA
> 999403 Record 97 0.25 Medical USA
> 999403 Record 0.25 Medical USA
>
> What I would like to do is if a record is duplicated keep the first
> record as it is, but change the duplicate ID so that I know this is
> the first duplicate or second etc so the new data would look like:
>
> NEW ID ID NAME PD Industry Country
> 123456 123456 Record 1 0.25 Medical USA
> 123456-1 123456 Record 1 0.25 Cable USA
> 999403 999403 Record 99 0.25 Medical USA
> 999403-1 999403 Record 97 0.25 Medical USA
> 999403-2 999403 Record 0.25 Medical USA
>
> can this be done?
|