Date: Sun, 14 Aug 2011 21:04:38 +0200
Reply-To: tony tony <tonysingsong@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: tony tony <tonysingsong@GMAIL.COM>
Subject: Re: Remove duplicates(keeping the max informaiton from other
fields)
In-Reply-To: <96A48C12697C420BB607E2735108BF29@D1871RB1>
Content-Type: text/plain; charset=ISO-8859-1
Hi Nat,
Thank you.
Your code produces the output
Obs    code    rank1    rank2
  1      22
  2      23      3        2
  3      34
  4      34      2        4
  5     200      5
  6     200               3
  7     300      5
  8     300      6        4
which has not got rid of the duplicates. I want to get rid of the
duplicates while keeping the information from rank1 and rank2 where they
are not missing.
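Art mentions below some code that Joe suggested, but it is not quoted in this thread. One common way to collapse duplicates by code while keeping the largest non-missing value of each other column is a grouped PROC SQL query. The sketch below is not necessarily Joe's code. It assumes the TEST data set from this thread, and the output table name WANT is only illustrative. Because rank1 and rank2 are read as character ($), MAX compares them as text, so "10" would sort below "9". Reading the ranks without the $ (as numbers) avoids that.

```sas
/* Sketch: one row per code, keeping the largest non-missing rank1
   and rank2 in each group. Assumes the TEST data set from this
   thread; WANT is an illustrative name. MAX ignores missing values.
   Because the ranks are character, the comparison is textual. */
proc sql;
  create table want as
  select code,
         max(rank1) as rank1,
         max(rank2) as rank2
  from test
  group by code
  order by code;
quit;

proc print data=want;
run;
```

For the sample data this should give one row per code, with 5 and 3 for code 200 and 6 and 4 for code 300.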
On Sun, Aug 14, 2011 at 8:32 PM, Nat Wooding <nathani@verizon.net> wrote:
> Here is an alternative solution that involves more steps but automatically
> works with any number of rank variables. I was lazy so I used a Do Over
> instead of a fully specified Do statement.
>
> Nat Wooding
>
> Data test;
> input code rank1$ rank2$;
> cards;
> 300 5 .
> 300 6 4
> 200 5 .
> 200 . 3
> 34 . .
> 34 2 4
> 22 . .
> 23 3 2
> ;
> run;
>
> Data Alternate;
> set test;
> Drop rank: ;
> Array Bustup $ RANK:;
> do over Bustup;* old code style no longer documented;
> var = vname( bustup );
> value = bustup;
> output;
> end;
> run;
> Proc Sort data = Alternate nodupkey;
> by code var descending value;
> run;
> Data Alternate;
> set Alternate;
> by code var;
> if first.var ;
> run;
> Proc Transpose data = alternate out = alternate (drop = _name_);
> var value;
> by code;
> id var;
> run;
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Arthur Tabachneck
> Sent: Sunday, August 14, 2011 1:49 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: Remove duplicates(keeping the max informaiton from other
> fields)
>
> Tony,
>
> Then use the code that Joe suggested. If you want to do it in a data
> step, a little more code is needed. E.g.,
>
> Data test;
> input code rank1$ rank2$;
> cards;
> 300 5 .
> 300 6 4
> 200 5 .
> 200 . 3
> 34 . .
> 34 2 4
> 22 . .
> 23 3 2
> ;
> run;
>
>
> proc sort data=test out=noduplicates;
> by code;
> run;
>
> data noduplicates (drop=last:);
> set noduplicates;
> by code;
> last_rank1=lag(rank1);
> last_rank2=lag(rank2);
> if not first.code then do;
> rank1=strip(max(rank1,last_rank1));
> rank2=strip(max(rank2,last_rank2));
> end;
> if last.code then output;
> run;
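The LAG step above compares each record only with the one immediately before it. LAG also returns the value as it was before the reassignment, not the running maximum, so a code with three or more records can lose information. The sketch below instead keeps a running maximum with RETAIN. It assumes the TEST data set from this thread, and the data set names SORTED and WANT are only illustrative.

```sas
/* Sketch: running maximum per code with RETAIN, so BY groups of any
   size are handled. SORTED and WANT are illustrative names. */
proc sort data=test out=sorted;
  by code;
run;

data want (drop=max_rank1 max_rank2);
  set sorted;
  by code;
  length max_rank1 max_rank2 $ 8;
  retain max_rank1 max_rank2;
  /* Reset the running maxima at the start of each code */
  if first.code then call missing(max_rank1, max_rank2);
  /* Character comparison: a blank (missing) value is below any digit */
  if rank1 > max_rank1 then max_rank1 = rank1;
  if rank2 > max_rank2 then max_rank2 = rank2;
  /* Write one record per code carrying the accumulated maxima */
  if last.code then do;
    rank1 = max_rank1;
    rank2 = max_rank2;
    output;
  end;
run;
```

As with any character comparison, multi-digit ranks such as "10" and "9" would compare as text. Reading the ranks as numeric avoids that.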
>
> HTH,
> Art
> ------
> On Sun, 14 Aug 2011 19:29:22 +0200, tony tony <tonysingsong@GMAIL.COM>
> wrote:
>
> >Hi Arthur,
> >
> >Thank you very much.
> >
> >Sorry for the incomplete message I sent you in my last e-mail.
> >
> >The above code produces the output given below:
> >
> > Obs    code    rank1    rank2
> >   1      22
> >   2      23      3        2
> >   3      34      2        4
> >   4     200               3
> >   5     300      6        4
> >
> >But what I want is
> >
> > Obs    code    rank1    rank2
> >   1      22
> >   2      23      3        2
> >   3      34      2        4
> >   4     200      5        3
> >   5     300      6        4
> >
> >That is, I also want to keep the 5 in rank1.
> >
> >
> >
> >Thank you once again.
> >
> >
> >
> >Tony
> >
> >
> >On Sun, Aug 14, 2011 at 7:22 PM, tony tony <tonysingsong@gmail.com> wrote:
> >
> >>
> >>
> >> On Sun, Aug 14, 2011 at 6:40 PM, Arthur Tabachneck <art297@rogers.com> wrote:
> >>
> >>> Tony,
> >>>
> >>> You didn't show the resulting file that you DO want, so one can only
> >>> guess. Are you only asking about how to accomplish something like the
> >>> following:
> >>>
> >>> Data test;
> >>> input code rank1$ rank2$;
> >>> cards;
> >>> 300 5 .
> >>> 300 6 4
> >>> 200 5 .
> >>> 200 . 3
> >>> 34 . .
> >>> 34 2 4
> >>> 22 . .
> >>> 23 3 2
> >>> ;
> >>> run;
> >>>
> >>> proc sort data=test out=noduplicates nodupkey;
> >>> by code rank1 rank2;
> >>> run;
> >>>
> >>> proc print data=noduplicates;
> >>> run;
> >>>
> >>> HTH,
> >>> Art
> >>> ------
> >>> On Sun, 14 Aug 2011 12:28:11 -0400, SUBSCRIBE SAS-L Anonymou
> >>> <tonysingsong@GMAIL.COM> wrote:
> >>>
> >>> >Hi, I am able to remove duplicates in the following data set, but I
> >>> >also want to keep those code observations which contain rank1 and
> >>> >rank2 information. To a large extent the DESCENDING option helps me,
> >>> >but for the code value 200 I lose the information from rank1 (i.e.,
> >>> >it gives me a blank for it). So, in a nutshell, how can I remove
> >>> >duplicates (in terms of code) while keeping the maximum information
> >>> >in both other columns? Any help in this regard would be highly
> >>> >appreciated.
> >>> >
> >>> >With lots of thanks in advance.
> >>> >
> >>> >Regards,
> >>> >Tony
> >>> >
> >>> >
> >>> >Data test;
> >>> >input code rank1$ rank2$;
> >>> >cards;
> >>> >300 5 .
> >>> >300 6 4
> >>> >200 5 .
> >>> >200 . 3
> >>> >34 . .
> >>> >34 2 4
> >>> >22 . .
> >>> >23 3 2
> >>> >;
> >>> >run;
> >>> >proc sort data=test ;
> >>> >by descending rank1;
> >>> >run;
> >>> >proc sort data=test ;
> >>> >by descending rank2;
> >>> >run;
> >>> >proc sort data=test out=noduplicates nodupkey;
> >>> >by code;
> >>> >run;
> >>> >proc print data=noduplicates;
> >>> >run;
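Two separate sorts cannot choose the best record for rank1 and rank2 independently, because NODUPKEY keeps one whole record per code. A well-known alternative is the UPDATE-statement idiom sketched below. It assumes the TEST data set from this thread, and SORTED and WANT are only illustrative names. Note that it keeps the last non-missing value of each variable within a code, in the order the records appear after sorting, rather than the maximum. For this sample data that gives 5 and 3 for code 200 and 6 and 4 for code 300.

```sas
/* Sketch of the UPDATE idiom. SORTED(obs=0) is an empty master and
   every record of SORTED is applied as a transaction. Under the
   default UPDATEMODE=MISSINGCHECK a missing transaction value does
   not overwrite a value already present, so each code keeps the
   last non-missing rank1 and rank2 in its group. */
proc sort data=test out=sorted;
  by code;
run;

data want;
  update sorted(obs=0) sorted;
  by code;
run;

proc print data=want;
run;
```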
> >>>
> >> Obs    code    rank1    rank2
> >>   1      22
> >>   2      23      3        2
> >>   3      34      2        4
> >>   4     200               3
> >>   5     300      6        4
> >> What I want is
> >>
> >> Obs code rank1 rank2
> >> 1 22
> >> 2 23 3 2
> >> 3 34 2 4
> >> 4 200 3
> >> 5 300 6 4
> >>
> >>
> >>
>