Date: Sun, 20 Aug 2006 22:57:17 -0700
Reply-To: Abhijat <ursabhijat@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Abhijat <ursabhijat@GMAIL.COM>
Organization: http://groups.google.com
Subject: Re: URGENT: PROC SORT TO PROC SQL CONVERSION
Content-Type: text/plain; charset="iso-8859-1"
Hi All,
Thanks a tonne for the responses.
Let me try clarify as much as I can so that I may get more customized
answers :-) !
The code has to be in PROC SQL. I m using SAS 9.1.
Regarding Kenneth's question, I m afraid we have more variables that
mere A,B,C,D,E . We have about 20 extra variables!
I request you all SAS experts to keep your suggestions and answers
flowing in.
The question is very simple and is once again stated here:
*******************************************************************
Please suggest me a proc sql equivalent for the below mentioned proc
sort code in SAS 9.1.
PROC SORT DATA = abc.xyz OUT = abc.pqr NODUPKEY;
BY A B C D E;
RUN;
~ xyz has about 25 variables.
Thanks for the help.
Regards,
Abhijat.
ChrisW75 wrote:
> Yes, I'm aware that when Version 9 came out they included the DUPOUT
> option, which is a good option. My method though puts but the excluded
> dupliates AND the kept duplicates into the same table so that they can
> be examined side by side. This is often very useful when cleansing
> data.
>
>
>
> Gadi Ben-Avi wrote:
> > Chris,
> >
> > At a certain point an options was added to proc sort that lets you get the
> > 'deduped' observations in a separate dataset.
> >
> > The DUPOUT option will create a dataset with the deduped observations.
> >
> > Gadi
> > "ChrisW75" <9Squirrels@gmail.com> wrote in message
> > news:1156118064.027338.32280@75g2000cwc.googlegroups.com...
> > > Best practice for deduping is to not use either Proc SQL or Proc Sort I
> > > reckon. This is my method:-
> > >
> > > proc sort data=dset1;
> > > by A B C D E;
> > > run;
> > >
> > > data dset2
> > > dupes;
> > > set dset1;
> > > by A B C D E;
> > >
> > > if first.E then output dset2;
> > > if not(first.E and Last.E) then output dupes;
> > > run;
> > >
> > > This gives you a way to look at your duplicate records after the
> > > dedupe.
> > > ChrisW75
> > >
> > > Scott Bass wrote:
> > >> I agree with Ken, see this thread in the archive (read the entire
> > >> threaed):
> > >> http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0605A&L=sas-l&D=0&P=1162
> > >>
> > >> <kenneth_m_lin@sbcglobal.net> wrote in message
> > >> news:rm1Gg.9471$%j7.1256@newssvr29.news.prodigy.net...
> > >> > Unless your dataset only contains the fields A through E, it'd be
> > >> > difficult to perform de-duping in PROC SQL.
> > >> >
> > >> > "Abhijat" <ursabhijat@gmail.com> wrote in message
> > >> > news:1156068234.842470.228350@i42g2000cwa.googlegroups.com...
> > >> >> Hi All,
> > >> >> Please someone suggest me a proc sql equivalent for the below
> > >> >> mentioned
> > >> >> proc sort code.
> > >> >>
> > >> >>
> > >> >> PROC SORT DATA = abc.xyz OUT = abc.pqr NODUPKEY;
> > >> >> BY A B C D E;
> > >> >> RUN;
> > >> >>
> > >> >>
> > >> >> Many Thanks ,
> > >> >> Abhijat.
> > >> >>
> > >> >
> > >> >
> > >
|