Date: Thu, 10 Jan 2008 01:10:35 +0000
Reply-To: Paul Dorfman <sashole@BELLSOUTH.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Paul Dorfman <sashole@BELLSOUTH.NET>
Subject: Re: Help with data manipulation. Thanks!!!
Not sure it is applicable here since the option affects only the NODUPREC behavior. I know Ron mentioned it; however, I would not recommend using it under any circumstances - NODUPKEY with key variables followed by _ALL_ takes care of eliminating full dupes, ordering by wanted key variables, and eliminating the idiosyncrasies accompanying NODUPREC from its [far from immaculate] inception.
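A minimal sketch of that NODUPKEY-plus-_ALL_ idiom, assuming a hypothetical input data set A with key variables X, Y, Z (names used purely for illustration) and possibly other variables besides:

```sas
/* Sketch only: A, B, X, Y, Z are placeholder names.              */
/* Listing _ALL_ after the wanted keys makes every variable part  */
/* of the BY key, so NODUPKEY drops only records that match in    */
/* every variable, while the output is still ordered by X, Y, Z.  */
proc sort data = a out = b nodupkey ;
   by x y z _all_ ;
run ;
```

Note that this removes full-record duplicates only; to keep one record per X, Y, Z combination regardless of the other variables, _ALL_ would be left off the BY list.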
But perhaps you were also aiming at an option ensuring that the relative order of the unsorted records is not altered after the sorting, i.e. the EQUALS sort option, as in
proc sort EQUALS nodupkey data = a out = b ;
by x y z ;
run ;
Although it is usually a default, exceptions may occur due to overzealous emphasis on efficiency on the part of whoever may have done the configuration (NOEQUALS, generally speaking, saves run time).
On a different note, let me piggyback on this post and remark to Ron Fehd that the extra DO UNTIL (EndoFile) in his DoW-loop code seems to be a superfluity; mere
DATA FirstOccurrence ;
do until (first.J) ;
set WhatIwant end = EndoFile ;
by Id I J ;
end ;
run ;
will do just as well for the purpose...
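To see the single-loop version at work, here is a small self-contained sketch. The six test records are made up for illustration; only the data set and variable names come from the thread.

```sas
/* Hypothetical test data, already in Id I J order. */
data WhatIwant ;
   input Id I J ;
   cards ;
1 1 1
1 1 1
1 2 3
2 3 1
2 3 1
2 3 1
;
run ;

/* The loop keeps reading until it lands on the first record of a   */
/* new Id/I/J group; the implicit OUTPUT at the bottom of the step  */
/* then writes that record. When SET runs out of records inside     */
/* the loop, the step stops by itself, so no END= test or STOP      */
/* statement is needed.                                             */
data FirstOccurrence ;
   do until (first.J) ;
      set WhatIwant ;
      by Id I J ;
   end ;
run ;

proc print data = FirstOccurrence ;
run ;
```

With these six records, FirstOccurrence should end up with three observations, one per distinct Id/I/J combination.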
-------------- Original message ----------------------
From: Jack Hamilton <jfh@STANFORDALUMNI.ORG>
> Also, you want to make sure that you have set
> options sortdup=logical;
> On Wed, 9 Jan 2008 17:33:08 -0500, "Fehd, Ronald J. (CDC/CCHIS/NCPHI)"
> <rjf2@CDC.GOV> said:
> > > From: olivesec...@gmail.com wrote:
> > > > > I have a dataset which include 3 variables: id, i, j. For each id, the
> > > > > values for i and j may be 1~4. It is required that the combination of
> > > > > these 3 variables to be unique, which means, for example, the
> > > > > combination of id=2 i=3 j=1 can be on the dataset for only one time.
> > > > > If the combination shows up for two or more times on the dataset, I
> > > > > need keep only the first obs this combination shows up and remove
> > > > > other obs it shows up.
> > > >
> > > > > To solve the problem, I sort the dataset according to id i j. Then I
> > > > > only need to do iterations within each id. But I do not know how to do
> > > > > it by SAS. Can any people help me do it?
> > > >
> > > > Thanks a lot!!!
> > > >
> > > > Olive
> > A Helpful Reader wrote:
> > > Try This!
> > >
> > > proc sort data=one out=WhatIWant nodupkey;
> > > by id i j;
> > > run;
> > Warning: RTFM!
> > you want to read the documentation on sort carefully
> > to see if nodupkey does indeed
> > save the -first- occurrence of your row.
> > compare: noduprecs;
> > if not:
> > proc sort data = one
> > out = WhatIWant;
> > by id i j;
> > DATA FirstOccurrence;
> > do until(EndoFile);
> > do until(first.J);
> > set WhatIwant end = EndoFile;
> > by Id I J;
> > end;
> > output;
> > end;
> > stop;
> > This "do until(first.by-var)" is known as a Do-Whitlock loop (DOW)
> > The Do-Whitlock loop is described in my paper:
> > Do Which? Loop, Until or While?
> > This file contains the examples in the paper:
> > http://www.sascommunity.org/wiki/Image:Fehd-Do-Which-Loop-Until-or-While.zip
> > Ron Fehd the macro maven CDC Atlanta GA USA RJF2 at cdc dot gov
> Jack Hamilton
> Sacramento, California