LISTSERV at the University of Georgia
Date:   Mon, 19 Jun 2006 04:09:10 +0100
Reply-To:   tenny kurian <tennykurian@YAHOO.CO.IN>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   tenny kurian <tennykurian@YAHOO.CO.IN>
Subject:   Re: Reading unique records
Comments:   To: Arthur Tabachneck <art297@NETSCAPE.NET>
In-Reply-To:   <200606190301.k5J0ncNB001557@mailgw.cc.uga.edu>
Content-Type:   text/plain; charset=iso-8859-1

Hi Art,

Thank you, Arthur.

I tried that, but if there is a missing value in any of the sort variables, that record goes to the top. IT SHOULD NOT HAPPEN. I don't want to change the order of the records in the external flat file.

I even tried the NODUPRECS option. In that case too, duplicate records are removed only after sorting, and once the sort takes place the records get rearranged, WHICH I DON'T WANT.
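One way around the reordering, sketched for SAS v8 (no hash object): store each record's position as it is read, let NODUPKEY drop the later copies, then sort back on that position. The column layout, the sample records, and the names HAVE, NODUP, WANT and SEQ are hypothetical; replace DATALINES with the real file and column pointers.

```sas
/* Sketch only: column layout and sample data are hypothetical.        */
data have;
  infile datalines truncover;     /* use your flat file instead of DATALINES */
  input @1  firstname  $8.
        @9  middlename $8.
        @17 lastname   $8.
        @25 dob        yymmdd8.
        @33 benstartdt yymmdd8.
        @41 benenddt   yymmdd8.;
  seq = _n_;                      /* original position of the record in the file */
  format dob benstartdt benenddt date9.;
datalines;
JOHN    A       SMITH   196001012005010120051231
MARY            JONES   197203152004070120050630
JOHN    A       SMITH   196001012005010120051231
ALAN    B       BROWN   195511112003010120031231
MARY            JONES   197203152004070120050630
;
run;

/* With the default EQUALS option, NODUPKEY keeps the first observation */
/* of each BY group in input order, i.e. the first occurrence in the file. */
proc sort data=have out=nodup nodupkey;
  by firstname middlename lastname dob benstartdt benenddt;
run;

/* Sort back on SEQ to restore the original file order. */
proc sort data=nodup out=want(drop=seq);
  by seq;
run;
```

Because SEQ makes every observation distinct, NODUPRECS would remove nothing here; NODUPKEY on the six fields does the deduplication, and the second sort undoes any movement caused by missing values.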

Thank you,
Tenny

Arthur Tabachneck <art297@NETSCAPE.NET> wrote:
Tenny,

Why not just sort the file using the NODUPKEY option, with all of the variables that define a duplicate in the BY statement? For example:

proc sort data=have out=want nodupkey;
  by FIRSTNAME MIDDLENAME LASTNAME DOB BENSTARTDT BENENDDT;
run;

Art
-----------
On Mon, 19 Jun 2006 03:51:25 +0100, tenny kurian wrote:

> Hi Kevin,
>
> Thank you for your response.
>
> All the fields in a record should be the same to make it unique; it is not just 2 or 3 common fields. If all the information contained in the records is the same, then I need to eliminate all the duplicates and keep one record.
>
> For example, say a record contains the following info:
> FIRSTNAME
> MIDDLENAME
> LASTNAME
> DOB
> BENSTARTDT
> BENENDDT
>
> If all the values contained in the above variables are repeated, then delete the duplicates.
>
> I am using Unix - SAS v8.
>
> Thank you,
> Tenny.
>
> Kevin Roland Viel wrote:
> On Sun, 18 Jun 2006, tenny kurian wrote:
>
>> Hi,
>>
>> I would like to get help with the following problem.
>>
>> I am getting input records from a flat file.
>>
>> Each line in the external flat file corresponds to one record.
>> I am reading the external flat file using an INFILE statement and column pointers.
>> LRECL is 300.
>> There are some duplicate records in the flat file. They need not be in sequence.
>> I want to read only unique records. That means if there is a replica of a record, then I want to read only the first occurrence of that record.
>>
>> It would be really helpful if someone can help in resolving this issue.
>
> Tenny,
>
> You are not quite clear. You need to read a record to determine whether it is unique. Is this a problem of a large flat file that you are trying to process more efficiently by reading only part of a record when another record with the same identifier has already been read?
>
> Also, you should state which version of SAS you have and what combination of fields makes the record unique.
>
> I have assumed that only part of the record determines its ID and whether it is unique. I have also taken advantage of the HASH object available in v9:
>
> data _null_ ;
>   file "C:\unique.txt" ;
>   do x = 1 to 10 ;
>     do y = 1 to 2 ;
>       put x y ;
>     end ;
>   end ;
> run ;
>
> data unique ( keep = ID y ) ;
>   if _n_ = 1 then
>   do ;
>     dcl hash unique() ;
>     unique.Definekey ( "ID" ) ;
>     unique.Definedone ( ) ;
>   end ;
>
>   infile "C:\unique.txt" ;
>   input ID y ;
>
>   __rc = unique.CHECK() ;
>
>   if __rc ne 0 then
>   do ;
>     output ;
>     __rc = unique.ADD() ;
>   end ;
> run ;
>
> proc print data = unique ;
> run ;
>
> You could use the entire line as the ID, but you could start running into RAM limitations and start paging. Personally, I would rather make a dataset from the original file and subset that. At the very least, I would count the number of duplicates I have and write that to the log, which I keep.
>
> This method is rather robust to the nature of the ID. If you do not have v9, then you can accomplish the same thing using an array, but the index can only be a number (in sharp contrast to the key of the hash).
>
> HTH,
>
> Kevin
>
> Kevin Viel
> Department of Epidemiology
> Rollins School of Public Health
> Emory University
> Atlanta, GA 30322
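Kevin mentions that the entire line could serve as the hash key. Since Tenny counts a record as a duplicate only when every field matches, here is a minimal sketch of that idea, assuming SAS 9 or later (the hash object is not available in the v8 release Tenny uses) and a hypothetical column layout; the names WANT, SEEN, REC and DUPS are illustrative. It makes one pass over the file, outputs first occurrences in file order, and writes the duplicate count to the log, as Kevin recommends.

```sas
/* Sketch only (SAS 9+). Column layout and sample data are hypothetical. */
/* The whole raw line is the key, so a record is a duplicate only if    */
/* the entire line matches an earlier one.                              */
data want;
  length rec $300;                  /* LRECL 300, as in the original post */
  if _n_ = 1 then do;
    declare hash seen();
    seen.defineKey('rec');
    seen.defineDone();
  end;
  infile datalines truncover end=eof;   /* use your flat file instead */
  input @;                          /* read the line and hold it */
  rec = _infile_;                   /* entire raw record as the key */
  if seen.check() ne 0 then do;     /* key not found: first occurrence */
    rc = seen.add();
    input @1  firstname  $8.
          @9  middlename $8.
          @17 lastname   $8.
          @25 dob        yymmdd8.
          @33 benstartdt yymmdd8.
          @41 benenddt   yymmdd8.;
    output;                         /* written in original file order */
  end;
  else dups + 1;                    /* count skipped duplicates */
  if eof then put 'NOTE: duplicate records skipped: ' dups;
  format dob benstartdt benenddt date9.;
  drop rec rc dups;
datalines;
JOHN    A       SMITH   196001012005010120051231
MARY            JONES   197203152004070120050630
JOHN    A       SMITH   196001012005010120051231
ALAN    B       BROWN   195511112003010120031231
MARY            JONES   197203152004070120050630
;
run;
```

Two caveats: a whole-line key treats lines that differ anywhere (including columns outside the six fields) as distinct, and, as Kevin warns, holding one 300-byte key per unique record in memory can become a problem for a very large file.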


