Date:         Mon, 19 Jun 2006 04:09:10 +0100
Reply-To:     tenny kurian <tennykurian@YAHOO.CO.IN>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         tenny kurian <tennykurian@YAHOO.CO.IN>
Subject:      Re: Reading unique records
Content-Type: text/plain; charset=iso-8859-1
Thank you, Arthur.

I tried that, but if there is a missing value in any of the sort variables, that record moves to the top. That should not happen: I do not want to change the order of the records from the external flat file.

I also tried the NODUPRECS option, but in that case too the duplicate records are removed only after sorting, so the records end up rearranged, which I do not want.
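One way to keep the original order, sketched below with placeholder names ("flat.txt", var1, and var2 stand in for the real file and its fields), is to tag each record with its input line number before sorting, dedupe, and then sort back by that tag:

```sas
/* Sketch only: "flat.txt", var1, and var2 are placeholders for the
   real file and its fields. */
data have ;
  infile "flat.txt" lrecl=300 ;
  input var1 var2 ;
  _seq = _n_ ;              /* remember the original record order */
run ;

proc sort data=have out=nodups nodupkey ;
  by var1 var2 ;            /* list every field that defines a duplicate */
run ;

proc sort data=nodups out=want ( drop = _seq ) ;
  by _seq ;                 /* restore the original order */
run ;
```

Because PROC SORT's default EQUALS option keeps ties in their original order, NODUPKEY retains the record with the lowest _seq for each key, and the final sort by _seq undoes any reordering (including missing values having sorted to the top).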
Arthur Tabachneck <art297@NETSCAPE.NET> wrote:
Why not just sort the file, using the NODUPKEY option, with all of the
variables that define a duplicate listed on the BY statement? For example:

proc sort data=have out=want nodupkey;
  by _all_; /* every variable counts toward the key */
run;
On Mon, 19 Jun 2006 03:51:25 +0100, tenny kurian wrote:
> Thank you for your response.
> All the fields in a record must be the same for it to count as a
> duplicate; it is not just 2 or 3 common fields. If all the information
> contained in two records is the same, then I need to eliminate the
> duplicates and keep only the first occurrence.
> Say, for example, a record contains the following info:
> if all the values contained in the above variables are repeated, then
> delete the duplicate ones.
> I am using SAS v8 on Unix.
> Thank you,
>Kevin Roland Viel wrote:
> On Sun, 18 Jun 2006, tenny kurian wrote:
>> I would like to get help with the following problem.
>> I am reading input records from a flat file; each line in the external
>> flat file corresponds to one record. I am reading the file with an
>> INFILE statement, and the LRECL is 300.
>> There are some duplicate records in the flat file. I want to read only
>> the unique records: that means if there is a replica of a record, I
>> want to read only the first occurrence of that record.
>> It would be really helpful if someone could help resolve this issue.
>You are not quite clear. You need to read a record to determine whether
>it is unique. Is the problem that you have a large flat file and are
>trying to make the job more efficient by reading only part of a record
>when another record with the same identifier has already been read?
>Also, you should state which version of SAS you have and which
>combination of fields makes a record unique.
>I have assumed that only part of the record determines its ID and
>whether it is unique. I have also taken advantage of the HASH object
>available in v9:
>data _null_ ;
>  file "C:\unique.txt" ;
>  do x = 1 to 10 ;
>    do y = 1 to 2 ;
>      put x y ;
>    end ;
>  end ;
>run ;
>
>data unique ( keep = ID y ) ;
>  if _n_ = 1 then do ;
>    dcl hash unique() ;
>    unique.definekey ( "ID" ) ;
>    unique.definedone ( ) ;
>  end ;
>  infile "C:\unique.txt" ;
>  input ID y ;
>  /* CHECK returns nonzero when the key is not yet in the hash */
>  __rc = unique.check() ;
>  if __rc ne 0 then do ;
>    __rc = unique.add() ;
>    output ; /* write only the first occurrence of each ID */
>  end ;
>run ;
>
>proc print data = unique ;
>run ;
>You could use the entire line as the ID, but you could start running into
>RAM limitations and start paging. Personally, I am very hesitant not to
>make a dataset from the original file and subset that. At the very least,
>I would count the number of duplicates and write that count to the log,
>which I keep.
>This method is rather robust to the nature of the ID. If you do not have
>v9, then you can accomplish the same thing using an array, but the index
>can only be a number (in sharp contrast to the key of the hash).
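>For example, assuming the ID is a positive integer no larger than some
>known bound (1000 below is an arbitrary placeholder), the array version
>might look like:

```sas
/* Sketch only: assumes ID is a positive integer and that 1000 is an
   arbitrary upper bound on its value. _TEMPORARY_ array elements are
   retained across iterations and initialize to missing. */
data unique2 ;
  array seen {1000} _temporary_ ;  /* one flag per possible ID */
  infile "C:\unique.txt" ;
  input ID y ;
  if seen{ID} ne 1 then do ;       /* first time this ID appears */
    seen{ID} = 1 ;
    output ;                       /* keep only the first occurrence */
  end ;
run ;
```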
>Department of Epidemiology
>Rollins School of Public Health
>Atlanta, GA 30322