Date: Tue, 29 May 2012 02:51:53 -0400
Reply-To: Søren Lassen <s.lassen@POST.TELE.DK>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Søren Lassen <s.lassen@POST.TELE.DK>
Subject: Re: How to clean such invalid raw data?
Content-Type: text/plain; charset=ISO-8859-1
Max,
A slight change in your original data step may do the trick:
data have;
  infile cards dlm=':' missover;
  /* Hold the input (@) so that we can reread in case of error */
  input name $ age salary @;
  if _error_ then do; /* Reread the line, try with last name inserted */
    _error_=0;
    /* Hold the input in case we want to apply more rules */
    input @1 name $ lastname $ age salary @;
  end;
cards;
Jack:23:20000:
Tom:Smith:20:10000:
Tim:30:3000:
;
run;
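But it will not work if someone has "13" as a last name: the first INPUT
then reads it as a valid age, so no error is raised and the line is never
reread. That probably won't happen, but other character fields may produce
"fake" numeric data.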
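If that case matters, one possible variation is to count the delimiters
instead of relying on _error_. This is only a sketch: it assumes that every
field, including the last one, ends with a colon (as in the sample lines),
and the "Ann" line is a made-up record added to show a numeric-looking last
name.

```sas
data have;
  infile cards dlm=':' missover;
  length name lastname $ 8;
  /* Load the record into the input buffer without reading any variables */
  input @;
  /* Assumption: each field ends with ':', so colons = number of fields */
  if countc(_infile_, ':') >= 4 then
    input name $ lastname $ age salary;  /* layout with a last name */
  else
    input name $ age salary;             /* regular three-field layout */
cards;
Jack:23:20000:
Tom:Smith:20:10000:
Tim:30:3000:
Ann:13:25:5000:
;
run;
```

Because the layout is chosen before any numeric field is read, no invalid
data notes should show up in the log for the four-field lines.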
Regards,
Søren
On Tue, 22 May 2012 23:00:01 -0400, bbser 2009 <bbser2009@GMAIL.COM> wrote:
>Hi there,
>
>I am dealing with sort of large raw data stored as text document in Win 7.
>But there are rows having invalid data, in which there are more fields than
>in other rows, which results in having numeric variables read character
>string. Something like this simplified example data, where the second
>dataline is obviously invalid.
>
>data have;
>infile cards dlm=':' missover;
>input name age salary;
>cards;
>Jack:23:20000:
>Tom:Smith:20:10000:
>Tim:30:3000:
>;
>
>Any idea to automatically clean such invalid data? Thank you.
>
>
>Regards, Max
>(Maaxx)