Date: Mon, 20 Apr 2009 17:41:07 -0500
Reply-To: Joe Matise <snoopy369@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Joe Matise <snoopy369@GMAIL.COM>
Subject: Re: Parse log contained in one variable
In-Reply-To: <b7a7fa630904201512g1f4b76a0r5224fdf35f329193@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Here is one solution using only PRX patterns (and please note I am not
particularly skilled at Perl regexps, so don't laugh at my simplistic
patterns). I use a separate position/length variable pair for each
element; that isn't necessary, but I did it for clarity (to show what
each one holds). Make sure every variable's length is appropriate to
what it might hold, of course.
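/* Sample data: one log event per observation */
data test;
  /* INFILE placed before INPUT so TRUNCOVER applies when the line is read */
  infile datalines truncover;
  input str $200.;
  datalines;
4/20/2009 15:46:13 John Smith: I processed transaction foo
4/19/2009 13:10:09 John Doe: Customer asked us to process transaction foo
;;;;
run;

data want;
  set test;
  /* These FORMAT statements are the first reference to each variable, */
  /* so they also set the lengths of NAME and TEXT */
  format date DATE9.;
  format time TIME8.;
  format name $50.;
  format text $450.;

  /* Date: m/d/yyyy with one- or two-digit month and day */
  patternid_date = prxparse('/\d\d?\/\d\d?\/\d\d\d\d/');
  call prxsubstr(patternid_date,str,position_date,length_date);
  date=input(substr(str,position_date,length_date),MMDDYY10.);

  /* Time: h:mm:ss or hh:mm:ss */
  patternid_time = prxparse('/\d\d?\:\d\d\:\d\d/');
  call prxsubstr(patternid_time,str,position_time,length_time);
  time=input(substr(str,position_time,length_time),HHMMSS8.);

  /* Name: two words followed by a colon */
  patternid_name = prxparse('/[A-Za-z]+ [A-Za-z]+\:/');
  call prxsubstr(patternid_name,str,position_name,length_name);
  name=substr(str,position_name,length_name-1);   /* drop the trailing colon */

  /* Text: everything after the colon and the blank that follows it */
  text=substr(str,position_name+length_name+1);
run;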
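
If the log really does arrive as a single variable with embedded line
breaks, it would need to be split into one observation per event before
the step above can parse it. Below is a minimal sketch of that, under
some assumptions that are mine and not from the original question: the
input data set is called have, the whole log sits in a character
variable called log, and events are separated by a carriage return
and/or linefeed. It sticks to SCAN in a loop so it should not need
anything newer than 9.1.

```sas
/* Sketch only: HAVE and LOG are hypothetical names for the ODBC data */
data events;
  set have;
  length str $200;
  i = 1;
  /* '0D0A'x lists CR and LF; SCAN treats either one as a delimiter */
  str = scan(log, i, '0D0A'x);
  do while (str ne '');
    output;                       /* one observation per log event */
    i = i + 1;
    str = scan(log, i, '0D0A'x);
  end;
  drop i log;
run;
```

The resulting events data set could then be read by the want step in
place of test, since it holds each event in str.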
-Joe
On Mon, Apr 20, 2009 at 5:12 PM, Joe Matise <snoopy369@gmail.com> wrote:
> You should use either perl reg exps (PRX* functions) or some variation
> thereof.
>
> The closest SAS equivalent of Python's split is SCAN. SCAN lets you split a
> string up by a delimiter. In the strings you attach, SCAN(str,1,' ') gives
> you the date, SCAN(str,2,' ') gives you the time, and the rest should be
> read with substr or PRX functions. That is, unless the string really does
> contain \n, which would be useful as a delimiter.
>
> In general, see
> http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a002288677.htm
> for the PRX functions.
>
> -Joe
>
> On Mon, Apr 20, 2009 at 4:55 PM, Andrew Z. <ahz001@gmail.com> wrote:
>
>> I recently started to learn SAS (9.1/Windows), and now I need to parse
>> a log. Read from an ODBC source (with a poor design which I can't
>> change), each person has a single log with multiple events crammed in
>> a single variable. I want to break apart the log into multiple
>> observations and multiple variables, so I can use it like a database.
>> I've seen how to do this in SAS from a text file (INFILE/CARDS) but
>> not from ODBC. Please point me in the right direction.
>>
>> An example of the log for one person
>> 4/20/2009 15:46:13 John Smith: I processed transaction foo
>> 4/19/2009 13:10:09 John Doe: Customer asked us to process transaction
>> foo
>>
>> If I were using a general purpose language like Python, I would use
>> split('\n') to split the log into multiple variables. Then, I would
>> parse out the date, time, creator's name, and comment using a POSIX or
>> Perl regular expression. Then, I would store the parsed data in a new
>> database table.
>>
>>
>> Andrew
>>
>
>