LISTSERV at the University of Georgia
Date:         Mon, 20 Apr 2009 17:41:07 -0500
Reply-To:     Joe Matise <snoopy369@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Joe Matise <snoopy369@GMAIL.COM>
Subject:      Re: Parse log contained in one variable
Comments: To: "Andrew Z." <ahz001@gmail.com>
In-Reply-To:  <b7a7fa630904201512g1f4b76a0r5224fdf35f329193@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Here is one solution using only PRX patterns (please note I am not particularly skilled at Perl regular expressions, so don't laugh at my simplistic patterns). I use separate variables for each element's position and length. I don't think that's necessary; I just did it for clarity, to show what each one holds. Make sure each variable's length is appropriate for what it might hold, of course.

data test;
  infile datalines truncover;   /* INFILE must come before the INPUT it controls */
  input str $200.;
  datalines;
4/20/2009 15:46:13 John Smith: I processed transaction foo
4/19/2009 13:10:09 John Doe: Customer asked us to process transaction foo
;
run;

data want;
  set test;
  length name $50 text $200;    /* TEXT cannot be longer than STR */
  format date date9. time time8.;

  /* Date: m/d/yyyy, with a one- or two-digit month and day */
  patternid_date = prxparse('/\d\d?\/\d\d?\/\d\d\d\d/');
  call prxsubstr(patternid_date, str, position_date, length_date);
  date = input(substr(str, position_date, length_date), mmddyy10.);

  /* Time: h:mm:ss or hh:mm:ss */
  patternid_time = prxparse('/\d\d?\:\d\d\:\d\d/');
  call prxsubstr(patternid_time, str, position_time, length_time);
  time = input(substr(str, position_time, length_time), hhmmss8.);

  /* Name: two words followed by a colon; the colon itself is dropped */
  patternid_name = prxparse('/[A-Za-z]+ [A-Za-z]+\:/');
  call prxsubstr(patternid_name, str, position_name, length_name);
  name = substr(str, position_name, length_name - 1);

  /* Text: everything after the colon and the blank that follows it */
  text = substr(str, position_name + length_name + 1);
run;
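Since the separate position/length variables are not strictly needed, a possible variant is a single pattern with four capture groups, read back with PRXPOSN. This is a sketch, not the original poster's code: it assumes a SAS release that provides the PRXPOSN function (check the 9.1 documentation), the data set names LOG_LINES and PARSED are hypothetical, and the name group keeps the same "exactly two alphabetic words" assumption as the pattern above.

```sas
/* Hypothetical input: the same two sample lines as in the post, one event per line */
data log_lines;
  infile datalines truncover;
  input str $200.;
  datalines;
4/20/2009 15:46:13 John Smith: I processed transaction foo
4/19/2009 13:10:09 John Doe: Customer asked us to process transaction foo
;
run;

data parsed;
  set log_lines;
  length name $50 text $200;
  format date date9. time time8.;
  retain re;
  drop re;

  /* Compile the pattern once; groups 1-4 capture date, time, name, text */
  if _n_ = 1 then
    re = prxparse('/^(\d\d?\/\d\d?\/\d{4}) (\d\d?:\d\d:\d\d) ([A-Za-z]+ [A-Za-z]+): (.*)$/');

  /* PRXMATCH returns 0 when a line does not fit; its fields stay missing */
  if prxmatch(re, str) then do;
    date = input(prxposn(re, 1, str), mmddyy10.);
    time = input(prxposn(re, 2, str), hhmmss8.);
    name = prxposn(re, 3, str);
    text = prxposn(re, 4, str);
  end;
run;
```

One pattern means the fields cannot be picked up from mismatched places in the line, at the cost of a longer, stricter regular expression.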

-Joe

On Mon, Apr 20, 2009 at 5:12 PM, Joe Matise <snoopy369@gmail.com> wrote:

> You should use either Perl regular expressions (the PRX* functions) or some
> variation thereof.
>
> Python's split() is most similar to SCAN in SAS: SCAN lets you split a
> string up by a delimiter. In the strings you attached, SCAN(str,1,' ') gives
> you the date, SCAN(str,2,' ') gives you the time, and the rest should be
> read with SUBSTR or the PRX functions. If the field really does contain \n
> line breaks, that would be useful, since you could split the entries on them.
>
> In general, see
> http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a002288677.htm
> for the PRX functions.
>
> -Joe
>
> On Mon, Apr 20, 2009 at 4:55 PM, Andrew Z. <ahz001@gmail.com> wrote:
>
>> I recently started to learn SAS (9.1/Windows), and now I need to parse
>> a log. The data is read from an ODBC source (with a poor design that I
>> can't change), and each person has a single log with multiple events
>> crammed into a single variable. I want to break the log apart into
>> multiple observations and multiple variables, so I can use it like a
>> database. I've seen how to do this in SAS from a text file (INFILE/CARDS)
>> but not from ODBC. Please point me in the right direction.
>>
>> An example of the log for one person:
>> 4/20/2009 15:46:13 John Smith: I processed transaction foo
>> 4/19/2009 13:10:09 John Doe: Customer asked us to process transaction foo
>>
>> If I were using a general-purpose language like Python, I would use
>> split('\n') to split the log into separate lines. Then I would parse
>> out the date, time, creator's name, and comment using a POSIX or Perl
>> regular expression. Finally, I would store the parsed data in a new
>> database table.
>>
>> Andrew
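A rough sketch of the SCAN route, covering the case where the events really are separated by line breaks inside one variable: a DO loop splits the field on carriage-return/line-feed characters and writes one observation per event with OUTPUT, then SCAN and SUBSTR pull out the fields. It assumes CR/LF separators and exactly one blank after the date and after the time. The RAW data set and its PERSON_ID and LOGTEXT variables are hypothetical stand-ins for the ODBC table; with SAS/ACCESS Interface to ODBC, that table could instead be read through a libref assigned with a LIBNAME statement using the ODBC engine.

```sas
/* Hypothetical stand-in for the ODBC table: one row per person, with all */
/* events crammed into LOGTEXT and separated by a CR/LF line break.       */
data raw;
  length person_id 8 logtext $500;
  person_id = 1;
  logtext = '4/20/2009 15:46:13 John Smith: I processed transaction foo'
            || '0D0A'x
            || '4/19/2009 13:10:09 John Doe: Customer asked us to process transaction foo';
run;

data events;
  set raw;
  length entry $200 d $10 t $8 rest $200 name $50 text $200;
  format date date9. time time8.;
  keep person_id date time name text;

  /* SCAN treats CR ('0D'x) and LF ('0A'x) as delimiters; a CR/LF pair */
  /* counts as one break because consecutive delimiters collapse.      */
  i = 1;
  entry = left(scan(logtext, i, '0D0A'x));
  do while (entry ne ' ');
    d = scan(entry, 1, ' ');                 /* first word: the date  */
    t = scan(entry, 2, ' ');                 /* second word: the time */
    date = input(d, mmddyy10.);
    time = input(t, time8.);
    /* Assumes exactly one blank after the date and after the time */
    rest = substr(entry, length(d) + length(t) + 3);
    name = scan(rest, 1, ':');               /* text before the colon */
    text = left(substr(rest, length(name) + 2));
    output;                                  /* one row per event     */
    i + 1;
    entry = left(scan(logtext, i, '0D0A'x));
  end;
run;
```

The loop stops when SCAN runs out of pieces and returns a blank, so no count of events is needed up front. For lines whose name is not simply "word word:", the PRX patterns from the reply above are the more robust choice.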

