Date: Tue, 23 Jan 2007 21:18:57 -0700
Reply-To: Alan Churchill <SASL001@SAVIAN.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Alan Churchill <SASL001@SAVIAN.NET>
Subject: Re: substr, scan
In-Reply-To: <BAY123-F239EBBEA3C2E0BD796CA70DEAC0@phx.gbl>
Content-Type: text/plain; charset="iso-8859-1"
Here is a single pattern that will match all of the elements:
(\d{1,2}y)|(\d{1,2}d)|(\d{1,2}h)|(\d{1,2}m)|(\d{1,2}s)
1-2 digits followed by a y in capture #1 or
1-2 digits followed by a d in capture #2 or
1-2 digits followed by an h in capture #3 or
1-2 digits followed by an m in capture #4 or
1-2 digits followed by an s in capture #5
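As a rough sketch of how this pattern might be used in a DATA step: each PRXNEXT match fills exactly one of the five capture buffers, so CALL PRXPOSN can tell which unit was found. The names Year, Day, Part, Pos and PLen are made up for illustration, and reading y as years and d as days is an assumption, not something stated in the thread.

```sas
/* Sample data from the original question */
data test;
   input var $11.;
datalines;
w2y
w34y4h
w38h45m23s
p5d23m
p61h
w56s
;
run;

data need (keep = var year day hour min sec);
   set test;
   retain pattern;
   /* One target variable per capture group, in pattern order.     */
   /* Year and Day are assumed meanings for the y and d suffixes.  */
   array part{5} year day hour min sec;
   if _n_ = 1 then
      pattern = prxparse('/(\d{1,2}y)|(\d{1,2}d)|(\d{1,2}h)|(\d{1,2}m)|(\d{1,2}s)/');
   start = 1;
   stop  = length(var);
   call prxnext(pattern, start, stop, var, match, len);
   do while (match > 0);
      do i = 1 to 5;
         /* Pos is 0 for every alternative that did not take part in this match */
         call prxposn(pattern, i, pos, plen);
         /* Drop the trailing unit letter and convert the digits */
         if pos > 0 then part{i} = input(substr(var, pos, plen - 1), 8.);
      end;
      call prxnext(pattern, start, stop, var, match, len);
   end;
run;
```

For w38h45m23s this should leave Hour=38, Min=45 and Sec=23, with Year and Day missing.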
Alan
Alan Churchill
Savian "Bridging SAS and Microsoft Technologies"
www.savian.net
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of toby dunn
Sent: Tuesday, January 23, 2007 9:07 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: substr, scan
Art,
Ian leaves me confused quite a bit, and I often find that careful, thoughtful,
and often long study of what he says yields some pearl of wisdom. I would
rather think Ian is saying that this particular text parsing problem has
already been solved using PRX. So why use something less robust to solve the
problem? Well, the level of skill in a shop is one valid reason I can see.
But pushing the limits every now and again usually isn't a bad thing. Heck,
if we never ventured out we would get nowhere in our development.
So to that end I offer this PRX solution that may very well make Perl heads
cringe...
data test;
input var $11. ;
datalines;
w2y
w34y4h
w38h45m23s
p5d23m
p61h
w56s
;
run;
Data Need ( Keep = Var Sec Min Hour ) ;
  Set Test ;
  Retain Pattern ;
  /* Compile the pattern once: digits followed by s, m, or h */
  If _N_ = 1 Then Pattern = PRXParse( "/\d+[smh]/" ) ;
  Start = 1 ;
  Stop = Length( Var ) ;
  /* Find the first match, then loop over any remaining ones */
  Call PrxNext( Pattern , Start , Stop , Var , Match , Length ) ;
  /* I < Stop is a safety cap on the number of iterations */
  Do I = 1 By 1 While( ( Match > 0 ) and ( I < Stop ) ) ;
    Temp = Substr( Var , Match , Length ) ;
    /* The unit letter decides which variable receives the number */
    If Index( Temp , 's' ) Then Sec = Input( Compress( Temp , 's' ) , 8. ) ;
    Else If Index( Temp , 'm' ) Then Min = Input( Compress( Temp , 'm' ) , 8. ) ;
    Else If Index( Temp , 'h' ) Then Hour = Input( Compress( Temp , 'h' ) , 8. ) ;
    Call PrxNext( Pattern , Start , Stop , Var , Match , Length ) ;
  End ;
Run ;
Proc Print data = Need ;
Run ;
Toby Dunn
To sensible men, every day is a day of reckoning. ~John W. Gardner
The important thing is this: To be able at any moment to sacrifice that
which we are for what we could become. ~Charles DuBois
Don't get your knickers in a knot. Nothing is solved and it just makes you
walk funny. ~Kathryn Carpenter
From: Arthur Tabachneck <art297@NETSCAPE.NET>
Reply-To: Arthur Tabachneck <art297@NETSCAPE.NET>
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: substr, scan
Date: Tue, 23 Jan 2007 20:41:57 -0500
Ian,
PRX functions may, or may not, be the be-all-end-all for text parsing.
I'm still not convinced, but quite open to learn, test and decide.
I do have to disagree that 60's solutions shouldn't be used, simply
because they were not developed in the 90's. And I have to believe that
you would not propose that all known working solutions should be replaced
with new potential solutions, simply because they have been offered.
If someone would be kind enough to offer a PRX solution for the current
problem, I'm sure that it would help all of us in our ultimate decision of
where and when such solutions may be most appropriate.
Art
-----------
On Tue, 23 Jan 2007 22:56:55 +0000, Ian Whitlock <iw1junk@COMCAST.NET>
wrote:
>Summary: Use PRX functions
>#iw-value=1
>
>Helen,
>
>Take a look at this piece of documentation copied from the help screens of
>version 9.1.2.
>
>----------------------------------------------
>Definition of Perl Regular Expression (PRX) Functions and CALL Routines
>
>
>Perl regular expression (PRX) functions and CALL routines refers to a group
>of functions and CALL routines that uses a modified version of Perl as a
>pattern matching language to parse character strings. You can
>
>search for a pattern of characters within a string
>
>extract a substring from a string
>
>search and replace text with other text
>
>parse large amounts of text, such as Web logs or other text data, more
>quickly than with SAS regular expressions.
>
>Perl regular expressions are part of the character string matching category
>for functions and CALL routines. For a short description of these functions
>and CALL routines, see the Functions and CALL Routines by Category.
>--------------------------------------------------
>
>That "extract" line looks exactly like your problem. It is time for all
>SAS programmers to get with it and use regular expressions.
>
>Now does 'y' stand for years? What would you do with
>
> p_h3hs25h4m36s
>
>Perhaps the '3h' shouldn't match because the following 's' breaks the
>pattern. Can there be garbage on the end or does the string always end in
>'h', 'm', or 's' unless those 'y's get in the way?
>
>Using INDEX and SCAN is sort of applying a 60's solution to a problem
>solved in the 90's.
>
>Finally I have to note that people who put data in the variable names
>deserve the headaches they give programmers. Fortunately the solution here
>is fairly straightforward once one has the rules and the right tools.
>
>Ian Whitlock
>==================
>Date: Tue, 23 Jan 2007 11:26:52 -0800
>Reply-To: chenghelen2000@YAHOO.COM
>Sender: "SAS(r) Discussion"
>From: chenghelen2000@YAHOO.COM
>Organization: http://groups.google.com
>Subject: substr, scan
>Comments: To: sas-l
>Content-Type: text/plain; charset="iso-8859-1"
>
>Hello All,
>
>I have a dataset "test" as below:
>
>data test;
> input var $10.;
>datalines;
>w2y
>w34y4h
>w38h45m23s
>p5d23m
>p61h
>w56s
>;
>run;
>
>(h is for hour, m is for minute and s for second. For example, w38h45m23s:
>38 hour, 45 minute, 23 second)
>
>I would like to create three variables for hour, minute, and second. What
>is best way to get the values from above variable "var" by using substr,
>scan?
>
>Thanks,