LISTSERV at the University of Georgia
Date:         Tue, 23 Jan 2007 21:18:57 -0700
Reply-To:     Alan Churchill <SASL001@SAVIAN.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Alan Churchill <SASL001@SAVIAN.NET>
Subject:      Re: substr, scan
Comments: To: toby dunn <tobydunn@HOTMAIL.COM>
In-Reply-To:  <BAY123-F239EBBEA3C2E0BD796CA70DEAC0@phx.gbl>
Content-Type: text/plain; charset="iso-8859-1"

Here is a single pattern that will match all of the elements:

(\d{1,2}y)|(\d{1,2}d)|(\d{1,2}h)|(\d{1,2}m)|(\d{1,2}s)

1-2 digits followed by a y in capture #1, or
1-2 digits followed by a d in capture #2, or
1-2 digits followed by an h in capture #3, or
1-2 digits followed by an m in capture #4, or
1-2 digits followed by an s in capture #5
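For illustration, here is a minimal sketch of how this pattern might be applied with CALL PRXNEXT, using PRXPAREN to find out which capture group matched. The data set and variable names (Need, Years, Days, Hours, Minutes, Seconds, Pos, Len, Value) are assumptions made for the example; the Test data is the one from the original question.

```sas
/* Sample data from the original question */
data test;
   input var $11.;
datalines;
w2y
w34y4h
w38h45m23s
p5d23m
p61h
w56s
;
run;

/* Hypothetical output data set and variable names */
data need (keep = var years days hours minutes seconds);
   set test;
   retain pattern;

   /* Compile the pattern once, on the first observation */
   if _n_ = 1 then
      pattern = prxparse('/(\d{1,2}y)|(\d{1,2}d)|(\d{1,2}h)|(\d{1,2}m)|(\d{1,2}s)/');

   start = 1;
   stop  = length(var);

   call prxnext(pattern, start, stop, var, pos, len);

   do while (pos > 0);
      /* The match is digits plus one unit letter, so drop the last character */
      value = input(substr(var, pos, len - 1), 8.);

      /* PRXPAREN returns the number of the capture group that matched */
      select (prxparen(pattern));
         when (1) years   = value;
         when (2) days    = value;
         when (3) hours   = value;
         when (4) minutes = value;
         when (5) seconds = value;
         otherwise;
      end;

      call prxnext(pattern, start, stop, var, pos, len);
   end;
run;

proc print data = need;
run;
```

For w38h45m23s this should give Hours = 38, Minutes = 45 and Seconds = 23, with the other variables missing. Note that \d{1,2} limits each value to two digits: a run such as 123h would match as 23h, so \d+ may be the safer choice if longer values can occur.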

Alan

Alan Churchill
Savian "Bridging SAS and Microsoft Technologies"
www.savian.net

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of toby dunn
Sent: Tuesday, January 23, 2007 9:07 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: substr, scan

Art,

Ian leaves me confused quite a bit, and I often find that careful, thoughtful, and often long study of what he says yields some perl of wisdom. I would rather think Ian is saying that this particular text-parsing problem has already been solved using PRX. So why use something less robust to solve the problem? Well, the level of skill in a shop is one valid reason I can see. But pushing the limits every now and again usually isn't a bad thing. Heck, if we never ventured out we would get nowhere in our development.

So to that end I offer this PRX solution that may very well make Perl heads cringe...

data test;
  input var $11. ;
datalines;
w2y
w34y4h
w38h45m23s
p5d23m
p61h
w56s
;
run;

Data Need ( Keep = Var Sec Min Hour ) ;
  Set Test ;
  Retain Pattern ;

  If _N_ = 1 Then Pattern = PRXParse( "/\d+[smh]/" ) ;

  Start = 1 ;
  Stop = Length( Var ) ;

  Call PrxNext( Pattern , Start , Stop , Var , Match , Length ) ;

  Do I = 1 By 1 While( ( Match > 0 ) and ( I < Stop ) ) ;

    Temp = Substr( Var , Match , Length ) ;

    If Index( Temp , 's' ) Then Sec = Input( Compress( Temp , 's' ) , 8. ) ;
    Else If Index( Temp , 'm' ) Then Min = Input( Compress( Temp , 'm' ) , 8. ) ;
    Else If Index( Temp , 'h' ) Then Hour = Input( Compress( Temp , 'h' ) , 8. ) ;

    Call PrxNext( Pattern , Start , Stop , Var , Match , Length ) ;
  End ;

Run ;

Proc Print data = Need ;
Run ;

Toby Dunn

To sensible men, every day is a day of reckoning. ~John W. Gardner

The important thing is this: To be able at any moment to sacrifice that which we are for what we could become. ~Charles DuBois

Don't get your knickers in a knot. Nothing is solved and it just makes you walk funny. ~Kathryn Carpenter

From: Arthur Tabachneck <art297@NETSCAPE.NET>
Reply-To: Arthur Tabachneck <art297@NETSCAPE.NET>
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: substr, scan
Date: Tue, 23 Jan 2007 20:41:57 -0500

Ian,

PRX functions may, or may not, be the be-all-end-all for text parsing. I'm still not convinced, but quite open to learn, test and decide.

I do have to disagree that 60's solutions shouldn't be used, simply because they were not developed in the 90's. And I have to believe that you would not propose that all known working solutions should be replaced with new potential solutions, simply because they have been offered.

If someone would be kind enough to offer a PRX solution for the current problem, I'm sure that it would help all of us in our ultimate decision of where and when such solutions may be most appropriate.

Art
-----------
On Tue, 23 Jan 2007 22:56:55 +0000, Ian Whitlock <iw1junk@COMCAST.NET> wrote:

>Summary: Use PRX functions
>#iw-value=1
>
>Helen,
>
>Take a look at this piece of documentation copied from the help screens of
>version 9.1.2.
>
>----------------------------------------------
>Definition of Perl Regular Expression (PRX) Functions and CALL Routines
>
>
>Perl regular expression (PRX) functions and CALL routines refers to a group
>of functions and CALL routines that uses a modified version of Perl as a
>pattern matching language to parse character strings. You can
>
>search for a pattern of characters within a string
>
>extract a substring from a string
>
>search and replace text with other text
>
>parse large amounts of text, such as Web logs or other text data, more
>quickly than with SAS regular expressions.
>
>Perl regular expressions are part of the character string matching category
>for functions and CALL routines. For a short description of these functions
>and CALL routines, see the Functions and CALL Routines by Category.
>--------------------------------------------------
>
>That "extract" line looks exactly like your problem. It is time for all
>SAS programmers to get with it and use regular expressions.
>
>Now does 'y' stand for years? What would you do with
>
> p_h3hs25h4m36s
>
>Perhaps the '3h' shouldn't match because the following 's' breaks the
>pattern. Can there be garbage on the end or does the string always end in
>'h', 'm', or 's' unless those 'y's get in the way?
>
>Using INDEX and SCAN is sort of applying a 60's solution to a problem
>solved in the 90's.
>
>Finally I have to note that people who put data in the variable names
>deserve the headaches they give programmers. Fortunately the solution here
>is fairly straight forward once one has the rules and the right tools.
>
>Ian Whitlock
>==================
>Date: Tue, 23 Jan 2007 11:26:52 -0800
>Reply-To: chenghelen2000@YAHOO.COM
>Sender: "SAS(r) Discussion"
>From: chenghelen2000@YAHOO.COM
>Organization: http://groups.google.com
>Subject: substr, scan
>Comments: To: sas-l
>Content-Type: text/plain; charset="iso-8859-1"
>
>Hello All,
>
>I have a dataset "test" as below:
>
>data test;
> input var $10.;
>datalines;
>w2y
>w34y4h
>w38h45m23s
>p5d23m
>p61h
>w56s
>;
>run;
>
>(h is for hour, m is for minute and s for second. For example, w38h45m23s:
>38 hour, 45 minute, 23 second)
>
>I would like to create three variables for hour, minute, and second. What
>is best way to get the values from above variable "var" by using substr,
>scan?
>
>Thanks,
