Date: Wed, 26 Oct 2011 20:17:01 +0000
Reply-To: toby dunn <tobydunn@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: toby dunn <tobydunn@HOTMAIL.COM>
Subject: Re: How to find Nth occurence (regular expression)
In-Reply-To: <CAM+YpE-Wi3vL6RqxyqoptGkr5xVswxojxA2QmFR+Fey70D_7-g@mail.gmail.com>
Content-Type: text/plain; charset="Windows-1252"
Joe,

The RegEx would be:

Pattern = PrxParse( '/Art(?=.*?(Art))/io' ) ;
Match = PrxMatch( Pattern , Text ) ;
Value = PrxPosn( Pattern , 1 , Text ) ;

If you want a one-line RegEx:

Value = PrxChange( 's/.*?Art(?=.*?(Art)).*/\1/io' , 1 , Text ) ;

Even though the (Art) is inside a lookahead, the capture parens still hold the value.
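
For the date-time rows in Art's original question, the same lookahead idea might be sketched as below. This is a minimal, untested sketch, not a definitive solution: the data set name want and the variables stuff and edatepos2 come from Art's post, while pattern and second are illustrative names of my own. It assumes CALL PRXPOSN reports the position of a capture buffer that sits inside a lookahead, in the same way the PrxPosn function returns its text.

```sas
/* Sketch only: lookahead capture of the 2nd date-time on each row. */
data want ;
  infile cards truncover ;
  length second $ 19 ;
  input stuff $char80. ;

  /* One date-time, followed (lookahead only) by a lazily located second one.  */
  /* The second date-time is the only capture buffer, so it is buffer 1.       */
  /* The /o modifier asks SAS to compile the pattern once for the whole step.  */
  pattern = prxparse(
    '/\d{4}:\d\d:\d\d \d\d:\d\d:\d\d(?=.*?(\d{4}:\d\d:\d\d \d\d:\d\d:\d\d))/o' ) ;

  edatepos2 = 0 ;                              /* stays 0 when there is no 2nd occurrence */
  if prxmatch( pattern , stuff ) then do ;
    call prxposn( pattern , 1 , edatepos2 ) ;  /* start position of capture buffer 1 */
    second = prxposn( pattern , 1 , stuff ) ;  /* text of capture buffer 1 */
  end ;

  put edatepos2= second= ;
  drop pattern ;
  cards ;
aaa 2011:05:15 10:22:13 cc 2011:05:18 10:22:09 dddd
aaa ccc 2011:05:29 10:30:14 cc
;
```

The second row holds only one date-time, so the pattern as a whole should fail to match there and edatepos2 is left at 0. That mirrors what Art's PRXNEXT loop returns when no second match exists.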
Toby Dunn
If you get thrown from a horse, you have to get up and get back on, unless you landed on a cactus; then you have to roll around and scream in pain.
“Any idiot can face a crisis—it’s day to day living that wears you out”
~ Anton Chekhov
> Date: Wed, 26 Oct 2011 13:11:36 -0500
> Subject: Re: How to find Nth occurence (regular expression)
> From: snoopy369@gmail.com
> To: tobydunn@hotmail.com
> CC: SAS-L@listserv.uga.edu
>
> So is there no way to do it with a single line of (Art) [ie, without
> having (Art) (stuff) (Art)]? I was trying to play around with
> combining lookahead/lookbehind and \1 \2 etc., but that doesn't seem
> to work.
>
> -Joe
>
> On Wed, Oct 26, 2011 at 12:21 PM, toby dunn <tobydunn@hotmail.com> wrote:
> > Okay Art, I remembered you are trying to capture these values, which means I also used PrxPosn. I also simplified it and dropped my lookahead to make it easier to understand what is going on.
> >
> > Data _Null_ ;
> >   Text = 'Art is the man, if Art cant do it no one can' ;
> >   Pattern = PrxParse( '/(Art)(?:.*?)(Art)/io' ) ;
> >   Match = PrxMatch( Pattern , Text ) ;
> >   First = PrxPosn( Pattern , 1 , Text ) ;
> >   Second = PrxPosn( Pattern , 2 , Text ) ;
> > Run ;
> >
> > Normally the .* construct is a bad thing, as it causes a lot of backtracking. However, the lazy .*? steps forward a character at a time rather than eating up everything and being forced to backtrack.
> >
> > BTW, the:
> >> retain _dt_pattern_num;
> >> if _n_ = 1 then do;
> >> _dt_pattern_num=prxparse(
> >> "/\d\d\d\d\:\d\d\:\d\d\ \d\d\:\d\d\:\d\d/");
> >> end;
> > has been replaced with the /o modifier. So all you need is:
> >
> > _dt_pattern_num=prxparse( "/\d\d\d\d\:\d\d\:\d\d\ \d\d\:\d\d\:\d\d/o" ) ;
> >
> > Call PrxNext is useful only when you don't know how many matches you will find in your target string. I have also concluded that PrxSubstr is next to useless, other than as an example to show how it works.
> >
> > Toby Dunn
> >
> >> Date: Wed, 26 Oct 2011 10:43:37 -0400
> >> From: art297@ROGERS.COM
> >> Subject: How to find Nth occurence (regular expression)
> >> To: SAS-L@LISTSERV.UGA.EDU
> >>
> >> There MUST be a more direct way of doing this. I am trying to find the 2nd occurrence of a particular pattern. The following works, but is there a
> >> more direct way?
> >>
> >> Here is an example of what I am trying to do:
> >>
> >> data have;
> >> informat stuff $80.;
> >> infile cards truncover;
> >> input stuff &;
> >> cards;
> >> aaa 2011:05:15 10:22:13 cc 2011:05:18 10:22:09 dddd
> >> aaa ccc 2011:05:29 10:30:14 cc
> >> ;
> >>
> >> data want (keep=edatepos2);
> >> set have;
> >> retain _dt_pattern_num;
> >> if _n_ = 1 then do;
> >> _dt_pattern_num=prxparse(
> >> "/\d\d\d\d\:\d\d\:\d\d\ \d\d\:\d\d\:\d\d/");
> >> end;
> >> start=1;
> >> stop=length(stuff);
> >> do i=1 to 2;
> >> CALL PRXNEXT(_dt_pattern_num, start, stop,
> >> stuff, edatepos2, dummy);
> >> end;
> >> run;
> >>
> >> Thanks in advance,
> >> Art
> >