LISTSERV at the University of Georgia
Date:         Mon, 27 Dec 2010 04:16:20 +0000
Reply-To:     toby dunn <tobydunn@HOTMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         toby dunn <tobydunn@HOTMAIL.COM>
Subject:      Re: Is there an easier way to solve this?
Comments: To: Nat Wooding <nathani@verizon.net>, art297@rogers.com
In-Reply-To:  <166ED9FA64FA4324866CDD5EE028E6A9@D1871RB1>
Content-Type: text/plain; charset="iso-8859-1"

I did say it was untested, as I didn't have SAS around to run the code. I believe the whole PRX headache problem is akin to the idea that algebra is hard because it has letters in it rather than all numbers. You have to remember it's a sub-language all unto itself.

The heart of every RegEx is the pattern:

'/^(.*)\s+([A-Z]{2})\s+(\d{5})$/o'

I used the line anchors ^ and $ so the RegEx engine can optimize the search.

() = nothing more than capturing parens.
.* = grab any character except a newline, and as many as you can. It basically eats the whole line, but then has to give back part of what it took so the rest of the pattern can match.
\s+ = match at least one whitespace character, and as many as it can find.
([A-Z]{2}) = capture two capital letters in the range A thru Z.
(\d{5}) = capture five digits, each in the range 0 thru 9.

PrxPosn returns the text matched by the specified capture group to a variable.

City = PrxPosn( Pattern , 1 , Text ) ;

says to return the text captured by the first set of capturing parens in the pattern: (.*)

If you want to work through a short but rather nasty working regex that actually does something productive with a space-separated list, try this one:

's/\b(\w+)\s(?=.*\1)//o'

It dedupes a space-separated list of words. Perl lookbehinds cannot be variable length, and a lookahead only looks forward from the end of the text matched so far, and only as far as the characters specified in its subpattern (if nothing has been matched yet, that means it scans from the start of the string). So in order to search the remaining text you have to use a .* inside the lookahead and then have it give back some of what it matched so the back reference \1 can be attempted.

The cool thing I finally figured out is what everything (well, the meat and potatoes) in the output that PRXDebug spits out in the log means. Definitely worth the effort, as it shows how the RegEx is being attempted step by step.
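As a quick illustration of the dedupe substitution above, here is a minimal sketch rather than a tested solution. It assumes the pattern is applied with PRXCHANGE and a times argument of -1 so that every match is replaced; the variable names and the sample list are made up for illustration.

```sas
/* Minimal sketch: dedupe a space-separated list with PRXCHANGE. */
/* The variable names and the sample list are hypothetical.      */
data _null_ ;
  length List Deduped $ 200 ;
  List = 'apple pear apple fig pear' ;

  /* -1 = replace every match. A word plus its trailing blank is  */
  /* dropped whenever the same text shows up again further right, */
  /* so the LAST copy of each word is the one that survives.      */
  Deduped = prxchange( 's/\b(\w+)\s(?=.*\1)//o' , -1 , strip( List ) ) ;

  put Deduped= ;
run ;
```

With this sample list the log should show Deduped=apple fig pear. One caveat: because \1 is not tied to word boundaries, a short word that is a substring of a later word (say "a" ahead of "cat") would also be removed. Writing the lookahead as (?=.*\b\1\b) is one way to tighten that.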
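Actual working code for the stated problem:

Data Address ( Drop = Text Pattern Match ) ;
  Length City      $ 40
         State     $ 2
         Zip       $ 5
         Address   $ 20
         FirstName
         LastName  $ 20
  ;

  Infile Cards ;

  /* Each subject spans three lines. Address is read with $Char20. */
  /* so the embedded blanks in the street address are kept; plain  */
  /* list input would stop at the first blank.                     */
  Input FirstName LastName /
        Address $Char20. /
        Text $Char20.
  ;

  /* The o option compiles the pattern only once */
  Pattern = PrxParse( '/^(.*)\s+([A-Z]{2})\s+(\d{5})$/o' ) ;
  Match   = PrxMatch( Pattern , Strip( Text ) ) ;

  Put Match= Text= ;

  /* Pull the three capture groups out of the matched line */
  City  = PrxPosn( Pattern , 1 , Text ) ;
  State = PrxPosn( Pattern , 2 , Text ) ;
  Zip   = PrxPosn( Pattern , 3 , Text ) ;

Cards ;
Lee Athnos
1215 Raintree Circle
New York NY 85044
Heidie Baker
1751 Diehl Road
Vienna VA 22124
;
Run ;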
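To watch the engine work through the address pattern step by step, as mentioned earlier, something along these lines could be tried. It is only a sketch: it assumes CALL PRXDEBUG is switched on before the pattern is parsed and off again afterwards, and the sample string and variable names are hypothetical.

```sas
/* Minimal sketch: send RegEx debug output to the log for one line. */
data _null_ ;
  Text = 'New York NY 85044' ;   /* hypothetical sample line */

  call prxdebug( 1 ) ;           /* start writing debug output to the log */
  Pattern = prxparse( '/^(.*)\s+([A-Z]{2})\s+(\d{5})$/' ) ;
  Match   = prxmatch( Pattern , Text ) ;
  call prxdebug( 0 ) ;           /* stop debug output */

  put Match= ;                   /* nonzero means the pattern matched */
run ;
```

The log then shows how .* first takes the whole line and gives text back until the state and zip groups can match.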

Toby Dunn

"I'm a hell bent 100% Texan til I die"

"Don't touch my Willie, I don't know you that well"

> Date: Sun, 26 Dec 2010 18:32:38 -0500
> From: nathani@VERIZON.NET
> Subject: Re: Is there an easier way to solve this?
> To: SAS-L@LISTSERV.UGA.EDU
>
> Art
>
> Congratulations!! I made a couple weak attempts at it but never got
> anywhere. Do I need to email you any Excedrin?
>
> At the moment, I'm trying to come up with a way of doing an enhanced search of
> an on-line commercial site so that I can do a better job of filtering stuff
> that my wife wants to look at.
>
> It must be New Years somewhere in the world by now.
>
> Nat
>
> -----Original Message-----
> From: Arthur Tabachneck [mailto:art297@ROGERS.COM]
> Sent: Sunday, December 26, 2010 6:24 PM
> To: SAS-L@LISTSERV.UGA.EDU; Nat Wooding
> Subject: Re: Is there an easier way to solve this?
>
> Nat,
>
> While I can't figure out exactly what Toby was suggesting (and I, too, get a
> headache trying to figure out perl expressions), I was able to combine his
> code with your idea of using infile magic, and came up with the following
> solution:
>
> data Address ( Drop = Text Pattern Match ) ;
> Length City $ 40
> State $ 2
> Zip $ 5 ;
> infile cards;
> input FirstName $ LastName $ /
> Address $ 1 - 20 /
> @;
> _infile_ = prxchange('s/ ([A-Z]{2})/ $1/',-1,_infile_);
> input city & state zip;
> cards;
> Lee Athnos
> 1215 Raintree Circle
> New York NY 85044
> Heidie Baker
> 1751 Diehl Road
> Vienna VA 22124
> ;
> run;
>
> Is it New Years yet?
>
> Art
> --------
> On Sun, 26 Dec 2010 13:51:57 -0500, Nat Wooding <nathani@VERIZON.NET> wrote:
>
> >Toby
> >
> >Here's part of a test:
> >
> >191
> >192 Pattern = ( '/^(.*)\s+([A-Z]{2})\s+(\d{5})$/o' ) ;
> >193 Match = ( Pattern , Strip( Text ) ) ;
> > -
> > 22
> > 76
> >ERROR 22-322: Syntax error, expecting one of the following: (, [, {.
> >
> >ERROR 76-322: Syntax error, statement will be ignored
> >
> >Since this stuff gives me a headache and I avoid it, I'm not sure just what
> >your code should look like other than there may be a missing function name.
> >
> >Nat
> >-----Original Message-----
> >From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of toby dunn
> >Sent: Sunday, December 26, 2010 1:29 PM
> >To: SAS-L@LISTSERV.UGA.EDU
> >Subject: Re: Is there an easier way to solve this?
> >
> >Untested:
> >
> >Data Address ( Drop = Text Pattern Match ) ;
> >Length City $ 40
> > State $ 2
> > Zip $ 5
> > Address $ 20
> > FirstName
> > LastName $ 20
> >;
> >
> >Infile Cards ;
> >Input FirstName $ LastName $ /
> > Address $ /
> > Text $
> >;
> >
> >Pattern = ( '/^(.*)\s+([A-Z]{2})\s+(\d{5})$/o' ) ;
> >Match = ( Pattern , Strip( Text ) ) ;
> >
> >City = PrxPosn( Pattern , 1 , Text ) ;
> >State = PrxPosn( Pattern , 2 , Text ) ;
> >Zip = PrxPosn( Pattern , 3 , Text ) ;
> >
> >Cards ;
> >Lee Athnos
> >1215 Raintree Circle
> >New York NY 85044
> >Heidie Baker
> >1751 Diehl Road
> >Vienna VA 22124
> >;
> >
> >Run ;
> >
> >If you want to expand the pattern to make things optional or to encompass the
> >possibility of the extended ZipCodes you can just jack with the pattern a
> >little.
> >
> >Toby Dunn
> >
> >"I'm a hell bent 100% Texan til I die"
> >
> >"Don't touch my Willie, I don't know you that well"
> >
> >> Date: Sun, 26 Dec 2010 12:59:24 -0500
> >> From: art297@ROGERS.COM
> >> Subject: Re: Is there an easier way to solve this?
> >> To: SAS-L@LISTSERV.UGA.EDU
> >>
> >> Jack,
> >>
> >> Nice try but no cigar! The problem, in this case, is how to get SAS to read
> >> from right to left, ONLY allowing embedded space in the left most field.
> >>
> >> Art
> >> -------
> >> On Sun, 26 Dec 2010 09:41:01 -0800, Jack Hamilton <jfh@STANFORDALUMNI.ORG>
> >> wrote:
> >>
> >> >I don't have SAS on this machine, so I can't try it, but what about
> >> >
> >> >> input FirstName $ LastName $ /
> >> >> Address $ 1 - 20 /
> >> >> city $ @' ' state $ @' ' zip $;
> >> >
> >> >On Dec 26, 2010, at 9:07 AM, Nat Wooding wrote:
> >> >
> >> >> Art
> >> >>
> >> >> I have approached this type of problem in a similar fashion in past.
> >> >> Sometimes I don't bother with a new variable (_third_line) but simply use
> >> >> _infile_. Just for grins, here is a slightly different way to extract the
> >> >> city.
> >> >>
> >> >> Nat
> >> >>
> >> >> data work.Address (drop=_:);
> >> >> infile cards;
> >> >> input FirstName $ LastName $ /
> >> >> Address $ 1 - 20 /
> >> >> _Third_Line & $80.;
> >> >> LENGTH City $ 40 state $ 2 zip $5 ;
> >> >> Zip=scan(_Third_Line,-1);
> >> >> State=scan(_Third_Line,-2);
> >> >>
> >> >> City=substr(_Third_Line,1,LENGTH( CATX(' ', ZIP, STATE)) + 1);
> >> >> cards;
> >> >> Lee Athnos
> >> >> 1215 Raintree Circle
> >> >> New York NY 85044
> >> >> Heidie Baker
> >> >> 1751 Diehl Road
> >> >> Vienna VA 22124
> >> >> ;
> >> >>
> >> >> -----Original Message-----
> >> >> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> >> >> Arthur Tabachneck
> >> >> Sent: Sunday, December 26, 2010 11:07 AM
> >> >> To: SAS-L@LISTSERV.UGA.EDU
> >> >> Subject: Is there an easier way to solve this?
> >> >>
> >> >> The following was a question that was raised on the SAS discussion forum.
> >> >> You are confronted with data that has 3 lines per subject, but the third
> >> >> line has variables that may contain embedded spaces, but there is only one
> >> >> space between variables.
> >> >>
> >> >> The only suggestion I could think of was the one shown below. Is there an
> >> >> easier way?
> >> >>
> >> >> data work.Address (drop=_:);
> >> >> infile cards;
> >> >> input FirstName $ LastName $ /
> >> >> Address $ 1 - 20 /
> >> >> _Third_Line & $80.;
> >> >> format City $10.;
> >> >> Zip=scan(_Third_Line,-1);
> >> >> State=scan(_Third_Line,-2);
> >> >> call scan(_Third_Line, -2, _position, _length);
> >> >> City=substr(_Third_Line,1,_position-1);
> >> >> cards;
> >> >> Lee Athnos
> >> >> 1215 Raintree Circle
> >> >> New York NY 85044
> >> >> Heidie Baker
> >> >> 1751 Diehl Road
> >> >> Vienna VA 22124
> >> >> ;
> >> >>
> >> >> Thanks in advance,
> >> >> Art

