LISTSERV at the University of Georgia
Date:   Wed, 7 Jul 2010 18:05:40 +0000
Reply-To:   toby dunn <tobydunn@HOTMAIL.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   toby dunn <tobydunn@HOTMAIL.COM>
Subject:   Re: How to compress muliple decimal points
Comments:   To: david.brewer@uc.edu
In-Reply-To:   <201007071733.o67AmAE2020468@willow.cc.uga.edu>
Content-Type:   text/plain; charset="iso-8859-1"

No worries Dave, I am slowly working on an outline for a SAS book covering Perl RegEx's and SAS. Let's hope I can get it started soon.

's/(\d+)\.+(\d+)\.?/$1.$2/io'

Everything inside of () is a capture buffer. It allows one to segregate hunks of the pattern to match and hold on to them for later use.

\d simply says to look for a digit, i.e. 0-9

+ says to match the preceding character 1 or more times

\. says match a . (dot). You have to prefix the . with a \ because the . in regex language is a metacharacter and has a meaning (match any character), so one either has to use the escape character \ or put it in a character class such as [.]. I chose \. as it has fewer keystrokes.

? means to match the preceding character(s) 0 or 1 times, i.e. it makes them optional.
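Here is a small sketch of what each of those pieces matches on its own. It uses Python's re module purely as a stand-in (an assumption on my part, not SAS code), since \d, +, \. and ? behave the same way there as in the Perl syntax that the PRX functions use. The sample strings are made up for illustration.

```python
import re

sample = "37..2."

# \d+ : runs of one or more digits
digit_runs = re.findall(r"\d+", sample)      # ['37', '2']

# \.+ : runs of one or more literal dots
dot_runs = re.findall(r"\.+", sample)        # ['..', '.']

# An unescaped . matches any character, so it is not the same as \. or [.]
any_char = re.findall(r".", "3.7")           # ['3', '.', '7']
escaped = re.findall(r"\.", "3.7")           # ['.']
in_class = re.findall(r"[.]", "3.7")         # ['.']

# ? : the preceding item is optional (zero or one occurrence)
with_dot = re.fullmatch(r"\d+\.?", "37.") is not None     # True
without_dot = re.fullmatch(r"\d+\.?", "37") is not None   # True
two_dots = re.fullmatch(r"\d+\.?", "37..") is not None    # False
```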

Putting it all together you get:

(\d+) => match one or more digits; there has to be at least one digit to start the match on

\.+ => followed by one or more actual .'s

(\d+) => followed by one or more digits

\.? => that may or may not be followed by a trailing .

Now that the pattern is set you can specify how you want to rearrange the capture buffers in PRXChange... /$1.$2/

This simply says to take the digits from the first capture buffer, add a ., then add the digits from the second capture buffer.

In essence it merely grabs the digits from your examples and places a . between them.
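To see the whole substitution at work, here is a minimal sketch of the same pattern. It is written with Python's re module rather than SAS (an assumption made only so it can be run anywhere), and compress_dots is a hypothetical name of mine. The \g<1> and \g<2> references play the role of $1 and $2, and replacing every match mirrors the -1 argument to PRXChange.

```python
import re

# Same pattern as in the PRXChange call. re.IGNORECASE stands in for the
# /i flag (it has no effect on digits and dots), and compiling once up
# front is roughly what the /o flag asks SAS to do.
PATTERN = re.compile(r"(\d+)\.+(\d+)\.?", re.IGNORECASE)

def compress_dots(text: str) -> str:
    """Rewrite each match as <first digits>.<second digits>."""
    # sub() replaces every match by default, like -1 in PRXChange.
    return PATTERN.sub(r"\g<1>.\g<2>", text)

for value in ["37..2", "56.2.", "55..8", "37.2"]:
    print(value, "->", compress_dots(value))
```

Note that a value such as 37. (digits, a dot, and nothing after it) does not match the pattern at all, because the second group needs at least one digit, so it comes back unchanged.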

It may or may not be correct for all your cases; I would need to see what you are dealing with to modify it. If one really wanted to, one could instead write a Perl regex that deletes every dot except the first one, using a lookbehind. I also know it's not optimized: ideally one would first run a lookahead and only do something if there is more than one . in the text, which can be done with a lookahead If-Then construct within the RegEx.
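As a rough illustration of the "keep only the first dot, and only bother when there is more than one" idea, here is a sketch in Python. It is not the lookbehind or If-Then regex described above: Python's re module does not support variable-width lookbehind, so the deletion is done with plain string operations, and only the guard is a lookahead. The names HAS_TWO_DOTS and keep_first_dot are hypothetical.

```python
import re

# Lookahead guard: succeeds at the start of the text only if at least
# two dots occur somewhere in it. It consumes no characters.
HAS_TWO_DOTS = re.compile(r"(?=(?:[^.]*\.){2})")

def keep_first_dot(text: str) -> str:
    """Delete every dot except the first; leave other text untouched."""
    if not HAS_TWO_DOTS.match(text):
        return text  # zero or one dot: nothing to fix
    # Split at the first dot, then strip the remaining dots from the tail.
    head, dot, tail = text.partition(".")
    return head + dot + tail.replace(".", "")

for value in ["37..2", "56.2.", "37.2", "1.2.3"]:
    print(value, "->", keep_first_dot(value))
```

Unlike the one-line substitution, this does not look at digits at all, so whether it is the right rule depends on what the dirty values actually look like.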

So yes it does get very hairy very quickly when you move beyond simple matches.

I am starting to write a SUG paper on the zero-width lookahead assertion If-Then construct. Hopefully it will be ready in the near future; at any rate it will be finished by September.

Toby Dunn

"Don't bail. The best gold is at the bottom of barrels of crap." Randy Pausch

"Be prepared. Luck is where preparation meets opportunity." Randy Pausch

> Date: Wed, 7 Jul 2010 13:33:51 -0400
> From: david.brewer@UC.EDU
> Subject: Re: How to compress muliple decimal points
> To: SAS-L@LISTSERV.UGA.EDU
>
> Toby,
>
> I love your one liners! Now, if only I can understand Perl expressions.
> Looks like it's time to RTFM :=)
>
> Thanks much.
> Dave
>
> On Wed, 7 Jul 2010 16:35:51 +0000, toby dunn <tobydunn@HOTMAIL.COM> wrote:
>
> >Or just a One Line RegEx would do it:
> >
> >Z = PrxChange( 's/(\d+)\.+(\d+)\.?/$1.$2/io' , -1 , X ) ;
> >
> >Toby Dunn
> >
> >"Don't bail. The best gold is at the bottom of barrels of crap."
> >Randy Pausch
> >
> >"Be prepared. Luck is where preparation meets opportunity."
> >Randy Pausch
> >
> >> Date: Wed, 7 Jul 2010 10:48:32 -0400
> >> From: david.brewer@UC.EDU
> >> Subject: Re: How to compress muliple decimal points
> >> To: SAS-L@LISTSERV.UGA.EDU
> >>
> >> Hi Art,
> >>
> >> Your solution is definitely more inclusive than what I had in mind.
> >>
> >> I was using the following to solve the decimal point at the end of a
> >> string (ex, 37.2.)
> >>
> >> l = length(x);
> >> if substr(x,l,1) = "." then x = substr(x,1,l-1);
> >>
> >> I would then use Tom's solution to handle the double decimal point (ex,
> >> 55..8).
> >>
> >> I think I will stick with yours!
> >>
> >> Thanks again.
> >> Dave
> >>
> >> On Wed, 7 Jul 2010 09:41:59 -0400, Arthur Tabachneck <art297@NETSCAPE.NET> wrote:
> >>
> >> >Dave,
> >> >
> >> >Tom's suggestion is definitely a lot less brute looking than the
> >> >following, but doesn't correct for one of your examples. There is
> >> >probably a much easier way, but the following will at least accomplish the
> >> >task:
> >> >
> >> >data have;
> >> >  input (x y) ($);
> >> >  got_dot=0;
> >> >  do i=1 to length(x);
> >> >    if substr(x,i,1) eq '.' then do;
> >> >      if not(got_dot) then do;
> >> >        x=substr(x,1);
> >> >        got_dot=1;
> >> >      end;
> >> >      else do;
> >> >        x=catt(substr(x,1,i-1),substr(x,i+1));
> >> >      end;
> >> >    end;
> >> >  end;
> >> >  want=input(x,best12.);
> >> >  cards;
> >> >37..2 s/b
> >> >37.2 but
> >> >56.2. s/b
> >> >56.2 s/b
> >> >;
> >> >
> >> >Art
> >> >----------
> >> >On Wed, 7 Jul 2010 08:57:40 -0400, Dave Brewer <david.brewer@UC.EDU> wrote:
> >> >
> >> >>Tom,
> >> >>
> >> >>Thanks for your solution. I was thinking of TRANWRD, but for some dumb
> >> >>reason, I didn't think I could put in two decimal points.
> >> >>
> >> >>Thanks again.
> >> >>Dave
> >> >>
> >> >>On Wed, 7 Jul 2010 08:36:29 -0400, Tom Abernathy <tom.abernathy@GMAIL.COM> wrote:
> >> >>
> >> >>>Dave -
> >> >>> A simple thing is to use TRANWRD function.
> >> >>> clean = tranwrd( old , '..' , '.' );
> >> >>>
> >> >>> Another trick I have used to eliminate dups is to switch the character
> >> >>>with the space character and use the COMPBL function, then switch back.
> >> >>>This will handle 2, 3 or more characters in a row in one pass.
> >> >>>
> >> >>> new = translate( old , ' .' , '. ' );
> >> >>> new = compbl( new );
> >> >>> new = translate( new , ' .' , '. ' );
> >> >>>
> >> >>>- Tom
> >> >>>
> >> >>>On Wed, 7 Jul 2010 07:07:23 -0400, Dave Brewer <david.brewer@UC.EDU> wrote:
> >> >>>
> >> >>>>Hi All,
> >> >>>>
> >> >>>>I'm sure there is a simple answer to my problem, but I can't see the
> >> >>>>forest through the trees!
> >> >>>>
> >> >>>>How do I replace multiple decimal points with one when the decimals are
> >> >>>>back-to-back? Ex) 37..2 s/b 37.2 but 56.2. s/b 56.2
> >> >>>>
> >> >>>>I am trying to clean up dirty data and keep as many values as possible
> >> >>>>without making them missing.
> >> >>>>
> >> >>>>Thanks.
> >> >>>>Dave

