LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (October 2008, week 5)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Thu, 30 Oct 2008 10:19:07 -0400
Reply-To:   Toby Dunn <tobydunn@HOTMAIL.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Toby Dunn <tobydunn@HOTMAIL.COM>
Subject:   Re: how to clean characters from number data
Comments:   To: "./ ADD NAME=data _null_;" <iebupdte@GMAIL.COM>

Data _Null_,

Yeah I didnt think that far ahead to worry with the decimals. But as you showed its easy to keep them in. Id agree that the Compress function isnt the best for all situations but in this case id stick my my statement of droping the Perl RegEx and sticking to the compress function as it is more readable ans definitly scales up better.

Toby Dunn

On Thu, 30 Oct 2008 07:28:02 -0500, ./ ADD NAME=Data _null_, <iebupdte@GMAIL.COM> wrote:

>'DK' will toss the decimal, could be specified in arg2 see below. >However I'm not sure either compress method will produce desirable >results in all situations. > >1181 data _null_; >1182 input cWeight :$20.; >1183 weight = input(compress(cWeight,,'AI'),F8.); >1184 wt = Input(Compress(cWeight,,'dk'),Best.); >1185 wt2 = Input(Compress(cWeight,'.','dk'),Best.); >1186 put / 'NOTE: Results from INFILE ' _infile_ @; >1187 put (_all_)(/= 'NOTE- '); >1188 cards; > > >NOTE: Results from INFILE 112.1 > cWeight=112.1 > weight=112.1 > wt=1121 > wt2=112.1 > >NOTE: Results from INFILE 130kg > cWeight=130kg > weight=130 > wt=130 > wt2=130 > >NOTE: Results from INFILE 100.5 kg > cWeight=100.5 > weight=100.5 > wt=1005 > wt2=100.5 > >NOTE: Results from INFILE 123kilogams > cWeight=123kilogams > weight=123 > wt=123 > wt2=123 > >NOTE: Results from INFILE any99non99digit > cWeight=any99non99digit > weight=9999 > wt=9999 > wt2=9999 > >NOTE: Results from INFILE any99.non99digit > cWeight=any99.non99digit > weight=99.99 > wt=9999 > wt2=99.99 > > >On 10/29/08, toby dunn <tobydunn@hotmail.com> wrote: >> Mike , >> >> >> Well glad Data _Null_ got to the compress function modifiers. >> >> This will get you there with few strokes: >> >> X = Input( Compress( Y , , 'dk' ) , Best. ) ; >> >> >> >> >> Toby Dunn "Don't bail. The best gold is at the bottom of barrels of crap." >> Randy Pausch >> >> "Be prepared. Luck is where preparation meets opportunity." >> Randy Pausch >> >> ---------------------------------------- >> > Date: Wed, 29 Oct 2008 21:39:26 -0400 >> > From: msz03@ALBANY.EDU >> > Subject: Re: how to clean characters from number data >> > To: SAS-L@LISTSERV.UGA.EDU >> > >> > hi ... well, that's really a 'no letter of the alphabet solution' not a 'no non-digit solution' >> > >> > even better, based on data _null_'s idea of "hey, don't forget the new capabilities of COMPRESS" >> > use a 3rd argument ("k") to KEEP the characters in the list, just digits >> > so it's another "any digit" solution >> > >> > data have; >> > input x $20.; >> > datalines; >> > 112 >> > 130kg >> > 100 kg >> > 123kilograms >> > any99non99digit >> > 123(kg) >> > 123kg. >> > this->123<-is kilograms >> > so are these (123) >> > ; >> > run; >> > >> > data want; >> > set have (rename=(x=y)); >> > x = input(compress(y,'0123456789',"k"),best.); >> > keep x; >> > run; >> > >> > >> > Obs x >> > 1 112 >> > 2 130 >> > 3 100 >> > 4 123 >> > 5 9999 >> > 6 123 >> > 7 123 >> > 8 123 >> > 9 123 >> > >> > -- >> > Mike Zdeb >> > U@Albany School of Public Health >> > One University Place >> > Rensselaer, New York 12144-3456 >> > P/518-402-6479 F/630-604-1475 >> > >> > > Not too hard for compress, just a bit long:-) >> > > >> > > data want; >> > > set have; >> > > y = input(compress(upcase(x),'ABCDEFGHIJKLMNOPQRSTUVWXYZ '),best.); >> > > run; >> > > >> > > >> > > On Wed, 29 Oct 2008 17:16:29 -0400, Mike Zdeb <msz03@ALBANY.EDU> wrote: >> > > >> > >>hi ... agreed, but ... >> > >> >> > >>#1 I wanted an 'any digit' solution and that's hard (possible?) with one >> > > COMPRESS or one TRANWRD >> > >>#2 I wanted a numeric X >> > >> >> > >>so, making PRXPARSE 'sparse' ... >> > >> >> > >>data have; >> > >>input x $15.; >> > >>datalines; >> > >>112 >> > >>130kg >> > >>100 kg >> > >>123kilogams >> > >>any99non99digit >> > >>; >> > >>run; >> > >> >> > >>data want; >> > >>set have (rename=(x=y)) end=done; >> > >>x = input(prxchange("s/\D//",-1,y),best.); >> > >>keep x; >> > >>run; >> > >> >> > >>-- >> > >>Mike Zdeb >> > >>U@Albany School of Public Health >> > >>One University Place >> > >>Rensselaer, New York 12144-3456 >> > >>P/518-402-6479 F/630-604-1475 >> > >> >> > >>> Mike , >> > >>> >> > >>> It gets simplier than that with RegEx: >> > >>> >> > >>> Untest: >> > >>> >> > >>> Data Need ; >> > >>> Set Have ; >> > >>> >> > >>> X = PrxChange( 's/^(.*)kg$/$1/io' , 1 , X ) ; >> > >>> >> > >>> Run ; >> > >>> >> > >>> >> > >>> However, since compress or tranwrd could do this and they are SAS >> > >>> functions, Id go with those given the overhead that calling Perl would >> > >>> take. SAS functions work fast than RegEx, up to the point where one >> > > would >> > >>> have to call mulitple SAS functions to accomplish the same task as one >> > >>> call of Perl RegEx. >> > >>> >> > >>> >> > >>> >> > >>> >> > >>> On Wed, 29 Oct 2008 16:29:54 -0400, Mike Zdeb <msz03@ALBANY.EDU> wrote: >> > >>> >> > >>>>hi ... >> > >>>>from my limited knowledge of Perl regular expressions >> > >>>>a way to remove any non-digits >> > >>>> >> > >>>>data have; >> > >>>>input x : $15.; >> > >>>>datalines; >> > >>>>112 >> > >>>>130kg >> > >>>>100 kg >> > >>>>123kilogams >> > >>>>any99non99digit >> > >>>>; >> > >>>>run; >> > >>>> >> > >>>>data want; >> > >>>>* change non-digits to blanks; >> > >>>>fix = prxparse("s/\D//"); >> > >>>>do until (done); >> > >>>> set have (rename=(x=y)) end=done; >> > >>>>* create a numeric variable from the cleaned up original value; >> > >>>> x = input(prxchange(fix,-1,y),best.); >> > >>>> output; >> > >>>>end; >> > >>>>keep x; >> > >>>>run; >> > >>>> >> > >>>>gives ... >> > >>>> >> > >>>>Obs x >> > >>>> 1 112 >> > >>>> 2 130 >> > >>>> 3 100 >> > >>>> 4 123 >> > >>>> 5 9999 >> > >>>> >> > >>>> >> > >>>>-- >> > >>>>Mike Zdeb >> > >>>>U@Albany School of Public Health >> > >>>>One University Place >> > >>>>Rensselaer, New York 12144-3456 >> > >>>>P/518-402-6479 F/630-604-1475 >> > >>>> >> > >>>>> Hi SASLs, >> > >>>>> >> > >>>>> I have a large datasets. One variable is weight. Most data were input >> > >>> as number, but some data were input as number with 'kg'. So the data are >> > >>> saved >> > >>>>> as character data. How could I clean all data with 'kg', and convert >> > >>> them to numeric data? >> > >>>>> >> > >>>>> Data have, >> > >>>>> 112 >> > >>>>> 130kg >> > >>>>> 100 kg >> > >>>>> >> > >>>>> data want, >> > >>>>> 112 >> > >>>>> 130 >> > >>>>> 100 >> > >>>>> >> > >>>>> Thanks much! >> > >>>>> >> > >>>>> Jane >> > >>>>> >> > >>>>> >> > >>>>> >> > >>>>> >> > >>>>> >> > >>> >> > > >> >> _________________________________________________________________ >> Store, manage and share up to 5GB with Windows Live SkyDrive. >> http://skydrive.live.com/welcome.aspx?provision=1? ocid=TXT_TAGLM_WL_skydrive_102008 >>


Back to: Top of message | Previous page | Main SAS-L page