| Date: | Thu, 30 Oct 2008 10:19:07 -0400 |
| Reply-To: | Toby Dunn <tobydunn@HOTMAIL.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Toby Dunn <tobydunn@HOTMAIL.COM> |
| Subject: | Re: how to clean characters from number data |
|
Data _Null_,
Yeah I didnt think that far ahead to worry with the decimals. But as you
showed its easy to keep them in. Id agree that the Compress function isnt
the best for all situations but in this case id stick my my statement of
droping the Perl RegEx and sticking to the compress function as it is more
readable ans definitly scales up better.
Toby Dunn
On Thu, 30 Oct 2008 07:28:02 -0500, ./ ADD NAME=Data _null_,
<iebupdte@GMAIL.COM> wrote:
>'DK' will toss the decimal, could be specified in arg2 see below.
>However I'm not sure either compress method will produce desirable
>results in all situations.
>
>1181 data _null_;
>1182 input cWeight :$20.;
>1183 weight = input(compress(cWeight,,'AI'),F8.);
>1184 wt = Input(Compress(cWeight,,'dk'),Best.);
>1185 wt2 = Input(Compress(cWeight,'.','dk'),Best.);
>1186 put / 'NOTE: Results from INFILE ' _infile_ @;
>1187 put (_all_)(/= 'NOTE- ');
>1188 cards;
>
>
>NOTE: Results from INFILE 112.1
> cWeight=112.1
> weight=112.1
> wt=1121
> wt2=112.1
>
>NOTE: Results from INFILE 130kg
> cWeight=130kg
> weight=130
> wt=130
> wt2=130
>
>NOTE: Results from INFILE 100.5 kg
> cWeight=100.5
> weight=100.5
> wt=1005
> wt2=100.5
>
>NOTE: Results from INFILE 123kilogams
> cWeight=123kilogams
> weight=123
> wt=123
> wt2=123
>
>NOTE: Results from INFILE any99non99digit
> cWeight=any99non99digit
> weight=9999
> wt=9999
> wt2=9999
>
>NOTE: Results from INFILE any99.non99digit
> cWeight=any99.non99digit
> weight=99.99
> wt=9999
> wt2=99.99
>
>
>On 10/29/08, toby dunn <tobydunn@hotmail.com> wrote:
>> Mike ,
>>
>>
>> Well glad Data _Null_ got to the compress function modifiers.
>>
>> This will get you there with few strokes:
>>
>> X = Input( Compress( Y , , 'dk' ) , Best. ) ;
>>
>>
>>
>>
>> Toby Dunn "Don't bail. The best gold is at the bottom of barrels of
crap."
>> Randy Pausch
>>
>> "Be prepared. Luck is where preparation meets opportunity."
>> Randy Pausch
>>
>> ----------------------------------------
>> > Date: Wed, 29 Oct 2008 21:39:26 -0400
>> > From: msz03@ALBANY.EDU
>> > Subject: Re: how to clean characters from number data
>> > To: SAS-L@LISTSERV.UGA.EDU
>> >
>> > hi ... well, that's really a 'no letter of the alphabet solution' not
a 'no non-digit solution'
>> >
>> > even better, based on data _null_'s idea of "hey, don't forget the
new capabilities of COMPRESS"
>> > use a 3rd argument ("k") to KEEP the characters in the list, just
digits
>> > so it's another "any digit" solution
>> >
>> > data have;
>> > input x $20.;
>> > datalines;
>> > 112
>> > 130kg
>> > 100 kg
>> > 123kilograms
>> > any99non99digit
>> > 123(kg)
>> > 123kg.
>> > this->123<-is kilograms
>> > so are these (123)
>> > ;
>> > run;
>> >
>> > data want;
>> > set have (rename=(x=y));
>> > x = input(compress(y,'0123456789',"k"),best.);
>> > keep x;
>> > run;
>> >
>> >
>> > Obs x
>> > 1 112
>> > 2 130
>> > 3 100
>> > 4 123
>> > 5 9999
>> > 6 123
>> > 7 123
>> > 8 123
>> > 9 123
>> >
>> > --
>> > Mike Zdeb
>> > U@Albany School of Public Health
>> > One University Place
>> > Rensselaer, New York 12144-3456
>> > P/518-402-6479 F/630-604-1475
>> >
>> > > Not too hard for compress, just a bit long:-)
>> > >
>> > > data want;
>> > > set have;
>> > > y = input(compress(upcase(x),'ABCDEFGHIJKLMNOPQRSTUVWXYZ '),best.);
>> > > run;
>> > >
>> > >
>> > > On Wed, 29 Oct 2008 17:16:29 -0400, Mike Zdeb <msz03@ALBANY.EDU>
wrote:
>> > >
>> > >>hi ... agreed, but ...
>> > >>
>> > >>#1 I wanted an 'any digit' solution and that's hard (possible?)
with one
>> > > COMPRESS or one TRANWRD
>> > >>#2 I wanted a numeric X
>> > >>
>> > >>so, making PRXPARSE 'sparse' ...
>> > >>
>> > >>data have;
>> > >>input x $15.;
>> > >>datalines;
>> > >>112
>> > >>130kg
>> > >>100 kg
>> > >>123kilogams
>> > >>any99non99digit
>> > >>;
>> > >>run;
>> > >>
>> > >>data want;
>> > >>set have (rename=(x=y)) end=done;
>> > >>x = input(prxchange("s/\D//",-1,y),best.);
>> > >>keep x;
>> > >>run;
>> > >>
>> > >>--
>> > >>Mike Zdeb
>> > >>U@Albany School of Public Health
>> > >>One University Place
>> > >>Rensselaer, New York 12144-3456
>> > >>P/518-402-6479 F/630-604-1475
>> > >>
>> > >>> Mike ,
>> > >>>
>> > >>> It gets simplier than that with RegEx:
>> > >>>
>> > >>> Untest:
>> > >>>
>> > >>> Data Need ;
>> > >>> Set Have ;
>> > >>>
>> > >>> X = PrxChange( 's/^(.*)kg$/$1/io' , 1 , X ) ;
>> > >>>
>> > >>> Run ;
>> > >>>
>> > >>>
>> > >>> However, since compress or tranwrd could do this and they are SAS
>> > >>> functions, Id go with those given the overhead that calling Perl
would
>> > >>> take. SAS functions work fast than RegEx, up to the point where
one
>> > > would
>> > >>> have to call mulitple SAS functions to accomplish the same task
as one
>> > >>> call of Perl RegEx.
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>> On Wed, 29 Oct 2008 16:29:54 -0400, Mike Zdeb <msz03@ALBANY.EDU>
wrote:
>> > >>>
>> > >>>>hi ...
>> > >>>>from my limited knowledge of Perl regular expressions
>> > >>>>a way to remove any non-digits
>> > >>>>
>> > >>>>data have;
>> > >>>>input x : $15.;
>> > >>>>datalines;
>> > >>>>112
>> > >>>>130kg
>> > >>>>100 kg
>> > >>>>123kilogams
>> > >>>>any99non99digit
>> > >>>>;
>> > >>>>run;
>> > >>>>
>> > >>>>data want;
>> > >>>>* change non-digits to blanks;
>> > >>>>fix = prxparse("s/\D//");
>> > >>>>do until (done);
>> > >>>> set have (rename=(x=y)) end=done;
>> > >>>>* create a numeric variable from the cleaned up original value;
>> > >>>> x = input(prxchange(fix,-1,y),best.);
>> > >>>> output;
>> > >>>>end;
>> > >>>>keep x;
>> > >>>>run;
>> > >>>>
>> > >>>>gives ...
>> > >>>>
>> > >>>>Obs x
>> > >>>> 1 112
>> > >>>> 2 130
>> > >>>> 3 100
>> > >>>> 4 123
>> > >>>> 5 9999
>> > >>>>
>> > >>>>
>> > >>>>--
>> > >>>>Mike Zdeb
>> > >>>>U@Albany School of Public Health
>> > >>>>One University Place
>> > >>>>Rensselaer, New York 12144-3456
>> > >>>>P/518-402-6479 F/630-604-1475
>> > >>>>
>> > >>>>> Hi SASLs,
>> > >>>>>
>> > >>>>> I have a large datasets. One variable is weight. Most data were
input
>> > >>> as number, but some data were input as number with 'kg'. So the
data are
>> > >>> saved
>> > >>>>> as character data. How could I clean all data with 'kg', and
convert
>> > >>> them to numeric data?
>> > >>>>>
>> > >>>>> Data have,
>> > >>>>> 112
>> > >>>>> 130kg
>> > >>>>> 100 kg
>> > >>>>>
>> > >>>>> data want,
>> > >>>>> 112
>> > >>>>> 130
>> > >>>>> 100
>> > >>>>>
>> > >>>>> Thanks much!
>> > >>>>>
>> > >>>>> Jane
>> > >>>>>
>> > >>>>>
>> > >>>>>
>> > >>>>>
>> > >>>>>
>> > >>>
>> > >
>>
>> _________________________________________________________________
>> Store, manage and share up to 5GB with Windows Live SkyDrive.
>> http://skydrive.live.com/welcome.aspx?provision=1?
ocid=TXT_TAGLM_WL_skydrive_102008
>>
|