LISTSERV at the University of Georgia
Date:         Tue, 15 Mar 2005 16:24:55 -0500
Reply-To:     "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Subject:      Re: What is the regular expression for SAS format name?
Comments: To: "Chang Y. Chung" <chang_y_chung@HOTMAIL.COM>
Content-Type: text/plain; charset="Windows-1252"

Chang Y. Chung wrote:
> lzhang9830@yahoo.com wrote:
>> Does anyone can show me a right regular expression for all kinds
>> of SAS format names or tell me where I can find it? Thanks a lot.
>
>
> Richard A. DeVenezia <radevenz@IX.NETCOM.COM> wrote:
>> In v9, a Perl regex pattern would have ^[_a-zA-Z]\w{0,31}$ . Be
>> aware that SAS character variable values are padded with spaces, so
>> in SAS those have to be accounted for with a \s*$.
>
> Hi,
>
> I think the question was about checking valid format names. In 9.1.3.
> the format name can be up to 32 character long, starting with an
> underscore or a letter. If the format is of a character type, then it
> should start with a dollar sign followed by up to 31 letters, digits
> or underscores.
>
> In addition, a format name cannot end with a digit because:
>
> proc format;
> value A3222222222234f low-high = "OK";
> value A32322222222222 low-high = "NOT OK";
> run;
> /* on log -- modified
> NOTE: Format A3222222222234F has been output.
> ERROR: The format name A32322222222222 ends in a number, which is
> invalid. */
>
> In PRX, this will be expressed as:
>
> /([A-Za-z_])\w(\w{0,29}[A-Za-z_]) | \$\1\2/
>
> With \1 refers back to ([A-Za-z_]) and \2 (\w{0,29}[A-Za-z_]), like
> below.
>
> Cheers,
> Chang
>
> data _null_;
> if _n_ = 1 then do;
> retain prxFmt;
> prxFmt = prxparse('/([A-Za-z_])(\w{0,30}[A-Za-z_]) | \$\1\2/');
> end;
>
> input name :$50.;
>
> if prxmatch(prxFmt, name) then do;
> put " OK: " name=;
> end; else do;
> put "NOT OK: " name=;
> end;
>
> cards;
> _
> thisis
> t3
> _3
> 1234
> A3222222222234
> ;
> run;
> /*
> NOT OK: name=_
> OK: name=thisis
> NOT OK: name=t3
> NOT OK: name=_3
> NOT OK: name=1234
> NOT OK: name=A3222222222234
> */

Chang:

I think your pattern is missing some endpoint conditions, specifically one-character format names. The following is more robust, but for the life of me I could not get the \1 backreference to perform as I supposed it would (e.g. when I replace a later ([A-Za-z_]) with \1 to reuse the first subpattern, the matches don't come out as expected).

-------------------
proc format ;
  value _ 0='Zero';
  value __ 0='Zero';
  value $__ 'z'='Zero';
  value _a 0='Zero';
  value a_ 0='Zero';
  value _1a 0='Zero';
  value a1a 0='Zero';
  value _234567890123456789012345678901_ 0 = 'zero';
  value $_23456789012345678901234567890_ 0 = 'zero';
run;

data _null_;
  format a _234567890123456789012345678901_12.3;
  do a = -1 to 1;
    put a=;
  end;
run;

data _null_;
  pattern= '/'
    || '^([A-Za-z_])\s*$'
    || '|'
    || '^([A-Za-z_])\w{0,30}([A-Za-z_])\s*$'
    || '|'
    || '^\$([A-Za-z_])\s*$'
    || '|'
    || '^\$([A-Za-z_])\w{0,29}([A-Za-z_])\s*$'
    || '/'
  ;

  put pattern=;

  rx = prxparse (pattern);

  length name $40;
  do until (0);
    infile cards truncover eof=done;
    input name $char40.;
    valid = prxmatch (rx, name);
    put valid= name;
  end;

  done:
  stop;

datalines;;;;
_
1_
a_
a_1
a_a
a1a
a1a1
huh?
1a1a
$_
$1_
$a_
$a_1
$a_a
$a1a
$a1a1
sauce
$1a1a
_234567890123456789012345678901_z
_234567890123456789012345678901_
;;;;
run;
-------------------

Richard A. DeVenezia
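
A note on the \1 question raised in this message: in Perl-style regular expressions a backreference such as \1 matches the exact text that the first group captured; it does not apply the group's subpattern a second time. That would account for the unexpected matches when \1 is substituted for a later ([A-Za-z_]). Below is a minimal sketch of the difference, assuming the SAS 9 PRX functions used in this thread; the variable names (rxBackref, rxRepeat, viaBackref, viaRepeat) and the four test values are illustrative only and do not come from the original posts.

```sas
data _null_;
  /* \1 must match the SAME character that group 1 captured */
  rxBackref = prxparse('/^([A-Za-z_])\1\s*$/');

  /* Writing the subpattern again lets the second character differ */
  rxRepeat  = prxparse('/^([A-Za-z_])([A-Za-z_])\s*$/');

  length name $8;
  do name = 'aa', 'ab', '__', '_a';
    /* prxmatch returns the match position, or 0 when there is no match */
    viaBackref = prxmatch(rxBackref, name);
    viaRepeat  = prxmatch(rxRepeat, name);
    put name= viaBackref= viaRepeat=;
  end;
run;
```

Under these assumptions the backreference pattern should accept only the doubled values ('aa' and '__'), while the repeated-subpattern pattern should accept all four. This is why the alternatives in the pattern above spell out ([A-Za-z_]) each time instead of using \1.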

