Date: Tue, 15 Mar 2005 16:24:55 -0500
Reply-To: "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Subject: Re: What is the regular expression for SAS format name?
Content-Type: text/plain; charset="Windows-1252"
Chang Y. Chung wrote:
> lzhang9830@yahoo.com wrote:
>> Does anyone can show me a right regular expression for all kinds
>> of SAS format names or tell me where I can find it? Thanks a lot.
>
>
> Richard A. DeVenezia <radevenz@IX.NETCOM.COM> wrote:
>> In v9, a Perl regex pattern would have ^[_a-zA-Z]\w{0,31}$ . Be
>> aware that SAS character variable values are padded with spaces, so
>> in SAS those have to be accounted for with a \s*$.
>
> Hi,
>
> I think the question was about checking valid format names. In 9.1.3.
> the format name can be up to 32 character long, starting with an
> underscore or a letter. If the format is of a character type, then it
> should start with a dollar sign followed by up to 31 letters, digits
> or underscores.
>
> In addition, a format name cannot end with a digit because:
>
> proc format;
> value A3222222222234f low-high = "OK";
> value A32322222222222 low-high = "NOT OK";
> run;
> /* on log -- modified
> NOTE: Format A3222222222234F has been output.
> ERROR: The format name A32322222222222 ends in a number, which is
> invalid. */
>
> In PRX, this will be expressed as:
>
> /([A-Za-z_])\w(\w{0,29}[A-Za-z_]) | \$\1\2/
>
> With \1 refers back to ([A-Za-z_]) and \2 (\w{0,29}[A-Za-z_]), like
> below.
>
> Cheers,
> Chang
>
> data _null_;
> if _n_ = 1 then do;
> retain prxFmt;
> prxFmt = prxparse('/([A-Za-z_])(\w{0,30}[A-Za-z_]) | \$\1\2/');
> end;
>
> input name :$50.;
>
> if prxmatch(prxFmt, name) then do;
> put " OK: " name=;
> end; else do;
> put "NOT OK: " name=;
> end;
>
> cards;
> _
> thisis
> t3
> _3
> 1234
> A3222222222234
> ;
> run;
> /*
> NOT OK: name=_
> OK: name=thisis
> NOT OK: name=t3
> NOT OK: name=_3
> NOT OK: name=1234
> NOT OK: name=A3222222222234
> */
Chang:
I think your pattern is missing some endpoint conditions, specifically
one-character format names.
The following is more robust, but for the life of me, I could not get the \1
backreference to behave as I expected (e.g. when I replace a later
([A-Za-z_]) with \1 to reuse the first subpattern, the matches don't come
out as expected).
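A likely explanation, offered as a sketch rather than a diagnosis of the exact pattern tried: in Perl regex syntax, \1 matches the same text that group 1 captured; it does not re-apply the subpattern. Replacing a later ([A-Za-z_]) with \1 therefore requires that position to hold the very same character as the first one. The variable names and test values below are illustrative only.

```sas
data _null_;
  /* \1 must repeat the exact character captured by group 1 */
  rxBackref = prxparse('/^([A-Za-z_])\w*\1\s*$/');
  /* writing the class again lets the last character differ from the first */
  rxClass   = prxparse('/^([A-Za-z_])\w*([A-Za-z_])\s*$/');
  length name $8;
  do name = 'a1a', 'a1b', '_x_';
    viaBackref = prxmatch(rxBackref, name);
    viaClass   = prxmatch(rxClass, name);
    put name= viaBackref= viaClass=;
  end;
run;
```

Here a1b should fail the \1 version (it ends in b, not a) while passing the version that repeats the character class; a1a and _x_ should pass both.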
-------------------
proc format ;
  value _ 0='Zero';
  value __ 0='Zero';
  value $__ 'z'='Zero';
  value _a 0='Zero';
  value a_ 0='Zero';
  value _1a 0='Zero';
  value a1a 0='Zero';
  value _234567890123456789012345678901_ 0 = 'zero';
  value $_23456789012345678901234567890_ 0 = 'zero';
run;
data _null_;
  format a _234567890123456789012345678901_12.3;
  do a = -1 to 1;
    put a=;
  end;
run;
data _null_;
  pattern=
    '/'
    || '^([A-Za-z_])\s*$'
    || '|' || '^([A-Za-z_])\w{0,30}([A-Za-z_])\s*$'
    || '|' || '^\$([A-Za-z_])\s*$'
    || '|' || '^\$([A-Za-z_])\w{0,29}([A-Za-z_])\s*$'
    || '/'
  ;
  put pattern=;
  rx = prxparse (pattern);
  length name $40;
  do until (0);
    infile cards truncover eof=done;
    input name $char40.;
    valid = prxmatch (rx, name);
    put valid= name;
  end;
  done: stop;
datalines;;;;
_
1_
a_
a_1
a_a
a1a
a1a1 huh?
1a1a
$_
$1_
$a_
$a_1
$a_a
$a1a
$a1a1 sauce
$1a1a
_234567890123456789012345678901_z
_234567890123456789012345678901_
;;;;
run;
-------------------
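The four alternatives can also be folded into a single expression with an optional tail. This is only a sketch, and it assumes the same rules as the pattern above: at most 32 characters in total (counting the $ of a character format), a leading letter or underscore, and no trailing digit. (?:...) is a non-capturing group, and the test values are illustrative.

```sas
data _null_;
  /* numeric:   first char, then optionally up to 30 word chars plus a final letter/underscore */
  /* character: $ followed by the same shape, one character shorter                            */
  rx = prxparse('/^(?:[A-Za-z_](?:\w{0,30}[A-Za-z_])?|\$[A-Za-z_](?:\w{0,29}[A-Za-z_])?)\s*$/');
  length name $40;
  do name = '_', 'a_1', 'a1a', '$a_a', '$1_', '_234567890123456789012345678901_z';
    valid = prxmatch(rx, name);  /* 1 when the whole padded value is a valid name, else 0 */
    put valid= name;
  end;
run;
```

Under those assumptions, _, a1a and $a_a should come back valid, while a_1 (trailing digit), $1_ (leading digit) and the 33-character name should not.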
Richard A. DeVenezia