Date: Wed, 4 Jul 2007 15:08:44 -0400
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: Reducing a set of regex patterns used for matching?
In-Reply-To: <yacii.17$Tt.102501@news.sisna.com>
Content-Type: text/plain; charset="us-ascii"
Richard:
Interesting question ... In these examples, the sensitivity of pattern i
relative to pattern j shows up in whether pattern i, applied as a regex,
matches the text of pattern j, or vice versa. For example, pattern 5
(added to the list below) matches the text of every other pattern, so it
dominates on sensitivity, while pattern 0 matches none of the others, so
it dominates on specificity:
data test;
  /* ptri = pattern index in column 1; the pattern text starts in column 3 */
  input @1 ptri 1. @3 string $char29. ;
cards;
0 the three little pigs (title)
1 the.*?three.*?pig
2 little.*?pig
3 the.*?pig
4 e.*?little
5 e.*?i
;
run;
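%macro testPattern(__index);
  %local __string;
  %put index=&__index;
  /* Fetch the text of the pattern under test into a macro variable. */
  proc sql noprint;
    select trim(string) into :__string
      from test
      where ptri=&__index;
  quit;
  %put &__string;
  /* Keep every other row whose text the selected pattern matches. */
  data patternMatch;
    retain rxid;
    pattern="&__index";
    /* Compile the regex once, on the first iteration only. */
    if _n_=1 then rxid=prxParse("/%trim(&__string)/");
    if missing(rxid) then do;
      putlog 'ERROR: malformed regex';
      stop;
    end;
    set test;
    if (ptri^=&__index) and prxMatch(rxid,trim(string)) then output;
  run;
%mend testPattern;
%testPattern(5)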
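I have to wonder whether this test works correctly in general for Perl
regular expressions. It treats the text of pattern j as a literal
string, so it would probably fail to match some equivalent patterns:
colou?r and colo(u|)r accept the same strings, for instance, yet neither
matches the text of the other.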
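To run the same test for every pair of patterns at once rather than one
index at a time, something along these lines might do. It is only a
sketch: the table name dominates and the column names general and
specific are made up for illustration, and it assumes the test data set
above and the restricted .*? style of pattern.

```sas
/* For each ordered pair (a, b), keep the pair when pattern a, */
/* applied as a regex, matches the text of pattern b.          */
proc sql;
  create table dominates as
  select a.ptri as general, b.ptri as specific
    from test as a, test as b
    where a.ptri ^= b.ptri
      and prxmatch(cats('/', a.string, '/'), trim(b.string)) > 0;
quit;
```

A pattern that turns up in the specific column is a candidate for
pruning, since the pattern in the general column on the same row matches
its text. With the data above, the row general=3, specific=1 should
appear, in line with the example in the original question.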
S
-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
On Behalf Of Richard A. DeVenezia
Sent: Monday, July 02, 2007 3:14 PM
To: sas-l@uga.edu
Subject: Reducing a set of regex patterns used for matching?
Suppose you are given a set of simplistic regex patterns (case
insensitive and contain only .*? wildcarding) that are used for positive
assertion in a broader associative mapping context.
0 the three little pigs (title)
1 the.*?three.*?pig
2 little.*?pig
3 the.*?pig
4 e.*?little
Is there a way to programmatically prune the set of filters 1-4?
For instance 1 could be removed because 3 would match everything 1
would.
--
Richard A. DeVenezia