Date: Thu, 23 Dec 1999 11:34:01 -0500
Reply-To: WHITLOI1 <WHITLOI1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: WHITLOI1 <WHITLOI1@WESTAT.COM>
Subject: Re: Number of words in a string ?
Content-Type: text/plain; charset=US-ASCII
Summary: Macro function to count words in string without looping
and without 200 byte limit in V6.12.
Respondent: Ian Whitlock <whitloi1@westat.com>
I was goaded by Fabrizio Carinci's <carinci@CMNS.MNEGRI.IT> response
to this common question.
> Dear Paul and SAS-Lers,
> I do agree with your statements, but to the best of my knowledge such a
> question still remains unsolved: how to count the number of words in a
> string.....by:
> 1) avoiding 200 length limitation of character variables in a data step
> 2) avoiding a do loop in macro code (may be very slow if called
>    multiple times)
> 3) allowing one **and more** spaces as simultaneous valid delimiters.
>
> I did search in SAS-L archives (gopher and deja-vu) and there seems to be
> no such solution, but I would be glad if someone will prove me wrong !
Paul answered the common question for a word count function with
> %macro nwords(s);
>    (compress(&s) ne ' ') *
>    (length(left(compbl(&s)))-length(compress(&s))+1)
> %mend nwords;
>
> data _null_;
>    string = "dasj h 07y lk0 - ldsmv";
>    nwords = %nwords(string);
>    put nwords=;
> run;
> NWORDS=6
(I think several other SAS-L people helped Paul along the road to
this solution, on the last round of this question, but I didn't
check it.)
In a private response to me, Fabrizio explained:
> Ian,
> that's fine to fix the length of a macro variable.
> To be clear, let us imagine to have a macro situation where we have:
>
> %RUNMAK(V_A V_B V_C V_D V_E V_F V_G V_H V_J V_K V_L V_M
>         V_A V_B V_C V_D V_E V_F V_G V_H V_J V_K V_L V_M
>         V_A V_B V_C V_D V_E V_F V_G V_H V_J V_K V_L V_M
>         .......)
> (the letters are to underline there is no numbered list)
>
> I would like to have:
>
> %MACRO RUNMAK(var);
>    %let n_vars=%NWORDS(&var);
> %MEND RUNMAK;
>
> a macro by SI suggests a way to do it by a %do loop, which is
> extremely slow for very long lists !
I replied that he should add %SYSFUNC calls to Paul's solution, but the
answer is not as simple as that. In macro code, V6.12 allows DATA step
functions to work on character strings up to 32K long, but it does not
allow a single word longer than 200 bytes. Hence one cannot use the
COMPRESS function to take out the separators of a long string, since
what remains would be one long word. The macro below removes the name
characters instead and counts the blanks that are left.
Here is the macro:
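%macro nwords(s);
   %local onesp len ;
   %if &s = %str() %then 0 ;
   %else
   %do ;
      /* collapse runs of blanks, then strip leading and trailing blanks */
      %let onesp =
         %qsysfunc(trim(%qsysfunc(left(%qsysfunc(compbl(&s)))))) ;
      /* remove the name characters (uppercase letters, digits and */
      /* underscore) so that only the separating blanks remain     */
      %let len =
         %qsysfunc(compress(&onesp,ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_)) ;
      %if &len = &onesp %then 0 ;
      %else %eval(%length(&len)+1) ;
   %end ;
%mend nwords;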
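A quick way to try the macro, as a sketch only: the list in VLIST is made up for illustration, with uneven runs of blanks between the names. After COMPBL, three separating blanks should remain once the name characters are removed, so the call should resolve to 4.

```sas
/* hypothetical test list, not taken from the original exchange */
%let vlist = V_A  V_B   V_C V_D ;
%put n_vars=%nwords(&vlist) ;
```

Because the macro generates only a number, it can be used wherever a constant is valid in macro code, for example on the right side of a %LET.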
For comparison's sake I used:
%macro wordcnt(s) ;
   %local i ;
   %let i = 1 ;
   %do %while (%qscan(&s,&i)^=%str()) ;
      %let i = %eval(&i+1) ;
   %end ;
   %eval(&i-1)
%mend wordcnt ;
Here is the log showing a time factor of 73 between the two methods.
726 %let beg = %sysfunc(time());
727 %RUNMAK(&vlist)
n_vars=564
728 %put %sysevalf(%sysfunc(time())-&beg) ;
0.15999999990162
729
730 %let beg = %sysfunc(time());
731 %RUNMAK2(&vlist)
n_vars=564
732 %put %sysevalf(%sysfunc(time())-&beg) ;
11.75
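The log calls RUNMAK and RUNMAK2 without showing their definitions. A minimal sketch of what they might look like follows, assuming RUNMAK wraps %NWORDS as in Fabrizio's outline, RUNMAK2 wraps the %WORDCNT loop, and each writes n_vars to the log with %PUT. These bodies are a guess for illustration, not the code that was actually timed.

```sas
/* assumed wrapper around the COMPRESS-based count */
%macro runmak(var) ;
   %local n_vars ;
   %let n_vars = %nwords(&var) ;
   %put n_vars=&n_vars ;
%mend runmak ;

/* assumed wrapper around the QSCAN loop count */
%macro runmak2(var) ;
   %local n_vars ;
   %let n_vars = %wordcnt(&var) ;
   %put n_vars=&n_vars ;
%mend runmak2 ;

/* timing pattern as it appears in the log; VLIST must already hold the list */
%let beg = %sysfunc(time()) ;
%runmak(&vlist)
%put %sysevalf(%sysfunc(time())-&beg) ;
```

Elapsed times measured this way are wall-clock seconds from the TIME function, so they will vary from run to run and from machine to machine.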
In version 8 I could use code closer to Paul's original suggestion:
%macro nwords(s);
   %local crect onesp nosp ;
   %if %length(&s) = 0 %then 0 ;
   %else
   %do ;
      /* correction factor: 0 when the string holds only blanks, else 1 */
      %let crect = %eval(%qsysfunc(compress(&s)) ne %str( )) ;
      %let onesp =
         %qsysfunc(trim(%qsysfunc(left(%qsysfunc(compbl(&s)))))) ;
      %let nosp = %qsysfunc(compress(&s)) ;
      /* the +1 sits inside the parentheses, as in Paul's formula, */
      /* so that an all-blank string counts as 0 words             */
      %eval(&crect*(%length(&onesp)-%length(&nosp)+1))
   %end ;
%mend nwords;
Here is the log.
1350 %let beg = %sysfunc(time());
1351 %RUNMAK(&vlist)
n_vars=564
1352 %put %sysevalf(%sysfunc(time())-&beg) ;
0.1099998951031
1353
1354 %let beg = %sysfunc(time());
1355 %RUNMAK2(&vlist)
n_vars=564
1356 %put %sysevalf(%sysfunc(time())-&beg) ;
6.53999996179482
Note the big improvement in the loop method for V8, but it still loses
by a factor of 60. Also note the need for the TRIM function to handle
macro quoted strings because %LENGTH counts blanks, and the need to
protect the DATA step function from the empty string.
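
The point about TRIM can be checked with a small sketch. The padded value below is made up for illustration. If LEFT moves the leading blanks to the end and %QSYSFUNC keeps them, as the note above implies, the first length reported should be larger than the second.

```sas
/* illustrative value with blanks on both sides, not from the original post */
%let padded = %qsysfunc(left(%str(  A B  ))) ;
%put padded length: %length(&padded) ;
%put trimmed length: %length(%qsysfunc(trim(&padded))) ;
```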
Ian Whitlock