Date: Mon, 12 Aug 2002 10:18:24 -0400
Reply-To: Quentin McMullen <QuentinMcMullen@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Quentin McMullen <QuentinMcMullen@WESTAT.COM>
Subject: Re: how does word scanner tokenize single vs double quotes?
Content-Type: text/plain; charset="iso-8859-1"
John Whittington [mailto:John.W@MEDISCIENCE.CO.UK] wrote (in part):
> At 17:46 09/08/02 -0400, Quentin McMullen wrote (in part):
>
> >[snip]
> >101 data a;
> >102 state1="I Live in &state";
> >103 state2='I Live in &state';
> >104 put state1=;
> >105 put state2=;
> >106 run;
> >
> >state1=I Live in DC
> >state2=I Live in &state
> >NOTE: The data set WORK.A has 1 observations and 2 variables.
> >
> >I'm wondering *how* folks think this mechanism works. My understanding is
> >that the word scanner is reading in the code, character by character,
> >building tokens, and passing each token off to either the SAS compiler or
> >the macro processor. An & or % triggers the macro processor.
> >
> >So how does the word scanner see the first & but not the second? One
> >explanation could be that the word scanner has a rule, don't look inside
> >single quotes for macro triggers.
> >
> >An alternative explanation, and I think this is the one I like better, is
> >that it's all about tokenization. By this explanation, a double quote is
> >itself a token. A single quote determines the start and end of a token.
> >
> >So by this reasoning ...
> >[snip] ....
> >As mentioned in my earlier post, I like the idea of a simple word scanner.
> >To me, it seems that this second explanation .... is more parsimonious
> >than the first ....
>
> Quentin, whilst, at first sight, what you wrote sounds very reasonable, on
> deeper consideration, I'm not so sure that these two possible explanations
> are quite as different as you imply. It seems that you would be less than
> totally happy with the idea of a rule which said "...don't look inside
> single quotes for macro triggers" ('the first explanation'), yet your
> second possible explanation seems to involve an implicit rule which
> effectively says "...don't look inside single quotes for ANYTHING - merely
> pass the entire quoted string as a single token". I'm not convinced they
> are conceptually all that different.
>
<snip>
Hi Dr. John,
No, conceptually not all that different, but I still liked the simplicity of
my imagined (and, I am now learning, incorrectly imagined) second rule.
I guess I had imagined the first rule as:
a) tokenize the string
b) a token that is an & or % followed by an alpha character triggers the
macro processor
c) don't look for & or % inside single quoted strings
And the second rule as:
a) tokenize the string
b) a token that is an & or % followed by an alpha character triggers the
macro processor
So rather than having a separate rule for single quotes, the process could
be defined in terms of the tokenization rules alone. I guess it was just
shifting the rules for single vs double quotes from being a macro trigger
rule to being a tokenization rule.
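
The trigger condition in (b) and the single-quote exception in (c) can be
checked with a small DATA step. This is only a sketch: it assumes &state
holds DC, as in the log quoted above, and the variable names s1 and s2 are
made up for the illustration.

```sas
%let state=DC;

data _null_;
  length s1 s2 $40;
  /* Double quotes: the first & is followed by a blank, so it is not a   */
  /* macro trigger; the second & is followed by a letter, so it is.      */
  s1="R & D in &state";
  /* Single quotes: neither & is treated as a trigger.                   */
  s2='R & D in &state';
  /* Expected in the log: s1=R & D in DC s2=R & D in &state              */
  put s1= s2=;
run;
```

It shows what the rules produce, but not whether the single-quote behaviour
comes from tokenization or from a suppressed call to the macro processor.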
But as I say, I'm coming to think that in fact the first rule is what
happens, and my confusion results from not understanding tokenizing. A kind
*birdie* tells me that "As for & and % in quoted strings, there are a bunch
of flags that tell the word scanner whether to suppress a call" to the macro
processor.
This leads me to more questions about tokenization of strings with double
quotes.
In a private reply, Christoph Edel pointed out that SAS (and Rick Aster : )
define a quoted string (single or double quotes) to be a single token. So
if I submit "x = &state", what do we think happens?
One option would be for the word scanner to build the entire token "x =
&state" and then pass it to the macro processor, which resolves &state and
places "x = DC" on top of the input stack. The second option is that the
word scanner starts building the token, and when it sees the & followed by a
character it stops building the token and triggers the macro processor.
Then, after the macro processor has resolved &state, it continues reading
from the input stack, which now has DC sitting on top, and finishes building
the token.
It seems to me that this second option matches better with my understanding.
And the ability of the word scanner to stop building tokens "mid-stream" can
be shown by the successful execution of the following:
data x; x=1; run;
%let mvar=oc pr;
pr&mvar.int data=x;
run;
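
The same mid-token behaviour can be seen with other kinds of tokens. The
following is a sketch rather than anything from the earlier posts; the macro
variable n and the names x1 and y are hypothetical. SYMBOLGEN is switched on
so the log reports each resolution.

```sas
options symbolgen;   /* write each macro variable resolution to the log */
%let n=1;

data x&n;            /* name token is finished after &n resolves: x1    */
  y=&n.5;            /* numeric token is finished the same way: 15      */
  put y=;            /* log shows y=15                                  */
run;

options nosymbolgen;
```

In both statements the compiler receives one token (x1, 15), not the pieces
on either side of the macro variable reference.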
I suppose one mistake in my thinking was to picture the word scanner as
tokenizing the code from the input stack (which may combine SAS code and
macro code), in which case I would want it to tokenize the above as
something like:
[pr][&mvar.][int] [data][=][x][;]
Which doesn't look good. I suppose I'd be better off thinking of the word
scanner as tokenizing the SAS code. And when it hits a macro trigger it
happily waits to see if any SAS code is produced, then just keeps tokenizing
SAS code.
Kind Regards,
--Quentin