Date: Mon, 12 Aug 2002 09:31:20 -0400
Reply-To: "Fehd, Ronald J. (PHPPO)" <rjf2@CDC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Fehd, Ronald J. (PHPPO)" <rjf2@CDC.GOV>
Subject: Re: how does word scanner tokenize single vs double quotes?
> From: Quentin McMullen [mailto:QuentinMcMullen@WESTAT.COM]
> Another Word Scanner question:
> In answer to a common question,
> people often write something like
> "macro variables do not resolve inside single quotes",
> which explains the following:
> 100 %let state=DC; /*subliminal message pro DC Statehood! No Taxation w/o
> 101 data a;
> 102 state1="I Live in &state";
> 103 state2='I Live in &state';
> 104 put state1=;
> 105 put state2=;
> 106 run;
> state1=I Live in DC
> state2=I Live in &state
> I'm wondering *how* folks think this mechanism works.
> My understanding is that the word scanner is reading in the code,
> character by character, building tokens,
> and passing each token
you're using the word 'token' to mean both macro and SAS tokens.
macro variables and macro calls are macro tokens.
each may contain, i.e. resolve to,
either part of a, a whole, or many, SAS tokens.
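A minimal sketch of those three cases, for illustration only.
The macro variable names part, whole, and many are made up for this example;
they are not from the original post.

```sas
%let part  = sta;             /* resolves to part of a SAS token       */
%let whole = state2;          /* resolves to exactly one SAS token     */
%let many  = state3 = 'DC';   /* resolves to several SAS tokens        */

data _null_;
   &part.te1 = 'DC';   /* the compiler receives: state1 = 'DC'; */
   &whole    = 'DC';   /* the compiler receives: state2 = 'DC'; */
   &many;              /* the compiler receives: state3 = 'DC'; */
   put state1= state2= state3=;
run;
```

In each case the compiler only ever receives the resolved text;
the period after &part marks the end of the macro variable name.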
> off to either the SAS compiler or the macro processor.
ultimately macro resolution is going to come back
and be placed on the stack going to the SAS compiler.
consider the verb: side-tracked.
> An & or % triggers the macro processor.
You can visualize the input stream
as switching back and forth between the two,
or consider it as a two-pass operation:
all macro tokenization and resolution first, then SAS tokenization.
> So how does the word scanner see the first & but not the second? One
> explanation could be that the word scanner has a rule, don't
> look inside single quotes for macro triggers.
that's a negative way of stating that a string is enclosed in quotes,
where quotes means either squotes or dquotes.
See RTFM discussion of Character Constants.
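One way to see that it is the enclosing quotes that matter:
a small sketch, reusing the macro variable state from the quoted example,
in which a single-quoted fragment sits inside a double-quoted constant
(the variable names nested and plain are made up for this example).

```sas
%let state = DC;

data _null_;
   /* outer quotes are double: the reference is resolved,  */
   /* even though it sits between single quotes            */
   nested = "I Live in '&state'";
   /* outer quotes are single: the text is left as typed   */
   plain  = 'I Live in "&state"';
   put nested=;
   put plain=;
run;
```

The first PUT should show the resolved value DC inside the inner single quotes;
the second should show &state unchanged.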
> An alternative explanation, and I think this is the one I
> like better, is that it's all about tokenization.
> By this explanation, a double quote is itself a token.
uh, yes, but how about considering it as a trigger or flag?
> A single quote determines the start and end of a token.
you can remove the word 'single' from the last statement.
> So by this reasoning the word scanner would tokenize line 102
> as (brackets around tokens):
> 102 [state1][=]["][I][ ][Live][ ][in][ ][&][state]["][;]
this may seem conceptually accurate;
but I'll say: "Too many tokens!"
see next comment
> The first 9 tokens would be sent to the SAS compiler one-by-one.
9? don't think so. a character constant is a token.
Consider deleting the adjective phrase 'first 9'
> The & token would trigger the macro processor
> The macro variable named state would be resolved.
> The text DC would be written,
... into the character constant
and passed forward to the SAS tokenizer,
which would then go
> to the top of the input stack
you're using tokenization to mean two different actions.
1. macro tokenization and resolution
2. SAS tokenization, and placement on compiler stack
> Line 102 would be tokenized
**** emphasis on the word 'line'
[by the macro processor]
> 102 [state1 = "I Live in ] [&state] [";]
and the macro processor would return:
> 102 [state1 = "I Live in DC";]
then we would have no problem with the four tokens:
variable name
assignment operator (=)
character constant
closure of assignment statement (;)
> Line 103 would be tokenized as:
> 103 [state2][=]['I Live in &state'][;]
> Since there is no & token, all four tokens would be sent to
> the SAS compiler one-by-one.
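A hedged way to check this from the log rather than by reasoning alone:
the SYMBOLGEN system option writes a note each time a macro variable
reference is resolved. This sketch reruns the quoted example with the
option turned on; it shows which references were resolved,
not how the word scanner is implemented.

```sas
options symbolgen;      /* log a note for each macro variable resolution */
%let state = DC;

data _null_;
   state1 = "I Live in &state";   /* expect a SYMBOLGEN note for STATE   */
   state2 = 'I Live in &state';   /* expect no note: nothing is resolved */
   put state1=;
   put state2=;
run;

options nosymbolgen;    /* restore the default */
```

If the log shows a single note of the form
"SYMBOLGEN:  Macro variable STATE resolves to DC",
then only the reference in the double-quoted constant reached the macro processor.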
> As mentioned in my earlier post, I like the idea of a simple
> word scanner.
> To me, it seems that this second explanation (word scanner just
> tokenizes and passes tokens off, an & or % token triggers macro
> processor), is more parsimonious than the first (word scanner
> tokenizes, an & or % token triggers macro processor, but scanner
> doesn't look for macro triggers inside single quotes).
> But I don't know anything about tokenization.
Ron Fehd the macro maven CDC Atlanta GA USA RJF2@cdc.gov
By using your intelligence
you can sometimes make your problems twice as complicated.
-- Ashleigh Brilliant
Bureaucracy at its worst is better than bureaucracy at its best
-- Plato, in Beetle Bailey comic strip, 1985Mar01
... similar to tokenization ...