LISTSERV at the University of Georgia
Date:         Mon, 12 Aug 2002 10:18:24 -0400
Reply-To:     Quentin McMullen <QuentinMcMullen@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Quentin McMullen <QuentinMcMullen@WESTAT.COM>
Subject:      Re: how does word scanner tokenize single vs double quotes?
Comments: To: John Whittington <John.W@MEDISCIENCE.CO.UK>
Content-Type: text/plain; charset="iso-8859-1"

John Whittington [mailto:John.W@MEDISCIENCE.CO.UK] wrote (in part):
> At 17:46 09/08/02 -0400, Quentin McMullen wrote (in part):
>
> >[snip]
> >101  data a;
> >102    state1="I Live in &state";
> >103    state2='I Live in &state';
> >104    put state1=;
> >105    put state2=;
> >106  run;
> >
> >state1=I Live in DC
> >state2=I Live in &state
> >NOTE: The data set WORK.A has 1 observations and 2 variables.
> >
> >I'm wondering *how* folks think this mechanism works. My understanding is
> >that the word scanner is reading in the code, character by character,
> >building tokens, and passing each token off to either the SAS compiler or
> >the macro processor. An & or % triggers the macro processor.
> >
> >So how does the word scanner see the first & but not the second? One
> >explanation could be that the word scanner has a rule, don't look inside
> >single quotes for macro triggers.
> >
> >An alternative explanation, and I think this is the one I like better, is
> >that it's all about tokenization. By this explanation, a double quote is
> >itself a token. A single quote determines the start and end of a token.
> >
> >So by this reasoning ...
> >[snip] ....
> >As mentioned in my earlier post, I like the idea of a simple word scanner.
> >To me, it seems that this second explanation .... is more parsimonious
> >than the first ....
>
> Quentin, whilst, at first sight, what you wrote sounds very reasonable, on
> deeper consideration, I'm not so sure that these two possible explanations
> are quite as different as you imply. It seems that you would be less than
> totally happy with the idea of a rule which said "...don't look inside
> single quotes for macro triggers" ('the first explanation'), yet your
> second possible explanation seems to involve an implicit rule which
> effectively says "...don't look inside single quotes for ANYTHING - merely
> pass the entire quoted string as a single token". I'm not convinced they
> are conceptually all that different.
> <snip>

Hi Dr. John,

No, conceptually not all that different, but I still liked the simplicity of my imagined (and, I am now learning, incorrectly imagined) second rule.

I guess I had imagined the first rule as:
a) tokenize the string
b) a token that is an & or % followed by an alpha character triggers macro
c) don't look for & or % inside single quoted strings

The second rule as:
a) tokenize the string
b) a token that is an & or % followed by an alpha character triggers macro

So rather than having a separate rule for single quotes, the process could be defined in terms of the tokenization rules alone. I guess it was just shifting the rules for single vs double quotes from being a macro trigger rule to being a tokenization rule.

But as I say, I'm coming to think that in fact the first rule is what happens, and my confusion results from not understanding tokenizing. A kind *birdie* tells me that "As for & and % in quoted strings, there are a bunch of flags that tell the word scanner whether to suppress a call" to the macro processor.
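
A small experiment along these lines may help. This is only a sketch: it assumes &state has been set to DC, as in the log quoted above, and the variable names s1-s3 are made up for illustration. The third assignment is the interesting one, since the single quotes sit inside a double-quoted string.

```sas
%let state=DC;   /* assumed value, matching the quoted log */

data _null_;
  length s1 s2 s3 $ 40;
  s1 = "I live in &state";     /* double quotes: the reference is resolved    */
  s2 = 'I live in &state';     /* single quotes: the reference is left alone  */
  s3 = "I live in '&state'";   /* single quotes nested inside double quotes   */
  put s1= / s2= / s3=;
run;
```

I would expect s3 to come back as I live in 'DC'. That suggests the suppression follows the outermost quote of the literal rather than any single quote character the scanner happens to pass, which fits the idea of a flag being set for the whole literal.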

This leads me to more questions about tokenization of strings with double quotes.

In a private reply, Christoph Edel pointed out that SAS (and Rick Aster : ) define a quoted string (single or double quotes) to be a single token. So if I submit "x = &state", what do we think happens?

One option would be for the word scanner to build the entire token "x = &state" and then pass it to the macro processor, which resolves &state and places "x = DC" on top of the input stack. The second option is that the word scanner starts building the token, and when it sees the & followed by a character it stops building the token and triggers the macro processor. Then, after the macro processor has resolved &state, it continues reading from the input stack, which now has DC sitting on top, and finishes building the token.

It seems to me that this second option matches better with my understanding. And the ability of the word scanner to stop building tokens "mid-stream" can be shown by the successful execution of the following:

data x; x=1; run;
%let mvar=oc pr;

pr&mvar.int data=x; run;
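
One way to watch the resolution happen, as a sketch rather than a statement about the scanner's internals: %put writes the resolved text to the log, and the SYMBOLGEN system option asks SAS to log each macro variable resolution. The data set x and the macro variable mvar are the ones from the example above.

```sas
data x; x=1; run;
%let mvar=oc pr;

/* %put writes the text after macro resolution to the log;       */
/* the period ends the macro variable name and is then dropped.  */
%put pr&mvar.int data=x;

/* SYMBOLGEN logs each macro variable resolution as it occurs */
options symbolgen;
pr&mvar.int data=x; run;
options nosymbolgen;
```

The %put line should write proc print data=x to the log: pr and oc join into one word, and pr and int join into the next.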

I suppose one mistake in my thinking was to picture the word scanner as tokenizing the code from the input stack (which may combine SAS code and macro code). In that case I would want it to tokenize the above as something like:

[pr][&mvar.][int] [data][=][x][;]

Which doesn't look good. I suppose I'd be better off thinking of the word scanner as tokenizing the SAS code. And when it hits a macro trigger it happily waits to see if any SAS code is produced, then just keeps tokenizing SAS code.

Kind Regards, --Quentin

