LISTSERV at the University of Georgia
Date:   Mon, 10 Dec 2001 22:37:04 -0500
Reply-To:   Jay Stevens <jay@MEDIASHOWER.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Jay Stevens <jay@MEDIASHOWER.COM>
Subject:   Re: upper(trim(left(myvar)))
Comments:   To: "Dorfman, Paul" <Paul.Dorfman@bcbsfl.com>
In-Reply-To:   <100B8759152313-01@mail.bcbsfl.com>
Content-Type:   text/plain; charset="iso-8859-1"

Really and truly if we can just get that STRTRIM=YES in place, all of my issues re: this topic would be resolved. I understand all the internal issues you've covered here, I just want my STRTRIM=YES option.

heh.

Good discussion though.

Jay Stevens jay@whitehurst-associates.com

-----Original Message-----
From: Dorfman, Paul [mailto:Paul.Dorfman@bcbsfl.com]
Sent: Monday, December 10, 2001 7:43 PM
To: 'Jay Stevens'; SAS-L@LISTSERV.UGA.EDU
Subject: RE: upper(trim(left(myvar)))

> Ack! Man, Paul you can take an inch of yarn and turn it into
> a sweater, socks and matching tophat can't you?

Jay,

No, I wish I could...

> Obviously I need to clarify or restate my original point:
>
> * I NEVER said that char vars should be default upper or
> lower. I made the case pretty clearly that keeping case
> sensitivity is necessary for data integrity (never made any
> mention of syntax case sensitivity either).

I never said you did and agreed with you entirely on this point.

>> ... Paul's General Theory of Everything ...

> I think this sums up the misunderstanding of my point. My
> issue IS one of syntax and usage and NOT of underlying
> design.

Let us define clearly what we are talking about. The issue of trimming by default or not can be solved by introducing a system option, say, STRTRIM=YES/NO. That, I guess, would satisfy both the default- and non-default-trimming camps by enabling them to set the option to their liking. With STRTRIM=YES, SAS would simply execute TRIM implicitly, which should not raise any redesign issues, while the syntax/usage complaint that started the thread would be exhausted.

However, you questioned the rationale for padding character variables with blanks, and this is quite a different story.

> I'm not implying that SAS needs to create a varchar
> data type (although wouldn't that be extremely nice and
> useful and spacesaving?).

Yeah! And that would provide a way to maintain the current string-handling functionality (maybe more) without the necessity of having strings padded - see below.

> I don't care how SAS stores the data internally, it can internally pad out
> those blanks all it wants

The trouble is, a SAS text variable is fixed-length and hence (unlike a number) it is WYSIWYG: you see it the way it is stored. Unless a string is implemented as a varchar, there is no way you can hide its tail internally and show only the head externally. So the choice is either to pad the tail with something, or to pad it with nothing and let it contain any spurious bytes the memory is populated with when the computer is turned on. COBOL designers have chosen the latter way, but I have yet to meet a COBOL programmer who would leave his fields uninitialized. SAS designers chose to do it for us at compile time. In fact, it would be more correct to say that SAS text variables are initialized with blanks rather than padded.

> I don't care what value it chooses for a missing value (also not sure what the
> mini-presentation on missing values above is supposed to prove re: this issue)

The necessity to supply a value recognized by the system as missing is simply another reason for the default initialization (padding, if you will). A blank sheet of paper naturally appears emptier than one that is written all over, and it is easier to discern new text on the former than on the latter. Using the same character to initialize a string and to mark it as missing achieves two goals at once.

> ... Although I can't see how the LENGTH function requires the fully padded
> version of a variable when it actually returns the TRIMMED length, again,
> it doesn't matter to me what SAS does internally for whatever reasons required

Since all the information about a string is provided by the text contained therein, and by nothing else stored aside (as it would be in the case of varchar), SAS, in order to return the TRIMmed length, cannot just grab it from memory - it is not there. It has to be determined from the shape of the string itself, that is, by marching from right to left until the first non-blank character is found.
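As an illustration of the right-to-left scan described above, here is a minimal DATA step sketch. The variable names (s, n, len) are made up for the example, and the loop only imitates the behavior described for LENGTH; it is not meant as SAS's actual internal code.

```sas
data _null_ ;
  length s $ 20 ;              /* fixed-length: s always occupies 20 bytes      */
  s = 'yarn' ;                 /* the remaining 16 bytes hold blanks            */

  /* march from right to left until the first non-blank byte is found */
  do n = 20 to 1 by -1 ;
    if substr (s, n, 1) ne ' ' then leave ;
  end ;
  if n = 0 then n = 1 ;        /* assumption: mimic LENGTH, which returns 1
                                  for an all-blank string                      */

  len = length (s) ;           /* the built-in function, for comparison         */
  put n= len= ;                /* both values should agree (4 for 'yarn')       */
run ;
```

The point of the comparison is that the trimmed length is not stored anywhere; it has to be recomputed from the bytes each time it is needed.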

> I would just like SAS to stop treating these trailing blanks as EXPLICIT pieces
> of the DATA, but rather treat them as IMPLICIT characteristics of the data TYPE.

It can't, for a fixed-length text variable knows no difference between the implicit and explicit. In order to provide its string-handling functionality, SAS must rely on some initializing character. It happens to be blank.

> I guess that's the core issue. SAS' imposition of trailing blanks in manipulating
> character values does violence to the data. It actually modifies the value I assign
> to a character variable. If I assign "123456789" to a $200 char variable, SAS takes
> it upon itself to add an extra 191 bytes of data to the variable.

Not really. When you declare a $200 char var, either through the LENGTH statement or otherwise, you are the one who asked for all these bytes. You know that the variable is fixed-length, and none of those bytes can be hidden. They will contain something. By default, SAS gives you blanks. You are at liberty to initialize the string to anything you want. In those infrequent applications where I need to treat all blanks as data, I initialize the string to something else, usually low-values. SAS did not give you an extra 191 bytes; it gave you a blank sheet, on which you wrote "123456789".
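A small sketch of the $200 case may make the "blank sheet" picture concrete. The variable names (v, used, allocated) are hypothetical, and the example assumes the VLENGTH function is available in the SAS release at hand.

```sas
data _null_ ;
  length v $ 200 ;             /* the programmer asks for 200 bytes             */
  v = '123456789' ;            /* 9 bytes written; the other 191 stay blank     */

  used      = length (v) ;     /* position of the last non-blank byte           */
  allocated = vlength (v) ;    /* size declared for v at compile time           */
  put used= allocated= ;
run ;
```

Under these assumptions, used should come out as 9 and allocated as 200, so the 191-byte difference is exactly the unwritten part of the sheet.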

> Those are 191 bytes that I never assigned to that variable.

Did you not declare the variable as $200, having only 9 bytes to write :-) ?

> > data b ;
> >    set a ;
> >    array cc _character_ ;
> >    do over cc ;
> >       cc = trim (cc) ;
> >    end ;
> > run ;

> I'm not sure if this even helps the problem. Even after doing
> this little batch trim, if I want to concatenate cc with
> another character variable I still have to do the TRIM()
> because those annoying trailing blanks have reasserted themselves.

Exactly - and that was the entire point of this example. TRIM only tells SAS where to start writing the next piece, namely right after the last non-blank character; it has no effect upon the variable itself. It can't. The length is fixed.
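To see this in a few lines, consider the following sketch. The variable names (a, both1, both2) are invented for the illustration.

```sas
data _null_ ;
  length a $ 10 both1 both2 $ 20 ;
  a = 'abc' ;
  a = trim (a) ;               /* no lasting effect: a is stored back into
                                  its 10 bytes, so 7 trailing blanks remain     */

  both1 = a || 'xyz' ;         /* 'abc', then 7 blanks, then 'xyz'              */
  both2 = trim (a) || 'xyz' ;  /* 'abcxyz': TRIM acts inside the expression     */
  put both1= / both2= ;
run ;
```

The assignment back to a cannot shorten it, which is why TRIM has to appear at the point of concatenation rather than in a separate clean-up step.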

> I still don't think anyone has answered my original
> fundamental question. In what PROGRAMMING or BUSINESS
> context (not for "under the cover" or "behind-the-scenes"
> purposes), is it useful to have SAS explicitly pad out all of
> the trailing blanks? To put it a different way, how would us
> po' old SAS programmers be hurt by having SAS behave the same
> as every other programming language (that I've had experience
> with at least) when concatenating two character variables?

I perceive those as two different issues. First: Why would initializing character variables to blanks be useful?

1) Initialization is a common programming practice intended to put a predetermined character in a field instead of computer junk. SAS uses blanks automatically. COBOL's INITIALIZE instruction does the same. A programmer can initialize to anything he fancies.
2) Padding aids in coding custom byte-by-byte string-manipulating routines by providing a natural sentinel on the right.
3) Padding is necessary when comparing strings of unequal length. SAS also provides, for those who like fast programs, the colon modifier, which eliminates the comparison of padded blanks if desired.
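Point 3 can be sketched as follows. The variable names (s3, s8) are hypothetical; the example only shows how an ordinary comparison and a colon-modified comparison treat values of unequal length.

```sas
data _null_ ;
  length s3 $ 3 s8 $ 8 ;
  s3 = 'abc' ;
  s8 = 'abc' ;                 /* stored as 'abc' plus 5 blanks                 */

  /* ordinary comparison: the shorter value is blank-padded first */
  if s3 = s8 then put 'equal after blank padding' ;

  s8 = 'abcdef' ;
  /* colon modifier: the longer value is truncated to the shorter length */
  if s8 =: s3 then put 'equal on the first 3 bytes' ;
  if s8 ne s3 then put 'unequal on a full comparison' ;
run ;
```

With these values all three messages should be written to the log, since the plain comparison pads s3 out to 8 bytes while the colon-modified one looks only at the leading 3.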

Second: "having SAS behave the same as every other programming language". I assume you mean languages that trim strings by default before concatenation, so that the trimming does not have to be programmed explicitly. IMO, it has little, if anything at all, to do with the initialization/padding. A system option switching the default behavior will do the trick by burying TRIM in the underlying code.

Kind regards,
=====================
Paul M. Dorfman
Jacksonville, FL
=====================

> Jay Stevens
> jay@whitehurst-associates.com

Blue Cross Blue Shield of Florida, Inc., and its subsidiary and affiliate companies are not responsible for errors or omissions in this e-mail message. Any personal comments made in this e-mail do not reflect the views of Blue Cross Blue Shield of Florida, Inc.

