Date: Mon, 10 Dec 2001 22:37:04 -0500
Reply-To: Jay Stevens <jay@MEDIASHOWER.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jay Stevens <jay@MEDIASHOWER.COM>
Subject: Re: upper(trim(left(myvar)))
In-Reply-To: <100B8759152313-01@mail.bcbsfl.com>
Content-Type: text/plain; charset="iso-8859-1"

Really and truly if we can just get that STRTRIM=YES in place, all of my
issues re: this topic would be resolved. I understand all the internal
issues you've covered here, I just want my STRTRIM=YES option.
heh.
Good discussion though.
Jay Stevens
jay@whitehurst-associates.com
-----Original Message-----
From: Dorfman, Paul [mailto:Paul.Dorfman@bcbsfl.com]
Sent: Monday, December 10, 2001 7:43 PM
To: 'Jay Stevens'; SAS-L@LISTSERV.UGA.EDU
Subject: RE: upper(trim(left(myvar)))
> Ack! Man, Paul you can take an inch of yarn and turn it into
> a sweater, socks and matching tophat can't you?
Jay,
No, I wish I could...
> Obviously I need to clarify or restate my original point:
>
> * I NEVER said that char vars should be default upper or
> lower. I made the case pretty clearly that keeping case
> sensitivity is necessary for data integrity (never made any
> mention of syntax case sensitivity either).
I never said you did and agreed with you entirely on this point.
>> ... Paul's General Theory of Everything ...
> I think this sums up the misunderstanding of my point. My
> issue IS one of syntax and usage and NOT of underlying
> design.
Let us define clearly what we are talking about. The issue of trimming by
default or not can be solved by introducing a system option, say,
STRTRIM=YES/NO. That I guess would satisfy both default- and
non-default-trimming camps by enabling them to set the option to their
liking. With STRTRIM=YES, SAS would simply execute TRIM implicitly. That
should not raise any redesign issues, and the syntax/usage complaint that
started the thread would be settled.
However, you questioned the rationale for padding character variables with
blanks, and that is quite a different story.
> I'm not implying that SAS needs to create a varchar
> data type (although wouldn't that be extremely nice and
> useful and spacesaving?).
Yeah! And that would provide a way to maintain the current string-handling
functionality (maybe more) without the necessity of having strings padded -
see below.
> I don't care how SAS stores the data internally, it can internally pad out
> those blanks all it wants
The trouble is, a SAS text variable is fixed-length and hence (unlike a
number) it is WYSIWYG: you see it the way it is stored. Unless a string
is implemented as a varchar, there is no way to hide its tail
internally and show only the head externally. So the choice is either to pad
the tail with something, or to pad it with nothing and let it contain
whatever spurious bytes happen to be in memory. COBOL
designers chose the latter, but I have yet to meet a COBOL
programmer who would leave his fields uninitialized. SAS designers chose to
do it for us at compile time. In fact, it would be more correct to say that
SAS text variables are initialized with blanks rather than padded.
> I don't care what value it chooses for a missing value (also not sure what
> the mini-presentation on missing values above is supposed to prove re: this
> issue)
The necessity to supply a value recognized by the system as missing is
simply another reason for the default initialization (padding, if you
will). A blank sheet of paper naturally appears more empty than one written
all over, and it is easier to discern new text on the former than on
the latter. Using the same character to initialize a string and to make it
missing achieves two goals at once.
> ... Although I can't see how the LENGTH function requires the fully padded
> version of a variable when it actually returns the TRIMMED length, again,
> it doesn't matter to me what SAS does internally for whatever reasons
> required
Since all the information about a string is provided by the text contained
therein, and by nothing else stored aside (as it would be in the case of
varchar), SAS, in order to return the TRIMmed length, cannot just grab it
from memory - it is not there. It has to be determined from the shape of the
string itself, that is, by marching from right to left until the first
non-blank character is found.
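
A minimal sketch of that right-to-left march, written as a DATA step for
illustration only. The variable names s, n and len are made up here, and
this is not a claim about how SAS implements LENGTH internally; it only
shows that the trimmed length can be recovered from the stored bytes alone.

```sas
data _null_ ;
   length s $ 20 ;                 /* 20 bytes are allocated, blank-filled */
   s = 'abc  de' ;                 /* 7 significant bytes, 13 trailing blanks */

   /* march from the right end until the first non-blank byte is found */
   do n = vlength(s) to 1 by -1 ;
      if substr(s, n, 1) ne ' ' then leave ;
   end ;
   if n = 0 then n = 1 ;           /* LENGTH returns 1 for an all-blank string */

   len = length(s) ;               /* built-in result, for comparison */
   put n= len= ;
run ;
```

Both values should agree, since the embedded blanks in the middle of the
string are never reached by the scan.
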
> I would just like SAS to stop treating these trailing blanks as EXPLICIT
> pieces of the DATA, but rather treat them as IMPLICIT characteristics of
> the data TYPE.
It can't, for a fixed-length text variable knows no difference between the
implicit and explicit. In order to provide its string-handling
functionality, SAS must rely on some initializing character. It happens to
be blank.
> I guess that's the core issue. SAS' imposition of trailing blanks in
> manipulating character values does violence to the data. It actually
> modifies the value I assign to a character variable. If I assign
> "123456789" to a $200 char variable, SAS takes it upon itself to add an
> extra 191 bytes of data to the variable.
Not really. When you declare a $200 char var, either through the LENGTH
statement or otherwise, you are the one who asks for all these bytes. You
know that the variable is fixed-length, and none of those bytes can be
hidden. They will contain something. By default, SAS gives you blanks. You
are at liberty to initialize the string to anything you want. In those
infrequent applications where I need to treat all blanks as data, I
initialize the string to something else, usually low-values. SAS did not
give you an extra 191 bytes; it gave you a blank sheet, on which you wrote
"123456789".
> Those are 191 bytes that I never assigned to that variable.
Did you not declare the variable as $200 while having only 9 bytes to write :-) ?
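
To make the $200 case concrete, here is a small sketch under stated
assumptions: the names v, stored, trimmed and used are invented for the
illustration, and the low-values fill is just one possible way to
initialize the string to something other than blanks.

```sas
data _null_ ;
   length v $ 200 ;                      /* you ask for 200 bytes here */
   v = '123456789' ;                     /* 9 bytes written, the rest stay blank */

   stored  = vlength(v) ;                /* bytes allocated for the variable */
   trimmed = length(v) ;                 /* position of the last non-blank byte */
   put stored= trimmed= ;

   /* pre-fill with low-values ('00'x) so that blanks can be treated as data */
   v = repeat('00'x, vlength(v) - 1) ;   /* REPEAT returns n+1 copies */
   substr(v, 1, 9) = '123456789' ;       /* overwrite only the first 9 bytes */
   used = index(v, '00'x) - 1 ;          /* bytes before the first low-value */
   put used= ;
run ;
```

With the low-values fill, the end of the data is marked by the first '00'x
byte rather than by the last non-blank one, so trailing blanks written into
the string would count as data.
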
> > data b ;
> > set a ;
> > array cc _character_ ;
> > do over cc ;
> > cc = trim (cc) ;
> > end ;
> > run ;
> >
> I'm not sure if this even helps the problem. Even after doing
> this little batch trim, if I want to concatenate cc with
> another character variable I still have to do the TRIM()
> because those annoying trailing blanks have reasserted themselves.
Exactly - and that was the entire point of this example. TRIM only tells SAS
to start writing in the leftmost non-blank position, having no effect upon
the variable altogether. It can't. The length is fixed.
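
A short sketch of that point, with made-up variables cc, other and joined:
assigning TRIM(cc) back to cc changes nothing, because the value is padded
to the declared length again on assignment, whereas TRIM inside the
concatenation expression does take effect.

```sas
data _null_ ;
   length cc other $ 10 joined $ 20 ;
   cc    = 'abc' ;
   other = 'xyz' ;

   cc = trim(cc) ;                  /* stored value is padded to 10 bytes again */

   joined = cc || other ;           /* trailing blanks of cc end up in the middle */
   put joined= ;

   joined = trim(cc) || other ;     /* trimmed within the expression itself */
   put joined= ;
run ;
```

The first PUT should show the two pieces separated by seven blanks, the
second should show them adjacent.
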
> I still don't think anyone has answered my original
> fundamental question. In what PROGRAMMING or BUSINESS
> context (not for "under the cover" or "behind-the-scenes"
> purposes), is it useful to have SAS explicitly pad out all of
> the trailing blanks? To put it a different way, how would us
> po' old SAS programmers be hurt by having SAS behave the same
> as every other programming language (that I've had experience
> with at least) when concatenating two character variables?
I perceive those as two different issues. First: Why would initializing
character variables to blanks be useful?
1) Initialization is a common programming practice intended to have a
predetermined character in a field instead of computer junk. SAS uses
blanks automatically. COBOL's INITIALIZE instruction does the same. A
programmer can initialize to anything he fancies.
2) Padding aids in coding custom byte-by-byte string-manipulating routines
by providing a natural sentinel on the right.
3) Padding is necessary when comparing strings of unequal length. SAS also
provides, for those who like fast programs, the colon modifier, which
truncates the longer operand to the length of the shorter one and so skips
the comparison of padded blanks if desired.
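
Point 3 can be illustrated with a minimal sketch; the variable names short
and long and the messages are assumptions made up for the example.

```sas
data _null_ ;
   length short $ 3 long $ 10 ;
   short = 'abc' ;
   long  = 'abc' ;                  /* stored as 'abc' plus 7 blanks */

   /* plain comparison: the shorter operand is padded with blanks first */
   if short = long then put 'equal after blank padding' ;

   long = 'abcdef' ;

   /* colon modifier: the longer operand is truncated to 3 bytes first */
   if short =: long then put 'equal on the first 3 bytes' ;
   else put 'different on the first 3 bytes' ;
run ;
```

Without the blank padding, the first comparison would have to decide what a
3-byte value compared with a 10-byte value means; with it, the two are
simply equal.
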
Second: "having SAS behave the same as every other programming language". I
assume you mean languages that trim strings by default before concatenation,
so that the trimming does not have to be programmed explicitly. IMO, it has
little, if anything at all, to do with the initialization/padding. A system
option switching the default behavior will do the trick by burying TRIM in
the underlying code.
Kind regards,
=====================
Paul M. Dorfman
Jacksonville, FL
=====================
>
>
> Jay Stevens
> jay@whitehurst-associates.com
>
>