Date: Thu, 30 Jan 2003 13:09:15 -0500
Reply-To: Ian Whitlock <WHITLOI1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ian Whitlock <WHITLOI1@WESTAT.COM>
Subject: Re: Missing Values, Primary Keys, and Unique Indexes
Content-Type: text/plain; charset="iso-8859-1"
Kevin,
In part you write:
>I don't have a problem with the *default* implementation of something being
>the normal "safe" way. But there should be a way to turn that off when the
>safety net gets in the way.
I agree, and in this case you should either turn off the indexing or create
your own index.
After my suggestion that you use "FF"x to stand for all blank, you wrote:
>>>>>>>
Now, let's say for example that I run PROC SUMMARY with NWAY against a
table, and some of the class columns contain blank values. Furthermore,
let's say that I want to store that information in another table, possibly
along with some additional data that is relevant to each specific
combination of class column values. Now in this scenario, the class columns
are the OBVIOUS primary keys for my new table. BUT, since SAS won't let me
create a unique index on columns that include blank values, I can't use my
class columns as the primary keys for the table!!!!! Now what other options
might I have...? Well let's see, how about a surrogate key??? Oh darn,
SAS doesn't support those either, and the relational purist types hate 'em
anyway!
>>>>>>>
So it appears that your character index can contain all possible values of a
byte. Well then, let me suggest another way to create your own index.
Suppose X is the character classification variable. Then add a DATA step
after the summary making a variable _X which has one extra byte in it.
   if x = " " then _x = <your choice (not blank)> || x ;
   else _x = <second choice> || x ;
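A minimal sketch of that DATA step in context, under stated assumptions: the data set names HAVE, SUMS, and KEYED, the analysis variable Y, the constraint name PK_X, the 8-byte length of X, and the flag bytes 'B' and 'N' are all hypothetical choices made for illustration. The MISSING option is assumed on PROC SUMMARY so that the blank level of X appears in the output at all.

```sas
/* Hypothetical input: X is an 8-byte class variable with some blank values. */
data have;
   length x $8;
   x = 'EAST'; y = 1; output;
   x = ' ';    y = 2; output;
   x = 'EAST'; y = 3; output;
   x = ' ';    y = 4; output;
run;

/* MISSING keeps the blank level of X as a class combination. */
proc summary data=have nway missing;
   class x;
   var y;
   output out=sums (drop=_type_ _freq_) sum=total_y;
run;

/* Build _X with one extra leading byte so the key is never all blank. */
data keyed;
   length _x $9;                      /* length of X plus one flag byte */
   set sums;
   if x = ' ' then _x = 'B' || x;     /* 'B' = assumed flag for blank X */
   else            _x = 'N' || x;     /* 'N' = assumed flag otherwise   */
run;

/* _X has no missing values, so it can carry a primary key constraint. */
proc datasets library=work nolist;
   modify keyed;
   ic create pk_x = primary key (_x);
quit;
```

Since the flag is determined by X, _X is unique exactly when X is unique, so nothing is lost by keying on _X instead.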
Yes, I suppose you will lose a byte on 32K character strings. If it is
important, then perhaps 32K character strings should not be used as indexes,
or you can partition the variable into two character strings and replace the
one index with two indices.
Now that I think about it, it probably makes better sense to put the extra
byte at the end instead of the beginning. It could lead to inefficiencies, but
you get the advantage that you could always control what is printed with a
format or create the original variable by simply changing the length.
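A sketch of the trailing-byte variant, again under assumptions: SUMS stands in for the summary output, X is taken to be 8 bytes long, and the names KEYED, RESTORED, TOTAL_Y and the flag bytes 'B' and 'N' are hypothetical. It shows both advantages mentioned above: a $8. format hides the flag when printing, and assigning _X to an 8-byte variable recovers the original value.

```sas
/* Hypothetical stand-in for the PROC SUMMARY output. */
data sums;
   length x $8 total_y 8;
   x = ' ';    total_y = 6; output;
   x = 'EAST'; total_y = 4; output;
run;

/* Append the flag byte after the full 8 bytes of X. */
data keyed;
   length _x $9;                      /* length of X plus one flag byte */
   set sums;
   if x = ' ' then _x = x || 'B';     /* 'B' = assumed flag for blank X */
   else            _x = x || 'N';     /* 'N' = assumed flag otherwise   */
   drop x;
run;

/* A format one byte shorter than _X prints only the original value. */
proc print data=keyed;
   format _x $8.;
run;

/* Assigning to a shorter variable truncates, which drops the flag. */
data restored;
   length x $8;
   set keyed;
   x = _x;
run;
```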
IanWhitlock@westat.com