Date: Thu, 30 Jan 2003 11:38:29 -0600
Reply-To: Kevin Myers <KevinMyers@AUSTIN.RR.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Kevin Myers <KevinMyers@AUSTIN.RR.COM>
Subject: Re: Missing Values, Primary Keys, and Unique Indexes
Content-Type: text/plain; charset="iso-8859-1"
Let me try to sum up what I am saying somewhat differently:
Real world databases are FULL of columns that contain blanks. Anyone
experienced in a reasonable variety of data warehousing and data mining
applications would tell you that blank values are in many cases JUST as
important as non-blank values. They are just another value that can be
analyzed and reported on. By preventing the use of columns that may contain
blank (or missing) values as primary indexes, the implication is that SAS
doesn't believe that blank values have any importance. I humbly beg to
differ in the strongest possible terms!
An option to merely bypass the present restriction would address this need,
should be quite simple to implement in SAS, and wouldn't impact anyone who
didn't choose to use it. The same cannot be said of DBMS products from
other vendors whose NULL implementations are more complex. SAS ought to
allow users to use this difference to their ADVANTAGE, rather than sticking
to the limits imposed by their competitors.
s/KAM
----- Original Message -----
From: "Kevin Myers" <KevinMyers@AUSTIN.RR.COM>
Newsgroups: bit.listserv.sas-l
To: <SAS-L@LISTSERV.UGA.EDU>
Sent: Thursday, January 30, 2003 11:13 AM
Subject: Re: Missing Values, Primary Keys, and Unique Indexes
> Please see below...
>
> ----- Original Message -----
> From: "Ian Whitlock" <WHITLOI1@WESTAT.com>
> To: "'Kevin Myers'" <KevinMyers@AUSTIN.RR.COM>; <SAS-L@LISTSERV.UGA.EDU>
> Sent: Thursday, January 30, 2003 10:15 AM
> Subject: RE: Missing Values, Primary Keys, and Unique Indexes
>
>
> > Kevin,
> >
> > Databases do not allow NULL values in unique indexes.
>
> I'm well aware of that.
>
> >
> > Now SAS is a fixed length string language. Thus every string has a
value.
> > Wisely or unwisely SAS chose "all blank" as the NULL value for character
> > variables. They had to choose some value, 0 length strings was not an
> > option. Perhaps they should have chosen something like all bytes "00"x,
> but
> > they didn't.
> >
> > Now why do you have to choose the SAS NULL value for blank? Perhaps you
> > could choose all "FF"x for blank and make a format where the value of
> blank
> > is "SAS NULL" and the values of strings of "FF"x are "MY BLANK". It
> > probably will mean all sorts of extra calculations throughout your
system,
> > but that is the price paid when you want to work against the system.
>
> I am summarizing and indexing data that is created by other systems. I
must
> keep ALL distinct combinations of certain identifying values that are
> generated by these other systems. I am NOT in control of the values that
> are being generated by the other systems. Much of what I am indexing is
> text. Ask any user who has ever run proc summary, or proc freq, or used
> PROC MDDB on a real world data warehouse, and they will tell you that most
> real world data is chock FULL of many fields that include blank values.
>
> Now, let's say for example that I run PROC SUMMARY with NWAY against a
> table, and some of the class columns contain blank values. Furthermore,
> let's say that I want to store that information in another table, possibly
> along with some additional data that is relevant to each specific
> combination of class column values. Now in this scenario, the class
columms
> are the OBVIOUS primary keys for my new table. BUT, since SAS won't let
me
> create a unique index on columns that include blank values, I can't use my
> class columns as the primary keys for the table!!!!! Now what other
options
> might I have...? Well let's see, how about a surrogate key??? Oh darn,
> SAS doesn't support those either, and the relational purist types hate 'em
> anyway!
>
> >
> > Normally indexes have some sort of structure which would preclude blanks
> > from being a good choice for an index value. So what in your system
> > requires a blank in an index?
>
> See above.
>
> >
> > From the questions you are asking, it looks like you need a database
> system,
> > so why did you choose SAS? SAS is a very nice data manipulation
language,
> > and it can even serve pretty well as a database, but it is not a
database
> > and does not have the features a database should have (example, variable
> > length character strings).
>
> I chose SAS because it's what I know the best, and it is VERY good at most
> of the types of data manipulations that I need to perform. Yes, I have
> considered interfacing to another database, but that should NOT be
necessary
> from a cost and complexity standpoint. Imagine if I go to a customer and
> say I have this whiz bang application that they REALLY need, but oh by the
> way, they'll have to license SAS to run it, and oh yeah, you're gonna have
> to license Oracle too. They're gonna look at me like I'm nuts, especially
> the CEO and CIO types who don't know SAS worth beans, and who won't
> understand why you might really need a combination like that. They're
just
> gonna think, forget this guy, and forget SAS, let's just do it all in
Oracle
> instead! They won't realize how that will kill them in relative app
> development time. Or at least it ought to, if SAS didn't keep throwing
just
> a few very irritating and unnecessary roadblocks in my path!!!
>
> >
> > A database system by design must make it difficult to produce wrong
data.
> > SAS by design makes it easy to manipulate data into any form. (Reminds
me
> > of the distinction between good mathematicians and great mathematicians.
> > Good ones can prove any true theorem. Great ones can prove any
theorem.)
> > So now your question should be, "Which is more important to me the
> security
> > of a straight jacket or the easy manipulation of a 'you are welcome
shoot
> > yourself in the foot' language?".
>
> Unfortunately, SAS is NOT letting me shoot myself in the foot here, which
is
> what I would ALWAYS strongly prefer that it do. If I shoot myself, that's
> my problem. I know that lots of people would argue that point, but as a
> real world analyst and application developer who is CONSTANTLY running up
> against artificially imposed limits, I'll argue that point till I'm in the
> grave. There is a huge difference between not being able to get my job
> done, versus someone else's preconceived notions of what I *shouldn't* do.
> I don't have a problem with the *default* implementation of something
being
> the normal "safe" way. But there should be a way to turn that off when
the
> safety net gets in the way. One relevant example here: The MISSING
option
> in PROC MEANS and PROC SUMMARY. People would absolutely KILL if SAS took
> that away, and I need primary keys that include missing values for exactly
> the same reasons.
>
> >
> > For many the best compromise is put the data in a database, and
manipulate
> > it with SAS.
>
> But that still wouldn't solve this specific problem!
>
>
> >
> > IanWhitlock@westat.com
> >
> > -----Original Message-----
> > From: Kevin Myers [mailto:KevinMyers@AUSTIN.RR.COM]
> > Sent: Thursday, January 30, 2003 6:07 AM
> > To: SAS-L@LISTSERV.UGA.EDU
> > Subject: Missing Values, Primary Keys, and Unique Indexes
> >
> >
> > Can anyone tell my why SAS refuses to allow blank character and missing
> > numeric values to be loaded into a table that has a primary integrity
> > constraint definition???
> >
> > Apparantly this is because defining a primary key causes SAS to build a
> > unique index on the corresponding table, and that apparantly causes SAS
to
> > enforce a NOT NULL requirement on any column that is part of the index.
> But
> > my question is why SAS enforces that NOT NULL requirement for all
columns
> > that are part of a unique index???
> >
> > Missing and blank values are just as unique as any other value. And in
my
> > present case, there are lots of missing and blank values that need to to
> be
> > indexed and used as part of primary keys right along with any other
> values.
> >
> > The enforcement of NOT NULL here seems completely arbitrary, probably
> based
> > on someone's philosophy that a primary key *shouldn't* include any
values
> > that are missing or blank, rather than some underlying reason that it
must
> > be this way.
> >
> > In the present case where I need this capability, the values to be
> included
> > as part of the primary key that need to be indexed are ALL of the values
> > coming in from another data source, which I do NOT have control over.
It
> > would completely change the meaning and use of the data if I were to try
> to
> > subsitute some arbitary values to replace the blank and missing values
in
> > the externally supplied data. The values in these fields are completely
> > arbitrary text, so there is no value that I could substitute (especially
> in
> > some of the shorter length fields) which might not be actual values from
> the
> > original data source instead.
> >
> > I'd appreciate any comments or suggestions. Even you relational purists
> can
> > feel free to knock me on the head. :-)
> >
> > Thanks,
> > s/KAM
> >
> > P.S. - Since this is SAS, arguments that might apply in other DBMS
> regarding
> > special handling for true NULL values really don't wash here.
|