| Date: | Wed, 21 May 2008 18:35:26 -0400 |
| Reply-To: | Sigurd Hermansen <HERMANS1@WESTAT.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Sigurd Hermansen <HERMANS1@WESTAT.COM> |
| Subject: | Re: Nested Data Accesses? |
| In-Reply-To: | <6c9d07d8-9bb0-42ba-a2b3-4def9d0342c3@w7g2000hsa.googlegroups.com> |
| Content-Type: | text/plain; charset="us-ascii" |
Let me say up front that I would not agree to perpetuate such a
contrived legacy system for creating database keys. While related
programming may continue indefinitely, projects such as these too often
lead to deadend careers. While I understand that many database
programmers have to learn how to convert legacy systems to viable
databases, it really doesn't make sense to prop up an inferior database
scheme with tricky programs.
In general I see far too many questions on SAS-L on how to use SAS to
"abnormalize" databases. I see bad database design as a practice as bad
as, if not worse, than what Art Carpenter satirizes in his lectures as
programming with job security in mind. In one common form of bad
database design, a programmer asks how to create and manage a number of
data artifacts, such as the key values that we see in the Nested Data
Accesses problem. Other variants include hierarchic data in sequential
rows of a dataset, variable values that point to other variables in the
same row of a dataset, and different blocks of rows in one dataset.
Often a database programmer claims that these data artifacts improve
system performance, or have to be that way for various local or
regulatory reasons, or make programming easier and more intuitive. In
fact, none of these claims has merit. SAS and other database programming
environments provide way fast access to related datasets and make
programming simpler and easier to extend and maintain. The same
programmers that claim to be working in special situations requiring
data artifacts more likely or not will learn the advantages of good
database designs and drop the artifacts one by one.
I spent some time last evening staring at this problem. I was trying to
figure out why anyone would come up with this scheme in the first place,
and why isn't it obvious to everyone that this analog of division in
Roman numerals would be more trouble than it's worth. Perhaps it appears
to have an "autokey generator" function and many database programmers
believe (falsely) that database systems should create unique primary
keys whenever alternative (candidate) keys contain duplicated values in
a dataset.
So what's so wrong here? First, the key values combine an odd mixture
of sequential numbers (dates?), what appears to be a day value, and a
constant with a sequential number appended to it. We have no idea what
the elements of the keys actually represent, making it more difficult to
understand their role in a database, and we can only guess how they
relate to attributes of relations in a database. The first SQL query in
a basic program (below) shows how one might increment a character string
representing a sequential number in a key value so that the sequential
number exceeds by one the maximum value in a dataset (OLDIDS). The
second query concatenates the string into the middle of a string that
would replace a NULL string in a key value. All of this would work
acceptably well except when operating on more than one tuple (row) with
NULL strings in key values. The yield of the query has more than one
tuple (row). Fortunately a SQL UPDATE statement requires a yield of only
one value per attribute to be updated; that is, the query must yield the
correct "relvar type". Testing the UPDATE with sample data shows it not
to be the case in general or when operating on this sample of data.
Data OLDIDS;
input
id $ 1-6 dat $ 7-10 matchfield $ 11-15
;
cards;
070100da1tmatc1
070101da2tmatc1
070102da3tmatc1
070200da4tmatc2
070201da5tmatc2
070202da6tmatc3
070203da7tmatc3
070300da8tmatc4
;
run;
/*
NEWDATA (underscore indicates no value)
cards;
______da2tmatc1
______da4tmatc2
______da9tmatc3
______da0tmatc3
*/
/*
Initial Match (done already):
*/
Data unmatched;
input
id $ 1-6 dat $ 7-10 matchfield $ 11-15
;
cards;
070101da2tmatc1
070200da4tmatc2
da9tmatc3
da0tmatc3
;
run;
proc sql undo_policy=NONE;
select distinct put(input(
(select distinct
max(substr(id,3,4))
from OLDIDS
)
,2.)+1
,2.)
from unmatched as R1 inner join OLDIDS as R2
on R1.matchfield=R2.matchfield
where R1.id IS NULL
;
quit;
proc sql undo_policy=NONE;
select distinct
max(substr(R2.id,1,2))
||put(input(
(select distinct max(substr(id,3,4))
from OLDIDS
)
,2.)+1
,2.)
||max(substr(R2.id,5,6)) as id,
R1.dat as dat,R1.matchfield as matchfield
from unmatched as R1 inner join OLDIDS as R2
on R1.matchfield=R2.matchfield
where R1.id IS NULL
group by substr(R2.id,1,2),substr(R2.id,5,6)
;
quit;
proc sql undo_policy=NONE;
insert into OLDIDS
select id,dat,matchfield
from (select distinct
trim(max(substr(R2.id,1,2)))
||put(input(
(select distinct max(substr(id,3,4))
from OLDIDS group by substr(id,3,4)
)
,2.)+1
,2.)
||max(substr(R2.id,5,6)) as id,
R1.dat as dat,R1.matchfield as matchfield
from unmatched as R1 inner join OLDIDS as R2
on R1.matchfield=R2.matchfield
where R1.id IS NULL
group by substr(R2.id,1,2),substr(R2.id,5,6)
)
;
quit;
Yes, looping through the UPDATES one key value at a time may solve the
problem. Perhaps that solution makes sense as a stopgap measure. I'd
recommend better database design as a better solution and one that may
save time and effort beginning now.
S
-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
On Behalf Of Laptop765@gmail.com
Sent: Wednesday, May 21, 2008 12:23 PM
To: sas-l@uga.edu
Subject: Re: Nested Data Accesses?
Thanks everyone for your help... I finally figured out a way to do it.
I wrote a macro with a for loop. In each iteration, I saved the
matchfield value and some unique identifiers so I could close the
dataset. I ran this through a modified version of the SQL to get both
values. I then used the unique identifiers to add the two values to the
field.
I appreciate all of the information everyone has given me, I learned a
great deal through all of this.
|