Date: Thu, 15 Mar 2007 16:03:50 -0400
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: SQL doubt....
In-Reply-To: <BAY123-F25FCFD0CAC1B8A430BF1B5DE720@phx.gbl>
Content-Type: text/plain; charset="us-ascii"
Toby:
In essence I am arguing that restructuring of data into a proper
relational database makes sense whether done within a single program as
an expedient or done once and for all the good reasons. Jack's PROC
SUMMARY solution restructures behind the scenes, for example, but why
not work with a database that makes sense in the first place. Bad
database design has much the same result as mixing chicken noodle, beef
barley, and wonton soup in one pot. With a lot of work it may be
possible to extract something that looks like one of the three different
types of soup, but why go about it that way?
By extension I am also arguing something more radical: that solving an
immediate problem merely delays a better response, database redesign,
and sets up the programmer for a crisis in the future. A skilled
engineer may be able to figure out how to control flows of red, white,
and blue paint through pipe so that very few of the colors mix; a better
engineer recognizes that the process requires three smaller pipes to
work correctly. Forcing a three-dimensional problem into a
two-dimensional structure risks distortion and error. I see no need to
do that.
S
-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
On Behalf Of toby dunn
Sent: Thursday, March 15, 2007 3:23 PM
To: Sigurd Hermansen; SAS-L@LISTSERV.UGA.EDU
Subject: Re: SQL doubt....
Sig ,
Hmmmm disagree ehhh... well lets see if we assum the data is already
sorted, then one pass of teh data vs. I think 3 for the SQL solution.
The crapy data structure is what makes this a bad choice for a SQL
solution. The data
step handles these structures I believe a little better. Had the Data
Structure been in a normalized I would think I would opt for a SQL
solution.
It is my belief that the business rules, desired product, and data
structure dictate the best tool for the job.
Toby Dunn
To sensible men, every day is a day of reckoning. ~John W. Gardner
The important thing is this: To be able at any moment to sacrifice that
which we are for what we could become. ~Charles DuBois
Don't get your knickers in a knot. Nothing is solved and it just makes
you walk funny. ~Kathryn Carpenter
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: SQL doubt....
Date: Thu, 15 Mar 2007 15:12:01 -0400
Cherish:
I'll have to agree with Jack and Richard (no surprise there) and
disagree with Toby (in this case, the voice of the SAS programmer who is
trying to help you out of a difficult situation). While I agree that a
SAS Data step can work around even more perverse database designs (and
you do have a database design even if you don't recognize it!), I've
seen too many bad situations evolve when the file from hell becomes the
official source of data and everyone has to use Bob's program, the one
he left behind when he joined the circus in 1994, to generate the Master
Report, and everyone's reports have to match up to the numbers in Bob's
report, because, even though no one has checked them lately, management
believes those numbers.
Why not first put data in structures that make sense to almost anyone
but Bob. Then programs might make sense as well?
My single query SQL solution has the same semantics as Jack's program.
Of more interest, whether a single query or multiple queries, any
set-logic solution has to have almost has to have the same structure.
The structure has to match up to a properly normalized (3NF) database.
Note that this query nested three levels down,
(select distinct SSN,Appl_No,Amt_Lent from test) ,
separates out the SSN and Amt_Lent attributes and gets rid of the
redundant tuples that cause the Amt_Lent attribute in the original table
to add up to a multiple of the true total. From there it becomes a
simple matter to sum up values of attributes and join the sums by SSN.
For practicing relational database design at home, a good design would
not only decompose the original file into two relations, it would also
include a foreign key constraint on SSN in a Payments table. That way an
INNER JOIN ON SSN will always find an SSN in the LoanAmts relation to
match any SSN on the Payments relation. S ______________________
data test;
input SSN Appl_No Income Amt_Lent Payment_Number;
cards;
1 100 25 1000 1
1 100 50 1000 2
1 101 75 2000 3
1 102 50 1000 4
1 102 25 1000 5
2 103 65 1500 6
2 103 25 1500 7
2 104 75 2000 8
;
run;
/* Single query SQL solution. */
proc sql;
create table SQLsolution as
select t1.SSN as SSN,totalIncome,totalAmtLent from
(select SSN,sum(Income) as totalIncome
from test group by SSN
) as t1
inner join
(select SSN,sum(Amt_Lent) as totalAmtLent
from (select distinct SSN,Appl_No,Amt_Lent from test)
group by SSN
) as t2
on t1.SSN=t2.SSN
;
quit;
-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
On Behalf Of Cherish
Sent: Thursday, March 15, 2007 8:35 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: SQL doubt....
I have a dataset which is duplicated by SSN, duplicated by application
number, duplicated by loan amounts lent, income obtained from each
payment. (Dataset is unique by payment_number). I want to have a final
dataset unqiue by SSN, total income paid towards all loans, total amount
lent w.r.t all loans.
Example
SSN Appl_No Income Amt_Lent Payment_Number
1 100 25 1000 1
1 100 50 1000 2
1 101 75 2000 3
1 102 50 1000 4
1 102 25 1000 5
2 103 65 1500 6
2 103 25 1500 7
2 104 75 2000 8
I want the final dataset to be like this:
SSN Total_Income Amt_Lent
1 225 4000
2 165 3500
I know it can be done in two sql queries. But can we do it in one step?
Currently am doing like this
proc sql;
create table test as
select SSN, sum(Income) as Total_Income from xxxx
group by SSN order by SSN, Payment_Number;
quit;
data test1;
set test;
by SSN;
if first.SSN then output;
run;
After this am dping another query where I retain records unique by
Appl_no, then sum the amount, then make it unique by SSN, then merge
back.
Can distinct of application numer be used inside the group statement?
_________________________________________________________________
Play Flexicon: the crossword game that feeds your brain. PLAY now for
FREE.
http://zone.msn.com/en/flexicon/default.htm?icid=flexicon_hmtagline