LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (March 2007, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 15 Mar 2007 16:03:50 -0400
Reply-To:     Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:      Re: SQL doubt....
Comments: To: toby dunn <tobydunn@hotmail.com>
In-Reply-To:  <BAY123-F25FCFD0CAC1B8A430BF1B5DE720@phx.gbl>
Content-Type: text/plain; charset="us-ascii"

Toby: In essence I am arguing that restructuring of data into a proper relational database makes sense whether done within a single program as an expedient or done once and for all the good reasons. Jack's PROC SUMMARY solution restructures behind the scenes, for example, but why not work with a database that makes sense in the first place. Bad database design has much the same result as mixing chicken noodle, beef barley, and wonton soup in one pot. With a lot of work it may be possible to extract something that looks like one of the three different types of soup, but why go about it that way?

By extension I am also arguing something more radical: that solving an immediate problem merely delays a better response, database redesign, and sets up the programmer for a crisis in the future. A skilled engineer may be able to figure out how to control flows of red, white, and blue paint through pipe so that very few of the colors mix; a better engineer recognizes that the process requires three smaller pipes to work correctly. Forcing a three-dimensional problem into a two-dimensional structure risks distortion and error. I see no need to do that. S

-----Original Message----- From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu] On Behalf Of toby dunn Sent: Thursday, March 15, 2007 3:23 PM To: Sigurd Hermansen; SAS-L@LISTSERV.UGA.EDU Subject: Re: SQL doubt....

Sig ,

Hmmmm disagree ehhh... well lets see if we assum the data is already sorted, then one pass of teh data vs. I think 3 for the SQL solution. The crapy data structure is what makes this a bad choice for a SQL solution. The data step handles these structures I believe a little better. Had the Data Structure been in a normalized I would think I would opt for a SQL solution. It is my belief that the business rules, desired product, and data structure dictate the best tool for the job.

Toby Dunn

To sensible men, every day is a day of reckoning. ~John W. Gardner

The important thing is this: To be able at any moment to sacrifice that which we are for what we could become. ~Charles DuBois

Don't get your knickers in a knot. Nothing is solved and it just makes you walk funny. ~Kathryn Carpenter

From: Sigurd Hermansen <HERMANS1@WESTAT.COM> Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM> To: SAS-L@LISTSERV.UGA.EDU Subject: Re: SQL doubt.... Date: Thu, 15 Mar 2007 15:12:01 -0400

Cherish: I'll have to agree with Jack and Richard (no surprise there) and disagree with Toby (in this case, the voice of the SAS programmer who is trying to help you out of a difficult situation). While I agree that a SAS Data step can work around even more perverse database designs (and you do have a database design even if you don't recognize it!), I've seen too many bad situations evolve when the file from hell becomes the official source of data and everyone has to use Bob's program, the one he left behind when he joined the circus in 1994, to generate the Master Report, and everyone's reports have to match up to the numbers in Bob's report, because, even though no one has checked them lately, management believes those numbers.

Why not first put data in structures that make sense to almost anyone but Bob. Then programs might make sense as well?

My single query SQL solution has the same semantics as Jack's program. Of more interest, whether a single query or multiple queries, any set-logic solution has to have almost has to have the same structure. The structure has to match up to a properly normalized (3NF) database. Note that this query nested three levels down,

(select distinct SSN,Appl_No,Amt_Lent from test) ,

separates out the SSN and Amt_Lent attributes and gets rid of the redundant tuples that cause the Amt_Lent attribute in the original table to add up to a multiple of the true total. From there it becomes a simple matter to sum up values of attributes and join the sums by SSN.

For practicing relational database design at home, a good design would not only decompose the original file into two relations, it would also include a foreign key constraint on SSN in a Payments table. That way an INNER JOIN ON SSN will always find an SSN in the LoanAmts relation to match any SSN on the Payments relation. S ______________________

data test; input SSN Appl_No Income Amt_Lent Payment_Number; cards; 1 100 25 1000 1 1 100 50 1000 2 1 101 75 2000 3 1 102 50 1000 4 1 102 25 1000 5 2 103 65 1500 6 2 103 25 1500 7 2 104 75 2000 8 ; run; /* Single query SQL solution. */ proc sql; create table SQLsolution as select t1.SSN as SSN,totalIncome,totalAmtLent from (select SSN,sum(Income) as totalIncome from test group by SSN ) as t1 inner join (select SSN,sum(Amt_Lent) as totalAmtLent from (select distinct SSN,Appl_No,Amt_Lent from test) group by SSN ) as t2 on t1.SSN=t2.SSN ; quit;

-----Original Message----- From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu] On Behalf Of Cherish Sent: Thursday, March 15, 2007 8:35 AM To: SAS-L@LISTSERV.UGA.EDU Subject: SQL doubt....

I have a dataset which is duplicated by SSN, duplicated by application number, duplicated by loan amounts lent, income obtained from each payment. (Dataset is unique by payment_number). I want to have a final dataset unqiue by SSN, total income paid towards all loans, total amount lent w.r.t all loans.

Example SSN Appl_No Income Amt_Lent Payment_Number 1 100 25 1000 1 1 100 50 1000 2 1 101 75 2000 3 1 102 50 1000 4 1 102 25 1000 5 2 103 65 1500 6 2 103 25 1500 7 2 104 75 2000 8

I want the final dataset to be like this:

SSN Total_Income Amt_Lent 1 225 4000 2 165 3500

I know it can be done in two sql queries. But can we do it in one step?

Currently am doing like this

proc sql; create table test as select SSN, sum(Income) as Total_Income from xxxx group by SSN order by SSN, Payment_Number; quit;

data test1; set test; by SSN; if first.SSN then output; run;

After this am dping another query where I retain records unique by Appl_no, then sum the amount, then make it unique by SSN, then merge back.

Can distinct of application numer be used inside the group statement?

_________________________________________________________________ Play Flexicon: the crossword game that feeds your brain. PLAY now for FREE. http://zone.msn.com/en/flexicon/default.htm?icid=flexicon_hmtagline


Back to: Top of message | Previous page | Main SAS-L page