LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2003, week 5)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Wed, 29 Jan 2003 11:36:56 -0500
Reply-To:   "Goldman, Brad (AT-Atlanta)" <Brad.Goldman@AUTOTRADER.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "Goldman, Brad (AT-Atlanta)" <Brad.Goldman@AUTOTRADER.COM>
Subject:   Re: Equivalent of NODUPKEY in PROQ SQL
Content-Type:   text/plain; charset="iso-8859-1"

> I have a question about SQL here. I was recently trying to > determine the > best way to eliminate duplicate records and decided to use PROC SORT > NODUPKEY. There were three key variables I wanted to check, > but I also > wanted to keep the remaining variables. In the SQL version I > was playing > with, it worked well (select distinct(var1, var2, var3)...) > but I couldn't > figure out how to get the other variables included in the > resulting dataset. > Could someone post such an example?

John, I can't quite get which way didn't work. Here are two general forms (syntax not guaranteed):

proc sort data=in out=out nodupkey; by key1 key2 key3; run; <should keep all variables>

proc sql; create table out as select distinct(var1,var2,var3),<othervarinfo> from table in group by var1 var2 var3; quit;

The kicker is the othervarinfo. Unless you somehow tell SQL which row to pick, what can it do?

Example Table In: State Zipcode GA 30329 GA 30341

Now I ask SQL to give me distinct(state), zipcode. Which row should it give back? Who knows. It won't. That is why Peter's example used max(zipcode), there must be some way to determine which of the rows to return. Note that this is not a problem if you just select distinct(state), it is only when you combine this with asking for the nonkey field that is non-unique across that key.

Basically, it is a logical dilemna. SAS solves this by making it's own assumptions, which are documented in proc sort. SQL does not make these assumptions, so the user must supply his/her own assumptions.

Brad


Back to: Top of message | Previous page | Main SAS-L page