LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (March 2009, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Mon, 2 Mar 2009 12:19:56 -0500
Reply-To:     Gerhard Hellriegel <gerhard.hellriegel@T-ONLINE.DE>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Gerhard Hellriegel <gerhard.hellriegel@T-ONLINE.DE>
Subject:      Re: Difference between PROC SORT NODUPKEY and PROC SQL?

right! Think the following SQL code might do it better:

create table clean as select distinct * from step1 order id ; Gerhard

On Mon, 2 Mar 2009 10:58:25 -0600, Tim Kynerd <tkynerd@ECD.ORG> wrote:

>Hey Sally, > >I may be wrong, but it looks to me as if your PROC SQL is getting rid of *all* rows where the key occurs multiple times. PROC SORT NODUPKEY will keep one observation for every value of the key, and only discard subsequent duplicate values for the same key. It looks like you have 197 key values with one row each, 1 value with two rows and 1 value with three rows. > >If you want to verify what is happening, use the DUPOUT= option to PROC SORT to outpu the duplicate observations to a data set that you can then inspect. > >Best, >Tim > > >Tim Kynerd >Computer Programmer/Analyst >ECD/HOPE >4 Old River Place, Suite A >Jackson, MS 39202 >P: (601) 944-9308 >F: (601) 944-0808 >tkynerd@ecd.org > >-----Original Message----- >From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Sally Muller >Sent: Monday, March 02, 2009 10:32 AM >To: SAS-L@LISTSERV.UGA.EDU >Subject: Difference between PROC SORT NODUPKEY and PROC SQL? > >Greetings! As they say, rumors of my demise have been greatly exaggerated. >In fact if all goes according to plan will be at SAS Users Forum, so look >forward to seeing everyone there! > >Down to business. I have been madly reading Cody's "Data Cleaning >Techniques Using SAS" (wonderful book) as I am assisting teaching a class in >data management. Cody points out, apparently something Mike Zdeb >discovered, that when trying to get rid of duplicate ids or records there is >a problem because the observations have to be sequential or SAS will just >rid you of the ones that are sequential and other dups will remain. > >So for a homework we asked them to get rid of duplicate ids. Problem is >that SQL and PROC SORT don't produce the same results and I would like to >tweak one or the other so they do produce same results. Otherwise will be >quite difficult to explain to class why they get different results if they >ran both SQL and PROC SORT, in which case I need to invite a guest lecturer. > > >/* this code gets rid of 3 records, obs go from 202 to 199 */ > >proc sort DATA=STEP1 OUT=STEP2 nodupkeys; > by id; >run; > >/* this code (table clean) gets rid of 5 obs from 202 to 197 */ > >proc sql; > create table messy as > select * > from step1 > group by id > having count(*)>1; > > create table clean as > select * > from step1 > group by id > having count(*)=1; >quit; > >Thanks for any suggestions.... > > >Sally Muller > >This transmission is intended only for the use of the addressee and may contain information that is privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, or the employee or agent responsible for delivering the message to the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately via e-mail at support@ecd.org.


Back to: Top of message | Previous page | Main SAS-L page