LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (June 2005, week 5)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 30 Jun 2005 14:59:22 -0700
Reply-To:     "Pardee, Roy" <pardee.r@GHC.ORG>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Pardee, Roy" <pardee.r@GHC.ORG>
Subject:      Re: Deleting an enumerated list of records from a huge table--how
              to do it efficiently?
Comments: To: "Dorfman, Paul" <paul.dorfman@FCSO.COM>
Content-Type: text/plain; charset="us-ascii"

Nice! Thanks very much. I need to stop being such a prisoner of sql...

Thanks!

-Roy

-----Original Message----- From: Dorfman, Paul [mailto:paul.dorfman@FCSO.COM] Sent: Thursday, June 30, 2005 2:44 PM To: SAS-L@LISTSERV.UGA.EDU; Pardee, Roy Subject: Re: Deleting an enumerated list of records from a huge table--how to do it efficiently?

Roy,

I do not have a 3G file, but here is an experiment on a smaller scale:

62 data huge (index = (pk = (vistnum proc_code))) ; 63 do vistnum = 0 to 1e7 - 1 ; 64 proc_code = put (vistnum, z8.) ; 65 output ; 66 end ; 67 run ;

NOTE: The data set USER.HUGE has 10000000 observations and 2 variables. INFO: Multiple concurrent threads will be used to create the index. NOTE: Composite index pk has been defined. NOTE: DATA statement used (Total process time): real time 29.87 seconds cpu time 40.79 seconds

68 69 data del ; 70 do vistnum = 0 to 1e7 - 1 by 1e3 ; 71 proc_code = put (vistnum, z8.) ; 72 output ; 73 end ; 74 run ;

NOTE: The data set USER.DEL has 10000 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.02 seconds cpu time 0.02 seconds

75 76 data huge ; 77 set del ; 78 modify huge key = pk ; 79 remove ; 80 run ;

NOTE: There were 10000 observations read from the data set USER.DEL. NOTE: The data set USER.HUGE has been updated. There were 0 observations rewritten, 0 observations added and 10000 observations deleted. NOTE: DATA statement used (Total process time): real time 1.79 seconds cpu time 1.55 seconds

I bet it should not take much longer than that to do the same with HUGE 20 times larger...

Kind regards, ----------------- Paul M. Dorfman Jacksonville, FL -----------------

On Thu, 30 Jun 2005 14:04:07 -0700, Pardee, Roy <pardee.r@GHC.ORG> wrote:

>I've got a 3.5GB dataset of procedure codes with a primary key >constraint on the fields (VistNum, ProcCode). VistNum is numeric, >ProcCode is a char(8). > >I've got a way smaller dset of (VistNum, ProcCode) values that I want >to delete from my huge dataset. > >The one thing that comes to mind is: > > delete from huge > where put(VistNum, best.) || ProcCode in (select put(VistNum, best.) >|| ProcCode from tiny) ; > >But that will take forever b/c sas will have to do a full table scan. > >Now if my data were in say, mssql, I could say: > > delete huge > from huge INNER JOIN tiny > on huge.VistNum = tiny.VistNum AND > huge.ProcCode = tiny.ProcCode ; > >And it would be fast b/c the join could make use of the pk index. But >that's nonstandard sql & it appears that sas does not support it. > >Is there a more graceful & efficient way? > >Thanks! > >-Roy > >Roy Pardee >Research Analyst/Programmer >Center For Health Studies (Cancer Research Network) >Group Health Cooperative >(206) 287-2078 > >This message and any attached files might contain confidential >information protected by federal and state law. The information is >intended only for the use of the individual(s) or entities originally >named as addressees. The improper disclosure of such information may be

>subject to civil or criminal penalties. If this message reached you in >error, please contact the sender and destroy this message. Disclosing, >copying, forwarding, or distributing the information by unauthorized >individuals or entities is strictly prohibited by law.


Back to: Top of message | Previous page | Main SAS-L page