LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2006, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 6 Jan 2006 13:13:19 -0500
Reply-To:     Mike Rhoads <RHOADSM1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Mike Rhoads <RHOADSM1@WESTAT.COM>
Subject:      Re: Simultaneous append and prune a large data set
Content-Type: text/plain; charset="us-ascii"

Clint,

This may not be particularly practical, but at least theoretically it would be an option to avoid making a copy.

It relies on the capability of updating SAS data sets in place (MODIFY statement etc.). Logically it would involve "pairing up" records to be deleted with records to be added, and using MODIFY to replace each one of the former with one of the latter. If more adds than deletes, put the extra ones at the end. If more deletes than adds, then use REMOVE to get rid of the extra old records.

This could of course lead to slow growth of the file over time as you get more "wasted" space, so it would probably be necessary to do a recopy on some periodic basis to reclaim that space. It would also eliminate any real order to the records, if that is of interest. You could of course maintain an index to get around that issue.

It would also require quite a bit of programming and testing, and I don't know how performance would be. Like I said, it's an idea and not necessarily a recommendation.

Good luck!

Mike Rhoads Westat RhoadsM1@Westat.com

-----Original Message----- From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu] On Behalf Of Rickards, Clinton (GE Consumer Finance) Sent: Friday, January 06, 2006 11:17 AM To: SAS-L (E-mail) Subject: Simultaneous append and prune a large data set

All,

I think I know the answer to this but I'm hoping for someone can give me some alternatives.

I have a job I inherited which maintains a large data set (220+ GB). The goal is to keep a rolling 180 days of data. The job extracts the current day's data into a temp data set and PROC APPENDs that data to the full data set. A data step then rewrites the full data set, using a where clause to prune data more than 180 days old. Here is the essence of the program:

%let keepdate = "06jul2005"d; <<<< recalculated daily... proc append base=lib.full data=work.daily; run;

data lib.full; set lib.full (where=(tran_date < &keepdate)); run;

The problem is that in the data step, SAS is making a second copy of lib.full, consuming all available disk space. We are working with our sys admins to clear space and to allocate more disk space.

My basic understanding has always been that there is no way to avoid making a copy of the full data set, either into WORK or into another library. Our work space averages 270-300GB free, big enough but close given other users on the system. I am considering modifying the program to look like this:

%let keepdate = "06jul2005"d; <<<< recalculated daily... libname otherlib "in our saswork directory"; data otherlib.days179; set lib.full (where=(tran_date < &keepdate)); run;

proc datasets lib=lib; delete full; run;

data lib.full; set otherlib.days179 work.daily ; run;

Any other alternatives or thoughts?

TIA,

Clint


Back to: Top of message | Previous page | Main SAS-L page