LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (February 2001, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 20 Feb 2001 08:24:36 -0800
Reply-To:     "Lund, Pete" <Peter.Lund@WSIPP.WA.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Lund, Pete" <Peter.Lund@WSIPP.WA.GOV>
Subject:      Re: Wanted: One pass summary solution
Comments: To: Nigel Tufnel <dousk8@HOTMAIL.COM>
Content-Type: text/plain; charset="iso-8859-1"

Nigel- Here's one possibility using arrays to keep track of the starts and completes. You can adjust the upper and lower bounds of the arrays as needed. The output in this example will contain records from the earliest date to the latest date found in the data. You'll have to make a tweak to the incrementing logic if there are any missing values.

/* Your test data */

data test; input startdt : mmddyy. enddt : mmddyy.; cards; 1/1/01 1/1/01 1/1/01 1/1/01 1/1/01 1/3/01 1/2/01 1/2/01 1/2/01 1/3/01 1/2/01 1/4/01 1/2/01 1/4/01 1/3/01 1/3/01 1/3/01 1/3/01 1/3/01 1/4/01 1/3/01 1/4/01 1/3/01 1/5/01 run;

/* Arbitrary lower and upper bounds */

%let lower = 14500; %let upper = 15500;

data StartsAndEnds(keep=Date NumStarts NumCompletes); set test end=done;

format date mmddyy10.;

retain Smallest 99999 Biggest -99999;

array Starts(&Lower:&Upper) _temporary_; array Completes(&Lower:&Upper) _temporary_;

Starts(StartDt) + 1; Completes(EndDt) + 1;

Smallest = min(Smallest,StartDt,EndDt); Biggest = max(Biggest,StartDt,EndDt);

if done then do i = Smallest to Biggest; NumStarts = max(Starts(i),0); NumCompletes = max(Completes(i),0); Date = i; output; end; run;

Hope this is helpful-

Pete Lund ------------------------------------------- WA State Institute for Public Policy 110 East Fifth Avenue, Suite 214 PO Box 40999 Olympia, WA 98504-0999 (360) 586-9436 - voice (360) 586-2793 - fax (360) 280-4892 - cell peter.lund@wsipp.wa.gov ----------------------------------------------------

-----Original Message----- From: Nigel Tufnel [mailto:dousk8@HOTMAIL.COM] Sent: Tuesday, February 20, 2001 6:04 AM To: SAS-L@LISTSERV.UGA.EDU Subject: Wanted: One pass summary solution

I have a data set that I want to summarize by day. I use two primary columns "startdate" and "completedate". I want the count the number of starts and completes by day. To do this I summarize the data set twice (once by startdate and again by completedate) and merge the two.

This data set has millions of records and a few hundred other variables. Is there a way to get the desired results in one pass?

The data set looks like (among other columns):

startdate completedate 1/1/01 1/1/01 1/1/01 1/1/01 1/1/01 1/3/01 1/2/01 1/2/01 1/2/01 1/3/01 1/2/01 1/4/01 1/2/01 1/4/01 1/3/01 1/3/01 1/3/01 1/3/01 1/3/01 1/4/01 1/3/01 1/4/01 1/3/01 1/5/01

The desired result is:

date starts completes 1/1/01 3 2 1/2/01 4 1 1/3/01 5 4 1/4/01 0 4 1/5/01 0 1

Like I mentioned, the two pass solution is:

proc means data = source nway noprint missing; class startdate; var _whatever_; output out=starts(rename=(_freq_ = starts startdate = date)) sum=; run;

proc means data = source nway noprint missing; class completedate; var _whatever_; output out=completes(rename=(_freq_ = completes completedate = date)) sum=; run;

data summ; merge startdate completedate ; by date; run;

A one pass solution here could save a few hours. Any Ideas?

Thanks, Nigel _________________________________________________________________ Get your FREE download of MSN Explorer at http://explorer.msn.com


Back to: Top of message | Previous page | Main SAS-L page