Date: Sun, 14 May 2000 06:06:10 GMT
Reply-To: "Paul M. Dorfman" <sashole@MEDIAONE.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Paul M. Dorfman" <sashole@MEDIAONE.NET>
Organization: KInPh
Subject: Re: DATA MANIPULATION QUESTION
Lou,
Right. The idea of improving performance is sound, and getting rid of both the
redundant MONTH in the BY statement and the FIRST.dot test serves that purpose
well. Clearing the array between groups is not merely optional; it is
_imperative_. However, first, it makes more sense to re-initialize the array
elements to missing values (no data for this month) rather than to zeroes;
second, the re-initialization can be made even more efficient by taking
advantage of the default DATA step action at the bottom of the step. Mass
re-initialization to missing is much faster than sticking zeroes or missing
values into the buckets explicitly. In SAS terms, all we need is:
data step2;
   array bal(22);
   /* ... */
   do until (last.acctnum);
      set step1;
      by acctnum;
      bal(month) = balance;
      /* ... */
   end;
run;
The explicit DO loop makes both retaining the array elements and explicitly
re-initializing them unnecessary. Within the loop, all array items are
"retained" naturally, because control does not return to the top of the step
until the whole BY group has been read. After the loop, one observation per
account number is output automatically at the bottom of the step, and the array
items are then set to missing values by default before the next group is read.
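A minimal, self-contained sketch of the loop above, run against the sample rows from the original question quoted below. It assumes the input holds only MONTH (1 to 22), ACCTNUM and BALANCE; the DROP= option is an addition so that the last record's MONTH and BALANCE do not ride along into the output, and the other variables elided above are simply left out.

```sas
/* Sample rows taken from the original question */
data step1;
   input month acctnum balance;
   datalines;
1 100 1000
1 200 2000
1 300 3000
2 100 1500
2 200 2500
2 300 3500
;
run;

/* Sorting by ACCTNUM alone is enough: MONTH is used as the array index */
proc sort data=step1;
   by acctnum;
run;

data step2 (drop = month balance);
   array bal(22);                /* creates BAL1-BAL22, one bucket per month */
   do until (last.acctnum);      /* read one whole account per DATA step iteration */
      set step1;
      by acctnum;
      bal(month) = balance;
   end;
   /* implicit OUTPUT here; BAL1-BAL22 are then reset to missing for the next account */
run;
```

With these rows, STEP2 should hold one observation per account, for example ACCTNUM=100 with BAL1=1000, BAL2=1500 and BAL3 through BAL22 missing.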
Kind regards,
====================
Paul M. Dorfman
Jacksonville, FL
====================
Lou Pogoda wrote:
> Normally, I'd say it wouldn't matter much, but we have 10 million records,
> and there's an extra IF test in there. We also don't know if there's a
> balance for every account for every variable for every month. If you don't
> clear out your array variables, you run the risk of RETAINing a balance for
> some month for an account that doesn't have one. Perhaps you know your data
> well enough to *guarantee* that such a situation does not arise either this
> time or at any time in the future (assuming the code will be run more than
> once), but I'd modify the suggested code to use the MONTH variable as the
> array index, clear the variables after writing out the result, and get rid
> of the IF test for first-dot.
>
> proc sort data=indata out=step1;
> by acctnum;
> run;
> data step2 (drop = n);
> /* Set up arrays for each of twenty vars */
> retain;
> array bal{*} bal01-bal22;
> ........
> set step1;
> by acctnum;
> bal{month} = balance;
> if (last.acctnum) then do;
> output;
> do n = 1 to 22;
> bal{n} = 0;
> end;
> end;
> run;
>
> DGrampsas wrote in message <20000513180655.19435.00002568@ng-bg1.aol.com>...
> >proc sort data=indata out=step1;
> > by acctnum month;
> >run;
> >
> >data step2;
> > /* Set up arrays for each of twenty vars */
> > retain;
> > array bal{*} bal01-bal22;
> >
> > set step1;
> > by acctnum month;
> > if (first.acctnum) then do;
> > i = 0;
> > end;
> > i + 1;
> > bal{i} = balance;
> > if (last.acctnum) then do;
> > output;
> > end;
> >run;
> >
> ---original question by Cybie Frontier---
> Hi Folks:
>
> I am trying to rearrange vertically arranged data into a long line of data
> for each record. Here is what I have and where I want to go. I have a lot more
> variables and months than what is presented here.
>
> Present: I have a SAS data set that looks like this.
> ========
>
> month acctnum balance
> 1 100 1000
> 1 200 2000
> 1 300 3000
> 2 100 1500
> 2 200 2500
> 2 300 3500
> etc.
> Future: A new SAS data set that looks like:
>
> acctnum balance1 balance2
> 100 1000 1500
> 200 2000 2500
> 300 3000 3500
> etc.
>
> For each account I have 22 months of data and 20 variables (balance, fees,
> etc.). There are about 10 million records.
>
> I am looking for sample code that would generate the output I am looking
> for.
>
> Thank you very much for your help.
>
> CF
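For comparison, the record-at-a-time structure from Lou's quoted code can also clear the buckets to missing rather than to zero. This is a hypothetical variant, not code posted in this thread: it assumes the same MONTH, ACCTNUM and BALANCE layout as the question's sample, and it uses CALL MISSING, which is available in current SAS releases but may not have existed in the release this thread was written against. Note that it still needs the explicit RETAIN and the explicit reset that the DO UNTIL version avoids.

```sas
/* Sample rows taken from the original question */
data step1;
   input month acctnum balance;
   datalines;
1 100 1000
1 200 2000
1 300 3000
2 100 1500
2 200 2500
2 300 3500
;
run;

proc sort data=step1;
   by acctnum;
run;

data step2 (drop = month balance);
   array bal{*} bal01-bal22;
   retain bal01-bal22;            /* carry the buckets across input records */
   set step1;
   by acctnum;
   bal{month} = balance;
   if last.acctnum then do;
      output;                     /* one observation per account */
      call missing(of bal{*});    /* clear to missing, not zero, before the next account */
   end;
run;
```

For the sample rows this gives the same one-row-per-account layout (with the buckets named BAL01-BAL22); the DO UNTIL form simply lets the DATA step do the retaining and the clearing itself.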