LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (May 2000, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Mon, 15 May 2000 00:52:14 GMT
Reply-To:     Lou Pogoda <lpogoda@HOME.NOSPAM.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Lou Pogoda <lpogoda@HOME.NOSPAM.COM>
Organization: @Home Network
Subject:      Re: DATA MANIPULATION QUESTION

Well, my background is accounting/financial. There, somewhere along the way these numbers are going to be used in arithmetic operations. Missing values can play hob with this, and early on I adopted the practice of initializing values at zero rather than missing. Which not coincidentally was the practice of the general ledger prackage used by the office I was in when I started learning SAS way too many years ago.

Anyway, in an accounting setting, for questions like "what was the activity in account ABC for the month" the answer zero sounds more sensible than "I don't know". Operationally, the two are pretty much the same thing, since when you start doing arithmetic on balances or activity, if someone hasn't initialized a particular variable at zero you find yourself doing things like IF ACTIVITY = . THEN ACTIVITY = 0, or needing to test for both zero and missing before using a variable as a divisor or when looking for receivables accounts with a credit balance. That gets to be a pain real fast.

The situation gets more complicated when (not that there's any indication in this case) SAS is interacting with an RDBMS such as ORACLE. In some respects, missing values in SAS can be likened to null values in ORACLE, but the two systems handle them quite differently. In particular, missing values can be compared in SAS while null values cannot be compared in ORACLE.

Whether missing rather than zero values is acceptable will depend on the uses to which the file is to be put. The original post did not address the question either way, so I all but unconsciously fell back on what I've encountered over the last 20 years or so as pretty much standard practice.

All of which is beside your point, of course. While I'd quibble with your "it makes more sense to re-initialize ... to missing values" (in fact, I just did) given that missing values are acceptable, you are quite right.

Paul M. Dorfman wrote in message <391E45D5.7D7C7308@mediaone.net>... >Lou, > >Right. The idea of improving performance is sound, and both getting rid of the >redundant MONTH in BY and testing FIRST.dot serve the purpose well. Clearing the >array in between the groups is even not optional, it is _imperative_. However, >first, it makes more sense to re-initialize the array elements to missing values >(no data for this month); second, the efficiency of re-initialization can be >further improved by taking advantage of the default DATA step action at the >bottom of the loop. Mass re-initialize to missing is much faster than sticking >zeroes or missing into the buckets explicitly. In SAS words, all we need is: > >data step2; > array bal(22); > ............. > do until (last.acctnum); > set step1; > by acctnum; > bal(month) = balance; > .................... > end; >run; > >The explicit DO loop makes retaining all array elements, as well as their >explicit re-initializing, unnecessary. Within the loop, all array items are >"retained" naturally; after the loop, they are set to missing values by default >at the bottom of the step, and one observation per account number is output >automatically. > >Kind regards, >==================== >Paul M. Dorfman >Jacksonville, FL >==================== > >Lou Pogoda wrote: > >> Normally, I'd say it wouldn't matter much, but we have 10 million records, >> and there's an extra IF test in there. We also don't know if there's a >> balance for every account for every variable for every month. If you don't >> clear out your array variables, you run the risk of RETAINing a balance for >> some month for an account that doesn't have one. Perhaps you know your data >> well enough to *guarantee* that such a situation does not arise either this >> time or at any time in the future (assuming the code will be run more than >> once), but I'd modify the suggested code to use the MONTH variable as the >> array index, clear the variables after writing out the result, and get rid >> of the IF test for first-dot. >> >> proc sort data=indata out=step1; >> by acctnum; >> run; >> data step2 (drop = n); >> /* Set up arrays for each of twenty vars */ >> retain; >> array bal{*} bal01-bal22; >> ........ >> set step1; >> by acctnum; >> bal{month} = balance; >> if (last.acctnum) then do; >> output; >> do n = 1 to 22; >> bal{n} = 0; >> end; >> end; >> run; >> >> DGrampsas wrote in message <20000513180655.19435.00002568@ng-bg1.aol.com>... >> >proc sort data=indata out=step1; >> > by acctnum month; >> >run; >> > >> >data step2; >> > /* Set up arrays for each of twenty vars */ >> > retain; >> > array bal{*} bal01-bal22; >> > >> > set step1; >> > by acctnum month; >> > if (first.acctnum) then do; >> > i = 0; >> > end; >> > i + 1; >> > bal{i} = balance; >> > if (last.acctnum) then do; >> > output; >> > end; >> >run; >> > >> > >> > >> > >> ---original question by Cybie Frontier--- >> Hi Folks: >> >> I am trying to rearrange vertically arranged data into a long line of data >> for each record. Here is what I have and where I want to go. I have lot more >> variables and months than what is presented here. >> >> Present: I have a SAS data set that looks like this. >> ======== >> >> month acctnum balance >> 1 100 1000 >> 1 200 2000 >> 1 300 3000 >> 2 100 1500 >> 2 200 2500 >> 2 300 3500 >> etc. >> Future: A new SAS data set that looks like: >> >> acctnum balance1 balance2 >> 100 1000 1500 >> 200 2000 2500 >> 300 3000 3500 >> etc. >> >> For each account I have 22 months of data and 20 variables(balance, fees >> etc.)There are about 10 million records. >> >> I am looking for a sample code that would generate the output I am looking >> for. >> >> Thank you very much for your help. >> >> CF >


Back to: Top of message | Previous page | Main SAS-L page