```
Date:         Tue, 25 Mar 1997 15:45:11 +0000
Reply-To:     John Whittington
Sender:       "SAS(r) Discussion"
From:         John Whittington
Subject:      Re: How to calculate sum
Comments: To: whitloi1@WESTATPO.WESTAT.COM
Comments: cc: Junjia
Content-Type: text/plain; charset="us-ascii"

On Fri, 14 Mar 1997, Ian Whitlock posted a solution (attached below) to
the question by Junjia regarding summation across observations and
determination of whether any two observations accounted for 80% or more
of the total for any variable.

Since SAS DATA steps process data one observation at a time, it is
usually much simpler to undertake arithmetic summary activities across
variables (within observations) than across observations.  In terms of
coding simplicity, it is therefore often better to TRANSPOSE data for
such exercises.  The exercise in question is also aided by the SAS
function ORDINAL(), which returns the kth smallest of a list of values
and so, counting back from the number of values, facilitates extraction
of the "Nth largest" (i.e. 2nd largest in this case), as well as the
maximum, value from a list of variables.

The following is much briefer (hence more rapidly written and debugged)
code to achieve identical results to Ian's code; the PROC TRANSPOSE step
clearly will add to overall execution time, although this is more than
offset by simpler arithmetic/logic statements and determination of the
number of variables *within* the same DATA step, rather than by creation
of a macro variable in a separate DATA step.

%let data = w ;

proc transpose data=&data out=johns ;
run ;

data done (keep = _NAME_) ;
   set johns ;
   array _n(*) _numeric_ ;
   if ordinal(dim(_n), of _numeric_) + ordinal(dim(_n)-1, of _numeric_)
         >= .8 * sum(of _numeric_) ;
run ;

proc print data=done; run ;

Using Ian's test dataset (and with the same proviso about all values
being positive), this produces identical output to his much more lengthy
code - and, in fact, even executes more quickly than Ian's code on my
system:

                     IAN'S                MINE
preliminary step     0.16 secs (DATA)     0.17 secs (TRANSPOSE)
main DATA step       0.55 secs            0.17 secs
PROC PRINT           0.11 secs            0.11 secs
                     ---------------------------------------
TOTAL:               0.82 secs            0.45 secs

As always, this illustrates the diversity of possible approaches to the
same problem when using SAS.

Regards

John

----------Ian Whitlock's previous solution ----------

> Subject:    How to calculate sum
> Summary:    Save the two biggest values and check them out.
> Respondent: Ian Whitlock
>
> Junjia asks:
>
> >I have dataset with 100 variables and 2000 records.  I want to calculate
> >the total of 2000 records for each variable, and like to check if any two
> >of records in 2000 account for 80% of total or not in each variable
> >calculating.
>
> First off I hope all the values are non-negative.  With negative values
> even two numbers close to 0 might account for 80% of the sum (100 -100
> 1 1).  With only non-negative values only the two biggest are the only
> candidates.
>
> It is tempting to sort on each variable and add the top two values but
> that would mean *two-hundred* steps.  With arrays one can do it in one
> step.  Store the two biggest values and sum each variable.  Then at the
> end of file check the condition for each variable and output the names
> of variables meeting the condition.
>
>    /* generate test data */
>    data w ( drop = i j ) ;
>       array y ( * ) a1 - a10 b1 - b10 c1 - c30 ;  /* 50 vars */
>       do j = 1 to dim ( y ) ; y ( j ) = 5 ; end ; output ;
>       do i = 1 to 4 ;          /* a little short of 2000 */
>          do j = 1 to dim ( y ) ;
>             y ( j ) = ranuni (2947561) * 4.2 ;
>          end ;
>          output ;
>       end ;
>    run ;
>
>    %let data = w ;   /* setup problem */
>
>    /* get array size */
>    data _null_ ;
>       if 0 then set &data ;
>       array y (*) _numeric_ ;
>       call symput ( 'n' , left ( put ( dim ( y ) , 4. ) ) ) ;
>       stop ;
>    run ;
>
>    data wanted ( keep = name ) ;
>       length name $ 8 ;
>       set &data end = eof ;
>       array _y (*) _numeric_ ;        /* the values */
>       array _m (%eval(2*(&n))) ;      /* hold top two values */
>       array _s (&n) ;                 /* hold sum of values */
>       retain _m _s ;
>
>       /* save two biggest values for each variable */
>       do i = 1 to dim ( _y ) ;
>          _s ( i ) + _y ( i ) ;
>          if _y ( i ) >= _m ( 2 * i - 1 ) then
>          do ;   /* new maximum */
>             _m ( 2 * i ) = _m ( 2 * i - 1 ) ;
>             _m ( 2 * i - 1 ) = _y ( i ) ;
>          end ;
>          else
>          if _y ( i ) >= _m ( 2 * i ) then   /* new sub-maximum */
>             _m ( 2 * i ) = _y ( i ) ;
>       end ;
>
>       /* report at end of file */
>       if eof then
>       do ;
>          do i = 1 to dim ( _y ) ;
>             if _m ( 2 * i - 1 ) + _m ( 2 * i ) >= .8 * _s ( i ) then
>             do ;
>                call vname ( _y ( i ) , name ) ;
>                output ;
>             end ;
>          end ;
>       end ;
>    run ;
>
>    proc print data = wanted ; run ;
>
> Ian Whitlock
> --------------------------------------------

Regards,

John

-----------------------------------------------------------
Dr John Whittington,       Voice:      +44 1296 730225
Mediscience Services       Fax:        +44 1296 738893
Twyford Manor, Twyford,    E-mail:     johnw@mag-net.co.uk
Buckingham  MK18 4EL, UK   CompuServe: 100517,3677
-----------------------------------------------------------
```
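
The transpose-and-ORDINAL approach in the message can be checked by hand on a very small dataset. The following is only a sketch under stated assumptions: the dataset names `tiny`, `tiny_t` and `tiny_done`, the variables `x`, `y` and `z`, and all the values are made up for illustration and do not appear in the original thread. The logic is John's code, with only the dataset names changed.

```sas
/* Hypothetical data: 3 variables, 4 records, all values non-negative */
data tiny ;
   input x y z ;
   datalines ;
40 10 1
45 10 1
5 10 1
10 10 97
;
run ;

/* One row per original variable; the records become COL1-COL4 */
proc transpose data=tiny out=tiny_t ;
run ;

/* Keep a variable name when its two largest values reach 80% of its total */
data tiny_done (keep = _NAME_) ;
   set tiny_t ;
   array _n(*) _numeric_ ;
   /* ORDINAL(k, ...) is the kth smallest, so k = dim(_n) is the maximum */
   /* and k = dim(_n)-1 is the second largest                            */
   if ordinal(dim(_n), of _numeric_) + ordinal(dim(_n)-1, of _numeric_)
         >= .8 * sum(of _numeric_) ;
run ;

proc print data=tiny_done ; run ;
```

Working through the arithmetic: `x` totals 100 and its two largest values are 45 + 40 = 85, so it qualifies; `y` totals 40 and its two largest give 20, short of 32; `z` totals 100 and its two largest give 97 + 1 = 98, so it qualifies. The listing should therefore contain `x` and `z` only. As in the thread, this relies on all values being non-negative.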
