LISTSERV at the University of Georgia
Date:         Tue, 25 Mar 1997 15:45:11 +0000
Reply-To:     John Whittington <johnw@MAG-NET.CO.UK>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         John Whittington <johnw@MAG-NET.CO.UK>
Subject:      Re: How to calculate sum
Comments: To: whitloi1@WESTATPO.WESTAT.COM
Comments: cc: Junjia <JUNJIA@MORST.GOVT.NZ>
Content-Type: text/plain; charset="us-ascii"

On Fri, 14 Mar 1997, Ian Whitlock <whitloi1@WESTATPO.WESTAT.COM> posted a solution (attached below) to the question by Junjia <JUNJIA@MORST.GOVT.NZ> regarding summation across observations and determination of whether any two observations accounted for 80% or more of the total for any variable.

Since SAS DATA steps process data one observation at a time, it is usually much simpler to undertake arithmetic summary activities across variables (within observations) than across observations. In terms of coding simplicity, it is therefore often better to TRANSPOSE data for such exercises. The exercise in question is also aided by the SAS function ORDINAL(), which returns the kth smallest value from a list of variables; with N values in the list, k=N gives the maximum and k=N-1 the second largest, which are the two values needed here.
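
A minimal sketch of how ORDINAL() picks out the two largest values, assuming five made-up values held in hypothetical variables v1-v5 (neither the names nor the numbers come from the original question):

```sas
/* Illustrative values only: v1-v5 are hypothetical, not from Junjia's data */
data _null_ ;
   array v (5) v1-v5 (3 9 1 7 5) ;
   largest = ordinal(dim(v), of v1-v5) ;       /* k = N   : the maximum        */
   second  = ordinal(dim(v) - 1, of v1-v5) ;   /* k = N-1 : the second largest */
   total   = sum(of v1-v5) ;
   put largest= second= total= ;   /* log shows: largest=9 second=7 total=25 */
run ;
```

Here 9 + 7 = 16 is below .8 * 25 = 20, so a variable with these values would not be flagged by the 80% test.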

The following is much briefer (hence more rapidly written and debugged) code to achieve identical results to Ian's code; the PROC TRANSPOSE step clearly will add to overall execution time, although this is more than offset by simpler arithmetic/logic statements and by determining the number of variables with DIM() *within* the same DATA step, rather than by creating a macro variable in a separate DATA step.

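%let data = w ;

/* one observation per original variable; values go to COL1-COLn */
proc transpose data=&data out=johns ;
run ;

data done (keep = _NAME_) ;
   set johns ;
   array _n(*) _numeric_ ;
   /* keep variables whose two largest values reach 80% of the total */
   if ordinal(dim(_n), of _numeric_) + ordinal(dim(_n)-1, of _numeric_)
         >= .8 * sum(of _numeric_) ;
run ;

proc print data=done ;
run ;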

Using Ian's test dataset (and with the same proviso about all values being non-negative), this produces identical output to his much more lengthy code - and, in fact, even executes more quickly than Ian's code on my system:

                      IAN'S                 MINE
preliminary step      0.16 secs (DATA)      0.17 secs (TRANSPOSE)
main DATA step        0.55 secs             0.17 secs
PROC PRINT            0.11 secs             0.11 secs
-----------------------------------------------------------------
TOTAL:                0.82 secs             0.45 secs
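
To see the 80% test at work on something small enough to check by hand, here is a sketch of the same transpose-then-ORDINAL steps on a made-up four-record dataset. The dataset names toy, toy_t and flagged, the variables x and y, and all the values are assumptions for illustration only.

```sas
/* Hypothetical toy data: x is dominated by two records, y is not */
data toy ;
   input x y ;
   cards ;
50 10
45 10
3 10
2 10
;
run ;

/* With no VAR statement, all numeric variables are transposed:   */
/* one observation per original variable, with _NAME_, COL1-COL4  */
proc transpose data=toy out=toy_t ;
run ;

data flagged (keep = _NAME_) ;
   set toy_t ;
   array _n (*) _numeric_ ;
   /* x: 50 + 45 = 95 >= .8 * 100, kept ; y: 10 + 10 = 20 < .8 * 40, dropped */
   if ordinal(dim(_n), of _numeric_) + ordinal(dim(_n) - 1, of _numeric_)
         >= .8 * sum(of _numeric_) ;
run ;

proc print data=flagged ;
run ;
```

Only x should be listed, since its two largest records account for 95% of its total.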

As always, this illustrates the diversity of possible approaches to the same problem when using SAS.

Regards

John

----------Ian Whitlock's previous solution ----------
> Subject: How to calculate sum
> Summary: Save the two biggest values and check them out.
> Respondent: Ian Whitlock <whitloi1@westat.com>
>
> Junjia <JUNJIA@MORST.GOVT.NZ> asks:
>
> >I have dataset with 100 variables and 2000 records. I want to calculate
> >the total of 2000 records for each variable, and like to check if any two
> >of records in 2000 account for 80% of total or not in each variable
> >calculating.
>
> First off I hope all the values are non-negative. With negative values
> even two numbers close to 0 might account for 80% of the sum (100 -100
> 1 1). With only non-negative values only the two biggest are the only
> candidates.
>
> It is tempting to sort on each variable and add the top two values but
> that would mean *two-hundred* steps. With arrays one can do it in one
> step. Store the two biggest values and sum each variable. Then at the
> end of file check the condition for each variable and output the names
> of variables meeting the condition.
>
>    /* generate test data */
>    data w ( drop = i j ) ;
>       array y ( * ) a1 - a10 b1 - b10 c1 - c30 ;   /* 50 vars */
>       do j = 1 to dim ( y ) ; y ( j ) = 5 ; end ; output ;
>       do i = 1 to 4 ;   /* a little short of 2000 */
>          do j = 1 to dim ( y ) ;
>             y ( j ) = ranuni (2947561) * 4.2 ;
>          end ;
>          output ;
>       end ;
>    run ;
>
>    %let data = w ;   /* setup problem */
>    /* get array size */
>    data _null_ ;
>       if 0 then set &data ;
>       array y (*) _numeric_ ;
>       call symput ( 'n' , left ( put ( dim ( y ) , 4. ) ) ) ;
>       stop ;
>    run ;
>
>    data wanted ( keep = name ) ;
>       length name $ 8 ;
>       set &data end = eof ;
>       array _y (*) _numeric_ ;      /* the values */
>       array _m (%eval(2*(&n))) ;    /* hold top two values */
>       array _s (&n) ;               /* hold sum of values */
>       retain _m _s ;
>
>       /* save two biggest values for each variable */
>       do i = 1 to dim ( _y ) ;
>          _s ( i ) + _y ( i ) ;
>          if _y ( i ) >= _m ( 2 * i - 1 ) then
>          do ;   /* new maximum */
>             _m ( 2 * i ) = _m ( 2 * i - 1 ) ;
>             _m ( 2 * i - 1 ) = _y ( i ) ;
>          end ;
>          else
>          if _y ( i ) >= _m ( 2 * i ) then   /* new sub-maximum */
>             _m ( 2 * i ) = _y ( i ) ;
>       end ;
>
>       /* report at end of file */
>       if eof then
>       do ;
>          do i = 1 to dim ( _y ) ;
>             if _m ( 2 * i - 1 ) + _m ( 2 * i ) >= .8 * _s ( i ) then
>             do ;
>                call vname ( _y ( i ) , name ) ;
>                output ;
>             end ;
>          end ;
>       end ;
>    run ;
>
>    proc print data = wanted ; run ;
>
> Ian Whitlock
> --------------------------------------------

Regards,

John

-----------------------------------------------------------
Dr John Whittington,        Voice:      +44 1296 730225
Mediscience Services        Fax:        +44 1296 738893
Twyford Manor, Twyford,     E-mail:     johnw@mag-net.co.uk
Buckingham  MK18 4EL, UK    CompuServe: 100517,3677
-----------------------------------------------------------

