LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (August 2007, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 24 Aug 2007 13:50:56 -0400
Reply-To:     Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Subject:      Re: SAS data step compute COUNTs and Percentages
Comments: To: Tom White <tw2@mail.com>
In-Reply-To:  <20070822190444.4DA90478106@ws1-5.us4.outblaze.com>
Content-Type: text/plain; charset=ISO-8859-1

Tom,

The Proc Freq could be the best choice to solve your problem. In view of 7 Million records, you may have or have met 'out of memory' problem.

In case of memory limitation, the following data step program may help you. The steps include:

1. Proc format is used to transform selected CODEs into a number(char) by the informat code2i. The label 'N' is used to represent those CODEs you are not interested in. The numbers will be used as subscript of the array, k[], later.

2. Proc format is similarly used to do the opposite by i2code. as this required to display.

3. Four DO-loops are used to process the records, which are commented in the code below.

4. You may turn on SASFILE option if you gain processing time.

5. The output will include nonmissing and non-zero Rows only.

data a; input code $; cards; NULL 59510 J2505 59400 J1745 J3490 27447 A0431 FACIL J9355 J9310 E0784 ; run;

** Transform code to number; data fmt (rename=(code=start)); retain fmtname 'code2i' type 'C' ; set a end = eof; label = put(_N_,best.); output; if eof then do; HLO = 'O'; label = 'N'; output; end; run; proc format cntlin = fmt; quit;

** Transform number to code; data fmt ; retain fmtname 'i2code'; set a end = eof; start = _N_; label = code; output; if eof then do; HLO = 'O'; label = 'MISS'; output; end; run;

proc format cntlin = fmt; quit; data given; input id $ code $; cards; 1 J3490 1 NULL 1 36415 1 88142 1 59510 2 99213 2 80053 2 J2505 3 FACIL 3 J3490 4 99232 5 80053 5 E0784 5 E0784 5 J9310 6 A0431 7 59400 7 59400 8 NULL 8 NULL 8 NULL 8 NULL 9 27447 9 J9355 10 E0784 10 NULL 10 E0784 10 NULL 10 J2505 10 59510 10 NULL 10 NULL 10 E0784 10 59510 ; run;

proc sort data = given; by id; run;

** SASFILE given OPEN; data match(keep=id code sum_count within_pct overall_pct); array id_tot[10] _temporary_; ** do-loop to accumulate the total for each ID; do until(eof); set given end = eof nobs = numobs; idn = input(id, 8.); if prev ne idn then do; prev = idn; id_tot[idn] = 1; end; else id_tot[idn] ++ 1; end; ** do-loop to add total for each CODE within a given ID; do until(eof2); array k[12] _temporary_; do until(last.id); set given end = eof2; by id; if put(code, $code2i.) = 'N' then continue; else do; i = input(put(code,$code2i.),best.); k[i] ++ 1; end; end; ** do-loop to output percents at ID break; do i = 1 to 12; idn = input(id, 8.); ** ignore missing, k[i] = 0 and id_tot[idn] = 0; if k[i] > 0 and id_tot[idn] > 0 then do; code = put(i,i2code.); sum_count = k[i]; within_pct = k[i] / id_tot[idn]; overall_pct = k[i] / numobs; output; end; end; ** do-loop to intialize counters for the next ID; do i = 1 to 12; k[i] = 0; end; end; if eof2 then stop; run; ** SASFILE given CLOSE;

proc print data = match; run;

We move to Proc Freq approach, which takes few lines of code.

The use of informat, code2i. , can be use subset rows when PROC FREQ is used. The output of the table can be sent to a file as:

ods output CrossTabFreqs = wanted; proc freq data = given ; where put(code,$code2i.) ne 'N'; tables id * code/nocol nocum ; run; ods output close;

proc print data = wanted; where _TYPE_ = '11'; run;

I appreciate your feedback on the use of these approaches.

Muthia Kachirayan


Back to: Top of message | Previous page | Main SAS-L page