Date:         Fri, 21 May 2004 11:27:20 -0700
Reply-To:     cassell.david@EPAMAIL.EPA.GOV
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject:      Re: Calculating equal proportions
Content-type: text/plain; charset=US-ASCII

Howard Schreier <Howard_Schreier@ITA.DOC.GOV> sagely replied:

> I don't see the need for the PROC RANK results.

One doesn't need it. I was just showing that one can use Paul's suggestion to get the breakout for the base set by using the proc. I guess I didn't make that clear enough.

> It's the output from UNIVARIATE which provides the cutpoints.

Right.

> Then it seems to me that the tricky part may be "current" scores which
> exactly equal the boundary values.

The boundary cases have to be decided by the end-user. That is, the user has to decide where to place a value of, say, 456.7 if that happens to be the cutpoint. Usually you specify (ahead of time, in detail) that values hitting the cutpoint go into the lower bin. Or the upper bin. That is, you write something like:

   if X <= p_10 then bin = 1;
   else if X <= p_20 then bin = 2;
   . . .

I said "something like" in my paragraph above, because the code here quickly degenerates into the dreaded 'wallpaper code' which causes us to suffer THE WRATH OF IAN. :-)
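One way to dodge the wallpaper is an array and a DO loop. This is only a minimal sketch under assumptions that are not in the thread: CUTS is a hypothetical one-row data set holding p_10 through p_90 (what PROC UNIVARIATE would write with pctlpre=p_ and pctlpts=10 to 90 by 10), CURRENT is a hypothetical data set holding the score X, and ties go into the lower bin.

```sas
/* Sketch only: the data set names CUTS, CURRENT and BINNED are   */
/* hypothetical. CUTS is assumed to have one row with p_10..p_90. */
data binned;
   if _n_ = 1 then set cuts;      /* cutpoints are retained on every row */
   set current;
   array cut{9} p_10 p_20 p_30 p_40 p_50 p_60 p_70 p_80 p_90;
   if missing(x) then bin = .;    /* otherwise a missing X would fall into bin 1 */
   else do;
      bin = 10;                   /* default: above the last cutpoint */
      do i = 1 to 9;
         if x <= cut{i} then do;  /* <= sends ties to the lower bin */
            bin = i;
            leave;
         end;
      end;
   end;
   drop i p_: ;
run;
```

Changing the <= to < sends values that hit a cutpoint to the upper bin instead. Either way, the choice is still the user's to make.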

> The PCTLDEF= option in UNIVARIATE may be important.

Unfortunately, I am going to disagree with you here. Is this the first time I ever disagreed with you on SAS-L? :-) :-) The PCTLDEF= option is likely to be useless in making the decision on the binning. It doesn't determine where to put a value which matches the quantile, and when the data have ties the computed quantile can come out the same under every value of PCTLDEF=. Here's a little snippet of code to prove my point:

data temp1;
   input x @@;
   datalines;
1 1 1 1 1 1 1 1 1 2 3 4 5 6 7 8 47
;
run;

%macro pctls(start=1,finish=5);
   %local i;
   %do i = &START %to &FINISH;
      proc univariate noprint data=temp1 pctldef= &I ;
         var x;
         output out=outdef&I pctlpre=p_ pctlpts=25,50,75 ;
      run;
   %end;
%mend pctls;

%pctls(start=1,finish=5)

data all;
   set outdef1 outdef2 outdef3 outdef4 outdef5;
   pctl_def = _n_ ;
run;

proc print data=all noobs;
run;

Now we get the following output:

p_25    p_50    p_75    pctl_def

   1       1    4.75        1
   1       1    5.00        2
   1       1    5.00        3
   1       1    5.50        4
   1       1    5.00        5

So we see that changing PCTLDEF= may not move a breakpoint that falls on a run of ties (p_25 and p_50 are 1 under all five definitions), and those are exactly the cases where the decision on the binning has to be made by the user (or the user's Pointy-Haired Boss).

Hey! I just realized. The above macro has parameters, but the parameters are *stupid*. By the design of the macro, we want the %do-loop to go from 1 to 5, no matter what. So this could be written without any parameters, and it wouldn't really suffer as a result. But I'm not submitting a copy of this post to the 'scope of macro variables' thread still going on.
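For what it's worth, a parameter-free version might look something like the sketch below. It assumes the same TEMP1 data set and the same OUTDEF1-OUTDEF5 output names as the macro above, and simply hard-codes the loop bounds at 1 and 5, the five values PCTLDEF= accepts.

```sas
/* Sketch: same loop as before, with the bounds fixed at 1 and 5 */
%macro pctls;
   %local i;
   %do i = 1 %to 5;
      proc univariate noprint data=temp1 pctldef=&i;
         var x;
         output out=outdef&i pctlpre=p_ pctlpts=25,50,75;
      run;
   %end;
%mend pctls;

%pctls
```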

David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician

