Date: Fri, 21 May 2004 11:27:20 -0700
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: Calculating equal proportions
Howard Schreier <Howard_Schreier@ITA.DOC.GOV> sagely replied:
> I don't see the need for the PROC RANK results.
One doesn't need it. I was just showing that one can use Paul's
suggestion to get the breakout for the base set by using the proc.
I guess I didn't make that clear enough.
> It's the output from
> UNIVARIATE which provides the cutpoints.
> Then it seems to me that the
> tricky part may be "current" scores which exactly equal the boundary
The boundary cases have to be decided by the end-user. That is, the
user has to decide where to place a value of, say, 456.7 if that happens
to be the cutpoint. Usually you specify (ahead of time, in detail) that
values hitting the cutpoint go into the lower bin. Or the upper bin.
That is, you write something like:
    if X <= p_10 then bin = 1;
    else if X <= p_20 then bin = 2;
I said "something like" in my paragraph above, because the code here
quickly degenerates into the dreaded 'wallpaper code' which causes us
to suffer THE WRATH OF IAN. :-)
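One way to sidestep the wallpaper is to let an array walk the cutpoints. What follows is only a sketch under stated assumptions: the data set names SCORES, CUTS and BINNED are placeholders, the decile variables p_10 through p_90 come from PCTLPRE=p_ on the OUTPUT statement, and the "<=" comparison encodes the (arbitrary) choice that ties go to the lower bin.

```sas
/* Hypothetical names: SCORES, CUTS, BINNED. Sample data for illustration only. */
data scores;
  input x @@;
  cards;
1 1 1 1 1 1 1 1 1 2 3 4 5 6 7 8 47
;
run;

/* One output row holding the nine decile cutpoints p_10 ... p_90 */
proc univariate noprint data=scores;
  var x;
  output out=cuts pctlpre=p_ pctlpts=10 to 90 by 10;
run;

data binned;
  if _n_ = 1 then set cuts;     /* cutpoints are retained across all rows */
  set scores;
  array cut{9} p_10 p_20 p_30 p_40 p_50 p_60 p_70 p_80 p_90;
  /* Stop at the first cutpoint that x does not exceed.            */
  /* "<=" sends ties to the lower bin; use "<" for the upper bin.  */
  /* If no cutpoint stops the loop, bin ends up as 10.             */
  /* Note: a missing x compares low and lands in bin 1.            */
  do bin = 1 to 9;
    if x <= cut{bin} then leave;
  end;
  drop p_: ;
run;
```

The loop index doubles as the bin number, so changing the tie rule means changing one comparison operator rather than editing a stack of IF/ELSE lines.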
> The PCTLDEF= option in UNIVARIATE may be important.
Unfortunately, I am going to disagree with you here. Is this the first
time I have ever disagreed with you on SAS-L? :-) :-) The PCTLDEF=
option is likely to be useless in making the decision on the binning.
It doesn't determine where to put a value which matches the quantile,
and all the values of the quantile can be the same, regardless of the
value of PCTLDEF. Here's a little snippet of code to prove my point:
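data temp1;
   input x @@;
   cards;
1 1 1 1 1 1 1 1 1 2 3 4 5 6 7 8 47
;
%macro pctls(start=1, finish=5);  /* macro name and defaults are reconstructed guesses */
   %do i = &START %to &FINISH;
      proc univariate noprint data=temp1 pctldef= &I ;
         var x;
         output out=outdef&I pctlpre=p_ pctlpts=25,50,75 ; run;
   %end;
%mend pctls;  %pctls()
data all;
   set outdef1 outdef2 outdef3 outdef4 outdef5;
   pctl_def = _n_ ;
proc print data=all noobs; run;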
Now we get the following output:
    p_25    p_50    p_75    pctl_def
      1       1     4.75       1
      1       1     5.00       2
      1       1     5.00       3
      1       1     5.50       4
      1       1     5.00       5
So we see that the PCTLDEF= value may not change the value of
any of the breakpoints where there are ties, and that those values
will be cases where a decision on the binning has to be made by
the user (or the user's Pointy-Haired Boss).
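To see how much rides on that decision, here is a rough sketch that bins the same seventeen values both ways. The data set names CUTS and BOTH and the variables BIN_LOWER and BIN_UPPER are made up for illustration, and the percentiles use the default PCTLDEF=5.

```sas
data temp1;
  input x @@;
  cards;
1 1 1 1 1 1 1 1 1 2 3 4 5 6 7 8 47
;
run;

proc univariate noprint data=temp1;    /* default PCTLDEF=5 */
  var x;
  output out=cuts pctlpre=p_ pctlpts=25,50,75;
run;

data both;
  if _n_ = 1 then set cuts;            /* p_25, p_50, p_75 retained */
  set temp1;
  /* Rule A: a value equal to a cutpoint goes to the lower bin */
  if      x <= p_25 then bin_lower = 1;
  else if x <= p_50 then bin_lower = 2;
  else if x <= p_75 then bin_lower = 3;
  else                   bin_lower = 4;
  /* Rule B: a value equal to a cutpoint goes to the upper bin */
  if      x <  p_25 then bin_upper = 1;
  else if x <  p_50 then bin_upper = 2;
  else if x <  p_75 then bin_upper = 3;
  else                   bin_upper = 4;
run;

proc freq data=both;
  tables bin_lower bin_upper;
run;
```

With p_25 = p_50 = 1 and p_75 = 5, rule A should put 9 observations in bin 1, none in bin 2, and 4 each in bins 3 and 4, while rule B should leave bins 1 and 2 empty and put 12 in bin 3 and 5 in bin 4. Neither looks much like quartiles, which is exactly why the rule has to be settled ahead of time.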
Hey! I just realized. The above macro has parameters, but the
parameters are *stupid*. By the design of the macro, we want the
%do-loop to go from 1 to 5, no matter what. So this could be written
without any parameters, and it wouldn't really suffer as a result.
But I'm not submitting a copy of this post to the 'scope of macro
variables' thread still going on.
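For what it's worth, a parameter-free version might look something like the sketch below. The macro name ALLDEFS is invented here, and the %LOCAL statement is an extra precaution that was not in the snippet above; it keeps the loop counter from clobbering any &I that already exists outside the macro.

```sas
data temp1;
  input x @@;
  cards;
1 1 1 1 1 1 1 1 1 2 3 4 5 6 7 8 47
;
run;

%macro alldefs;              /* hypothetical name */
  %local i;                  /* keep the loop counter private to this macro */
  %do i = 1 %to 5;           /* the five PCTLDEF= definitions, hard-coded */
    proc univariate noprint data=temp1 pctldef=&i;
      var x;
      output out=outdef&i pctlpre=p_ pctlpts=25,50,75;
    run;
  %end;
%mend alldefs;

%alldefs

data all;
  set outdef1 outdef2 outdef3 outdef4 outdef5;
  pctl_def = _n_;            /* one row per data set, so _n_ matches the definition */
run;

proc print data=all noobs; run;
```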
David Cassell, CSC
Senior computing specialist