I realized that my earlier statement is incorrect: I had suggested that
PROC RANK giving lower percentiles (ranks) than PROC UNIVARIATE may be
due to PROC UNIVARIATE displaying the mid-point of the percentile.
The specific case of which I was thinking when discussing this was that
for another measure, PROC RANK places the value of 8.02139 in the 73rd
percentile (of a 26-observation dataset), while PROC UNIVARIATE
lists that exact value as the 75th percentile of the
distribution.
-----Original Message-----
From: Kirby, Ted
Sent: Monday, March 14, 2011 1:24 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: PROC RANK Percentiles vs. PROC UNIVARIATE Percentiles
Is there a way to make PROC RANK percentiles (i.e., PROC RANK groups=100
as indicated in the documentation for PROC RANK) match the percentiles
generated by PROC UNIVARIATE on relatively sparse datasets? For example,
I have a 30-observation dataset in which we want to report to each
member their percentile rank for a particular measure of performance.
Using PROC RANK I can generate the following table (sorted by
percentile):
Measure  Percentile
      6           4
      6           4
     15           9
     19          12
     40          16
     43          19
     59          24
     59          24
     70          29
     76          32
     97          35
    103          38
    109          41
    111          45
    116          48
    118          51
    119          56
    119          56
    131          62
    131          62
    158          67
    165          70
    178          74
    299          77
    310          80
    334          83
    358          87
    825          90
  1,589          93
  1,614          96
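For reference, the Percentile column above can be reproduced by hand. The
following is a minimal Python sketch (not SAS), assuming PROC RANK is run
with its default TIES=MEAN and that the group number is computed as
floor(rank * groups / (n + 1)), which is my reading of the PROC RANK
documentation for GROUPS=. The function name rank_groups is made up for
this illustration.

```python
from collections import Counter

# Measure values from the table above (n = 30).
measures = [6, 6, 15, 19, 40, 43, 59, 59, 70, 76, 97, 103, 109, 111, 116,
            118, 119, 119, 131, 131, 158, 165, 178, 299, 310, 334, 358,
            825, 1589, 1614]


def rank_groups(values, groups=100):
    """Group number per value, ASSUMED to be floor(rank * groups / (n + 1)),
    with tied values sharing their mean rank (TIES=MEAN)."""
    ordered = sorted(values)
    n = len(ordered)
    counts = Counter(ordered)
    first_rank = {}
    for position, value in enumerate(ordered, start=1):
        first_rank.setdefault(value, position)
    result = []
    for value in values:
        # Twice the mean rank of the tie block (first + last rank), kept as
        # an integer so the floor is not affected by floating-point error.
        twice_rank = 2 * first_rank[value] + counts[value] - 1
        result.append(twice_rank * groups // (2 * (n + 1)))
    return result


percentiles = rank_groups(measures)
for value, pct in zip(measures, percentiles):
    print(value, pct)
```

Under these assumptions, 825 has rank 28, and floor(28 * 100 / 31) = 90,
which agrees with the table.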
However, the PROC UNIVARIATE distribution for this dataset is as
follows:
Quantile     Estimate
100% Max       1614.0
99%            1614.0
95%            1589.0
90%             591.5
75% Q3          178.0
50% Median      117.0
25% Q1           59.0
10%              17.0
5%                6.0
1%                6.0
0% Min            6.0
Notice that PROC RANK puts 825 into the 90th percentile, but PROC
UNIVARIATE indicates that the 90th percentile is 591.5.
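The PROC UNIVARIATE estimates can also be reproduced by hand. This is a
minimal Python sketch (not SAS), assuming the procedure was run with its
default percentile definition (PCTLDEF=5, the empirical distribution
function with averaging): write n*p = j + g with integer part j, then take
the average of the j-th and (j+1)-th ordered values when g = 0, and the
(j+1)-th ordered value otherwise. The function name percentile_def5 is made
up for this illustration.

```python
# Measure values from the PROC RANK table (n = 30).
measures = [6, 6, 15, 19, 40, 43, 59, 59, 70, 76, 97, 103, 109, 111, 116,
            118, 119, 119, 131, 131, 158, 165, 178, 299, 310, 334, 358,
            825, 1589, 1614]


def percentile_def5(values, pct):
    """pct-th percentile (integer 0..100), ASSUMING the empirical
    distribution function with averaging (PCTLDEF=5)."""
    ordered = sorted(values)
    n = len(ordered)
    # n * pct / 100 = j + g; remainder == 0 means g == 0.
    # Integer arithmetic avoids floating-point error in that check.
    j, remainder = divmod(n * pct, 100)
    if j == 0:
        return float(ordered[0])       # at or below the first observation
    if j >= n:
        return float(ordered[-1])      # 100th percentile is the maximum
    if remainder == 0:
        # g == 0: average the j-th and (j+1)-th ordered values.
        return (ordered[j - 1] + ordered[j]) / 2
    # g > 0: take the (j+1)-th ordered value.
    return float(ordered[j])


quantiles = {p: percentile_def5(measures, p)
             for p in (100, 99, 95, 90, 75, 50, 25, 10, 5, 1, 0)}
for p, estimate in quantiles.items():
    print(f"{p}%", estimate)
```

Under this definition, 30 * 0.90 = 27 exactly, so the 90% estimate is the
average of the 27th and 28th ordered values, (358 + 825) / 2 = 591.5. It is
a single cut point between observations rather than a value that any member
actually has.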
As I was typing this message, I think I figured it out. The 90th
percentile can be considered as a RANGE OF VALUES and thus 825 IS in the
90th percentile because it is between 591.5 and <some other value> that
would be the 91st percentile of this distribution. Is that correct?
Another question that occurred to me as I was typing is:
Are the values in the Quantile section of the PROC UNIVARIATE output the
lower bound, mid-point or upper bound of the percentile?
The reason I ask is that I have other examples where the PROC RANK
percentile is lower than the PROC UNIVARIATE percentile, and that would
be explained if the values in the Quantile section were the midpoints of
the percentile.
(I know that there may be a difference of one in the percentiles
generated by the two procedures since PROC RANK uses 0 to 99 as the
minimum and maximum ranks, but PROC UNIVARIATE goes from 0 to 100 when
reporting percentiles. However, the discrepancies in percentiles/ranks
about which I am concerned are greater than one.)