Date: Thu, 25 Jan 2001 23:55:02 -0000
From: Paul Dorfman <sashole@bellsouth.net>
List: "SAS(r) Discussion"
To: rlokhamp@hotmail.com
Subject: Re: Finding Lowest/Highest Scores Efficiently
Content-Type: text/plain; format=flowed

Robert,

In principle, your own solution does not look bad at all. Let me reproduce it as I understand it. First, some test data (let us limit the sample to a mere 5,000 observations), then the solution:

*** test data;
data a;
  array v(1:200);
  do i=1 to 5000;
    do j=1 to dim(v);
      v(j) = ranuni(1);
    end;
    output;
  end;
run;

*** transpose-sort-transpose;
data nw (keep=n w);
  array v(1:200);
  set a;
  n ++ 1;
  do j=1 to dim(v);
    w = v(j);
    output;
  end;
run;

proc sort;
  by n w;
run;

data s (keep=v1-v50 v151-v200);
  array v (1:200);
  do j=1 by 1 until (last.n);
    set nw;
    by n;
    v(j) = w;
  end;
run;
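For readers who think better outside DATA step terms, the net effect of the transpose-sort-transpose is simply to sort each observation's 200 values and keep the 50 at either end. A rough Python sketch of that effect (names are mine, purely for illustration):

```python
def keep_extremes(row, low=50, high=50):
    """Sort one observation's scores and keep the `low` smallest and the
    `high` largest -- the effect of keep=v1-v50 v151-v200 after sorting."""
    s = sorted(row)
    return s[:low] + s[-high:]
```

The transpose-sort-transpose merely achieves this row-wise sort through a long-format detour, which is why PROC SORT dominates its run time.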

This runs in (0.82+3.38+1.94)=6.14 CPU seconds under OS/390. Now let us turn to the solution, indeed elegant, proposed by Ludwig Boltzmann (and, with a significant distinction, by Mike Zdeb). He offers to execute the ORDINAL function 100 times in each observation: 50 times to get the lowest 50 scores, and 50 more times to get the highest ones:

data b (keep=v1-v50 v151-v200);
  set a;
  array v(1:200);
  array o(1:200) _temporary_;
  do j=1 to 50, 151 to 200;
    o(j) = ordinal(j, of v(*));
  end;
  do j=1 to 50, 151 to 200;
    v(j) = o(j);
  end;
run;

Cool, ain't it? Unfortunately, it just proves once again that not all that glitters is gold: using ORDINAL is a typical example where a small number of elegant DATA step instructions translates into a heck of a job for the computer. And it should be no surprise, for the instruction ORDINAL(j, of v(*)), however short, makes the machine scan the entire array. As a result, the code above takes a whopping 158.42 CPU seconds to execute. Getting back to Mike's suggestion: if the *entire* array were to be sorted using ORDINAL (that is, if the function were called 200 times per record), it would take twice the time.
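To see why, consider what each ORDINAL(j, of v(*)) call must do: find the j-th smallest of all 200 values from scratch, sharing no work with the previous call. A naive Python model of that behavior (my sketch, not SAS internals):

```python
def ordinal(j, vals):
    """Model of SAS ORDINAL(j, ...): return the j-th smallest value.
    Each call processes the whole list anew -- nothing is remembered
    between the 100 calls made per observation."""
    return sorted(vals)[j - 1]

def extremes_via_ordinal(row):
    # 50 independent calls for the lowest scores, 50 for the highest:
    lows  = [ordinal(j, row) for j in range(1, 51)]
    highs = [ordinal(j, row) for j in range(151, 201)]
    return lows + highs
```

Sorting the row once and slicing would yield all 100 order statistics in a single pass over the data, which is exactly why the sort-based solutions win.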

Bob Burnham proposes to use the bubble sort. It is indeed quite simple to code, and it performs noticeably better: even though it has to make N passes through the array, it does not scan the *entire* array each time. I mimicked Bob's sort as

data s (keep=v1-v50 v151-v200);
  array v (1:200);
  set a;
  do i=hbound(v) to lbound(v) by -1;
    do j=1 to i;
      if v(i) => v(j) then continue;
      t = v(i);
      v(i) = v(j);
      v(j) = t;
    end;
  end;
run;

It takes 48.49 CPU seconds to do the job - much better than either ORDINAL solution, but still a far cry from the "standard" transpose-sort-transpose. Interestingly, bubble sort, due to its very nature, has more potential than was realized above. Why is it called "bubble", anyway? Because each time the outer loop iterates, the next largest value "bubbles" to the top of the array. But if so, why should we execute the outer loop all 200 times when 50 times in one direction and 50 times in the other will suffice? We only care about the 50 smallest and 50 largest values, so why sort the entire thing? In this respect, Bob essentially falls into the same trap as Mike. The amended bubble sort approach could look like this:

data s (keep=v1-v50 v151-v200);
  array v (1:200);
  set a;
  *** 50 passes: bubble the 50 largest values up into v151-v200;
  do i=hbound(v) to hbound(v)-49 by -1;
    do j=1 to i;
      if v(i) => v(j) then continue;
      t = v(i);
      v(i) = v(j);
      v(j) = t;
    end;
  end;
  *** 50 passes the other way: sink the 50 smallest into v1-v50;
  do i=lbound(v) to lbound(v)+49 by +1;
    do j=i to hbound(v)-50;
      if v(i) <= v(j) then continue;
      t = v(i);
      v(i) = v(j);
      v(j) = t;
    end;
  end;
run;
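In other terms, the amendment amounts to running only 50 max-selecting passes and 50 min-selecting passes instead of a full sort. A Python sketch of the same idea (0-based indices here, unlike the 1-based SAS arrays):

```python
def partial_bubble(v, k=50):
    """Bubble the k largest values to the end of v and the k smallest
    to the front; the middle len(v)-2*k positions stay unsorted."""
    n = len(v)
    # k passes: each bubbles the next largest remaining value into v[i].
    for i in range(n - 1, n - k - 1, -1):
        for j in range(i):
            if v[j] > v[i]:
                v[i], v[j] = v[j], v[i]
    # k passes over the untouched front: next smallest value into v[i].
    for i in range(k):
        for j in range(i + 1, n - k):
            if v[j] < v[i]:
                v[i], v[j] = v[j], v[i]
    return v
```

Note that the second set of passes never needs to look at the last k positions, since the k largest values are already parked there.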

This code executes in 38.56 CPU seconds, thus saving a good chunk off the full-blown 48.49; however, it is still nowhere near the "standard" 6.14! So, it is not possible to beat it, is it? Of course it is, but then, just as Bob notes, a principally different sorting method should be used. Perhaps any sort running in O(N*log(N)) time will be competitive. I will leave the sas-l public puzzled as to the nature of the SORT link routine below, but an attentive programmer should be able to recognize the algorithm readily. Those having succeeded, tell those who have failed. Note that this example is, in a sense, the *opposite* of the ORDINAL approach: it contains a lot of DATA step statements, but judging from the run time, they must translate into relatively few machine instructions:

data s (keep=v1-v50 v151-v200);
  array v (1:200);
  set a;
  link sort;
  return;

  sort:
  %macro x (i,j);
    do; t=v(&i); v(&i)=v(&j); v(&j)=t; end;
  %mend x;
  array z(0:1,0:50) _temporary_;
  l = lbound(v);
  r = hbound(v);
  do s=1 by 0 while (s);
    i = (l + r)*.5;
    if v(l) > v(i) then %x (l,i);
    if v(l) > v(r) then %x (l,r);
    if v(i) > v(r) then %x (i,r);
    p = v(i); v(i) = v(l); v(l) = p;
    i = l;
    j = r + 1;
    do until (i => j);
      do i=i+1 by +1 until (v(i) => p); end;
      do j=j-1 by -1 until (v(j) <= p); end;
      if i < j then %x (i,j);
    end;
    v(l) = v(j);
    v(j) = p;
    if r-j => j-l > 9 then do s=s+1;
      z(0,s) = j + 1;
      z(1,s) = r;
      r = j - 1;
    end;
    else if j-l => r-j > 9 then do s=s+1;
      z(1,s) = j - 1;
      z(0,s) = l;
      l = j + 1;
    end;
    else if j-l > 9 => r-j then r = j - 1;
    else if r-j > 9 => j-l then l = j + 1;
    else do;
      l = z(0,s);
      r = z(1,s);
      s = s - 1;
    end;
  end;
  do j=lbound(v)+1 to hbound(v);
    if v(j-1) > v(j) then do;
      p = v(j);
      do i=j-1 to lbound(v) by -1;
        if p => v(i) then leave;
        v(i+1) = v(i);
      end;
      v(i+1) = p;
    end;
  end;
run;
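For readers who want a spoiler: the general scheme at work above - an O(N*log(N)) partition-exchange sort with a median-of-three pivot, small subfiles left alone, and one finishing straight-insertion pass - can be sketched in Python like this (my rendition of the scheme, not a transcription of the SAS routine):

```python
def quicksort_with_cleanup(v, cutoff=9):
    """Quicksort with median-of-three pivoting and an explicit stack;
    subfiles of `cutoff` or fewer elements are skipped and handled by
    one final insertion-sort pass over the (by then almost sorted) array."""
    stack = [(0, len(v) - 1)]
    while stack:
        l, r = stack.pop()
        if r - l <= cutoff:
            continue                      # small subfile: leave for cleanup
        m = (l + r) // 2
        # Median-of-three: order v[l], v[m], v[r].
        if v[l] > v[m]: v[l], v[m] = v[m], v[l]
        if v[l] > v[r]: v[l], v[r] = v[r], v[l]
        if v[m] > v[r]: v[m], v[r] = v[r], v[m]
        p = v[m]
        i, j = l, r
        while i <= j:                     # partition around the pivot value
            while v[i] < p: i += 1
            while v[j] > p: j -= 1
            if i <= j:
                v[i], v[j] = v[j], v[i]
                i += 1
                j -= 1
        stack.append((l, j))
        stack.append((i, r))
    # Finishing insertion sort: cheap because v is nearly in order.
    for k in range(1, len(v)):
        x, i = v[k], k - 1
        while i >= 0 and v[i] > x:
            v[i + 1] = v[i]
            i -= 1
        v[i + 1] = x
    return v
```

The closing insertion sort is what makes the small-subfile cutoff safe: it mops up every short unsorted run in a single linear-ish pass.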

Although this code performs a full-blown array sort, the entire thing takes 4.04 CPU seconds - about a 50 per cent improvement over the "standard". Surely that alone is not enough to justify coding the sort routine above from scratch (it is not difficult, but it takes *a lot* more time and programming expertise than putting the "standard" SAS code together). On the other hand, (1) one can get the code from sas-l (you just did), and (2) with a much wider array and more records, the performance advantage would be much more significant.

Kind regards,
====================
Paul M. Dorfman
Jacksonville, FL
====================

>From: "robert lokhamp" <rlokhamp@hotmail.com>

>What do you think would be a more efficient way to do the following. I've
>got a SAS dataset with 200 (numeric) scores, V1-v200. I need to move 50
>lowest scores to variables V1-v50, and 50 highest scores to V151-V200, and
>drop the rest. The dataset has approximately 100,000 observations. So far,
>I've been transposing the dataset 'vertically', then sorting it, then
>transposing it back. At this point the dataset is like original but V1-V200
>are sorted in each obs. Then I would just drop the mid-vars. Gets the job
>done but is pretty slow and feels kinda tacky. Any other ways (more
>elegant, faster, etc)?