Date: Fri, 12 Dec 2008 16:41:49 -0500
Reply-To: Paul Dorfman <sashole@BELLSOUTH.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Paul Dorfman <sashole@BELLSOUTH.NET>
Subject: CombSort
The reason I am posting this is because I have developed a habit of
searching SAS-L for... my own code. Having been asked by a coworker about a
SAS implementation of the CombSort algorithm, I remembered I had posted it
on SAS-L, yet when I resorted to the habit in order to locate it, I could
only find a subsequent post of mine stating that I had posted it earlier.
Durn it! I need to be able to find not the reference to the real thing, but
the thing itself.
Luckily, it is easy to re-implement the algorithm anew. For those who
happen to have never heard of it, CombSort is related to the bubble sort
approximately as Shell sort is related to the insertion sort. To my
knowledge, it was first described in Byte magazine waaaay ago. The nice
thing about CombSort is that, although at N=1000000, it is 2-3 times slower
than iterative Quicksort, is it much shorter and uncomparably easier to
implement and get right. Moreover, at N=10000, the speed difference becomes
practically negligible.
SAS CALL SORTN/C routines are quite a bit quicker at large Ns, but again,
at N<~10000 the speed difference dwindles to non-essentiality, while
CombSort, being an exchange sorting algorithm, has the advantage of being
suited in the same guise for both numeric and character arrays. That is,
for most practical intents and purposes, it is both simple enough and quick
enough - unlike its bubble sort brethren which runs out of breath already
at N=1000 due to its O(N**2) nature.
Here is the "naked" DATA step code for sorting an array A ascending (for
descending, reverse the single inequality in the code):
do g = hbound (A) - 1 by 0 while (s or g > 1) ;
g = int (g / 1.3) ;
if g in (0 ) then g = 1 ;
else if g in (9, 10) then g = 11 ;
s = 0 ;
do j = lbound (A) to hbound (A) - g ;
k = j + g ;
if A[j] >= A[k] then continue ;
t = A[j] ;
A[j] = A[k] ;
A[k] = t ;
s = 1 ;
end ;
end ;
On my desktop (same I am using to type this), it sorts a completely
disordered numeric array with N=1000000 in 4 seconds flat (quicksort does
it just under 2). Or, one may prefer to encapsulate (omitting parameter
validation):
%macro combsort (arr =, order = <) ;
do __g = hbound (&arr) - 1 by 0 while (__s or __g > 1) ;
__g = int (__g / 1.3) ;
if __g in (0 ) then __g = 1 ;
else if __g in (9, 10) then __g = 11 ;
__s = 0 ;
do __j = lbound (&arr) to hbound (&arr) - __g ;
__k = __j + __g ;
if &arr[__j] &order &arr[__k] then continue ;
__t = &arr[__j] ;
&arr[__j] = &arr[__k] ;
&arr[__k] = __t ;
__s = 1 ;
end ;
end ;
drop __: ;
%mend ;
and then, for example:
data _null_ ;
array test [1000000] _temporary_ ;
do _n_ = lbound (test) to hbound (test) ;
test[_n_] = ranuni (1) ;
end ;
retain iWantItSorted 1 ;
if iWantItSorted then %combsort(arr = test) ;
sorted = 1 ;
do _n_ = lbound (test) to hbound (test) - 1 while (sorted) ;
if test[_n_ + 1] < test[_n_] then sorted = 0 ;
end ;
put sorted = ;
run ;
Kind regards
------------
Paul Dorfman
Jax, FL
------------