|
A tiny improvement to the heap sort algorithm
Quick sort algorithm and derivatives need a lot of element exchange operations. While this is not so important when dealing with arrays, it becomes a little bit of a problem with linked lists, especially the single-linked ones.
Another advantage is that heap sort does not involve recurrence which improves memory usage.
Working with the c-toolkit library, I chose the heap sort algorithm for sorting linked lists. They were single linked, for memory efficiency again. It turned out that the original procedure can be improved to make less comparisons, and with minimum extra code.
Classic heap sort
The classic heap sort algorithm takes one element off the original list, and looks for its new position comparing elements one by one:
C implementation of the algorithm may look like this (i is the current element we are inserting that is -- blue on the graph):
void heapsort (Object *list, Compare comp_fun)
{
Object i, *prev, next;
if (*list == NULL)
return;
i = (*list) -> next;
(*list) -> next = NULL;
while (i != NULL)
{
next = i -> next;
prev = list;
while ((*prev != NULL) && (comp_fun (i -> data, (*prev) -> data) == 0))
prev = &((*prev) -> next);
i -> next = *prev;
*prev = i;
i = next;
}
}
The correction
It is obvious, that the procedure can be sped up by skipping over a number of elements rather than checking them one by one. The corrected algorithm jumps over step = nodes/div elements (where nodes is the number of elements in the sorted list and div is constant). If step becomes small enough (lesser than slimit ) the program switches back to the classic heap sort.
Here we have made only 9 comparisons instead of 18 for the original heap sort.
C code implementation of this algorithm may look like this:
void heapsort2 (Object *list, Compare comp_fun)
{
Object i, *j, *prev, next;
unsigned int nodes, step, n;
if (*list == NULL)
return;
nodes = 1;
i = (*list) -> next;
(*list) -> next = NULL;
while (i != NULL)
{
next = i -> next;
prev = list;
step = nodes;
loop:
step /= div;
if (step < slimit)
while ((*prev != NULL) && (comp_fun (i -> data, (*prev) -> data) == 0))
prev = &((*prev) -> next);
else
if ((*prev != NULL) && (comp_fun (i -> data, (*prev) -> data) == 0))
{
j = prev;
do
{
j = &((*j) -> next);
prev = j;
for (n = step - 1; ((*j) != NULL) && (n > 0); n --)
j = &((*j) -> next);
}
while ((*j != NULL) && (comp_fun (i -> data, (*j) -> data) == 0));
goto loop;
}
i -> next = *prev;
*prev = i;
nodes ++;
i = next;
}
}
The above code introduces one more enhancement. Suppose step is still a large number and the matching element is found before any step has been made. The first version of the program would divide step by div , compare, divide and so on, until the condition step < slimit is true.
Hence the large if(comp_fun... block was added.
Values of slimit and div
Experiments showed that the optimal value of div is between 3 and 5 (inclusive), and for slimit it is 2.
Results
The following table presents the number of comparisons for three sets of 1,000 numbers: random, sorted and sorted in the reverse order. div = 4 .
slimit | Random | Sorted | Reverse |
1 | 13,541 | 7,459 | 999 |
2 | 13,290 | 7,459 | 999 |
3 | 13,467 | 7,577 | 999 |
4 | 14,096 | 7,819 | 999 |
5 | 14,385 | 7,987 | 999 |
6 | 14,851 | 8,211 | 999 |
7 | 15,405 | 8,491 | 999 |
classic heap sort | 250,890 | 499,500 | 999 |
|