ML

A ti­ny im­pro­ve­ment to the heap sort al­go­rithm

Quick sort al­go­rithm and de­ri­va­ti­ves need a lot of ele­ment ex­chan­ge ope­ra­ti­ons. Whi­le this is not so im­por­tant when de­a­ling with ar­rays, it be­co­mes a lit­tle bit of a prob­lem with lin­ked lists, es­pe­ci­al­ly the sin­gle-lin­ked ones.

Ano­ther ad­van­ta­ge is that heap sort does not in­vol­ve re­cur­ren­ce which im­pro­ves me­mo­ry usa­ge.

Wor­king with the c-toolkit li­bra­ry, I cho­se the heap sort al­go­rithm for sor­ting lin­ked lists. They were sin­gle lin­ked, for me­mo­ry ef­fi­cien­cy aga­in. It tur­ned out that the ori­gi­nal pro­ce­du­re can be im­pro­ved to make less com­pa­ri­sons, and with mi­ni­mum ex­tra co­de.

Clas­sic heap sort

The clas­sic heap sort al­go­rithm ta­kes one ele­ment off the ori­gi­nal list, and lo­oks for its new po­si­tion com­pa­ring ele­ments one by one:

Classic heap sort alhorithm

C im­ple­men­ta­tion of the al­go­rithm may look like this (i is the cur­rent ele­ment we are in­ser­ting that is -- blue on the graph):

void heapsort (Object *list, Compare comp_fun)
{
Object i, *prev, next;
if (*list == NULL)
return;
i = (*list) -> next;
(*list) -> next = NULL;
while (i != NULL)
{
next = i -> next;
prev = list;
while ((*prev != NULL) && (comp_fun (i -> data, (*prev) -> data) == 0))
prev = &((*prev) -> next);
i -> next = *prev;
*prev = i;
i = next;
}
}

The cor­rec­tion

It is ob­vi­ous, that the pro­ce­du­re can be sped up by skip­ping over a num­ber of ele­ments ra­ther than che­cking them one by one. The cor­rec­ted al­go­rithm jumps over step = nodes/div ele­ments (where nodes is the num­ber of ele­ments in the sor­ted list and div is con­stant). If step be­co­mes small eno­ugh (les­ser than slimit) the pro­gram swit­ches back to the clas­sic heap sort.

The improved heap sort alhorithm

Here we have made only 9 com­pa­ri­sons in­stead of 18 for the ori­gi­nal heap sort.

C code im­ple­men­ta­tion of this al­go­rithm may look like this:

void heapsort2 (Object *list, Compare comp_fun)
{
Object i, *j, *prev, next;
unsigned int nodes, step, n;
if (*list == NULL)
return;
nodes = 1;
i = (*list) -> next;
(*list) -> next = NULL;
while (i != NULL)
{
next = i -> next;
prev = list;
step = nodes;
loop:
step /= div;
if (step < slimit)
while ((*prev != NULL) && (comp_fun (i -> data, (*prev) -> data) == 0))
prev = &((*prev) -> next);
else
if ((*prev != NULL) && (comp_fun (i -> data, (*prev) -> data) == 0))
{
j = prev;
do
{
j = &((*j) -> next);
prev = j;
for (n = step - 1; ((*j) != NULL) && (n > 0); n --)
j = &((*j) -> next);
}
while ((*j != NULL) && (comp_fun (i -> data, (*j) -> data) == 0));
goto loop;
}
i -> next = *prev;
*prev = i;
nodes ++;
i = next;
}
}

The abo­ve code in­tro­du­ces one more en­han­ce­ment. Sup­pose step is still a large num­ber and the mat­ching ele­ment is fo­und be­fore any step has been made. The first ver­sion of the pro­gram would di­vide step by div, com­pare, di­vide and so on, un­til the con­dition step < slimit is true.

The improved heap sort alhorithm

Hence the large if(comp_fun... block was added.

Va­lues of slimit and div

Ex­pe­ri­ments sho­wed that the op­ti­mal va­lue of div is be­tween 3 and 5 (in­clu­sive), and for slimit it is 2.

Re­sults

The fol­lo­wing ta­ble pre­sents the num­ber of com­pa­ri­sons for three sets of 1,000 numbers: ran­dom, sor­ted and sor­ted in the re­ver­se or­der. div = 4.
slimitRandomSortedReverse
1 13,541 7,459 999
2 13,290 7,459 999
3 13,467 7,577 999
4 14,096 7,819 999
5 14,385 7,987 999
6 14,851 8,211 999
7 15,405 8,491 999
classic
heap
sort
250,890 499,500 999