Sorting
Home
Introduction
Philosophy
General techniques
Sorting
Searching
Factory
Persistence
Logging
Streaming
Tokenizers
Parsing
File Searching
Command
PseudoPatterns
Compiling
Downloads
FeedBack

By Willy Van den Driessche

It started with “Sorting collections” ...

This is a story about the birth of my sorting routines.  Since I have a verbose writing style, It will be a long story.

One day I noticed that in our project we hard al lot of sorting to do. We had many custom collection classes and they all needed a way to sort them. Up till then we used the storage classes to perform the searching (ORDER BY in SQL clause).  It was my idea that once you have a collection in memory, it is a terrible waste to throw it away only to retrieve it later in an another order. I therefore took off to write some classes for the following requirement :

“Build something that is able to sort any collection”.

Since sorting is such a common thing, I took a look at far cousin C, where I knew there is a predefined function to perform a QuickSort.  The function prototype goes like this :

void   qsort(void *, size_t, size_t, int (*)
       (const void *, const void *));

For those of us able to read C : there are mainly 2 parts in this declaration : the first part defines the array to be sorted.  At which address does it start, how large are its items, how many items are there ?  The second part defines a function that is used to compare two elements in the array.

If we compare this to our requirements, we see that we have the same two parts.  In our case, the first argument will always be a collection.  After all, the requirement is to sort a collection. The second argument is a function pointer. While there are ways to simulate passing functions around, VB actually offers a much better approach.   All we need to do is define an interface for such functions.   (technically, its an interface for an object, not a function, but since these objects are actually functors, we choose not to be too sharp in terminology).

Let’s have a close look at the prototype of the C function :

int (*) (const void *, const void *)

Comparers

You should read this as “ a function that takes 2 arguments of any type and returns an integer”. With this in mind, the definition of our comparer interface is actually quite simple :

Interface iComparer
public function Compare (ByVal varFirst as Variant, ByVal varSecond as Variant) as long

So, a comparer takes two variants, compares them and returns :

  • a negative long if first < second
  • a positive long if first > second
  • zero if first = second

Having the comparer return a long is debateable, but it has the advantage of having the relative order of two items in one function call. Having variant arguments is even more debateable.  I use variants because in VB6 a variant means ‘anything’. In .NET this will probably change to Object.

After defining the iComparer interface, I was able to write the sorting routine (it was a QuickSort)

Public Sub QuickSortCollection(ByVal objCollection As Collection, ByVal objComparer As iComparer)

I was pleased with this implementation.  However, there are some important remarks to be made :

  • Collections are black boxes, as far as VB programmers are concerned. Internally a VB collection is most probably a doubly linked list with an associated red/black tree that keeps the keys.  There is no doubt that swapping items in such a datastructure is suboptimal (to say the least).  However, in my defense, I must say that there really is no other way to do this (without hacking using implementation details you are not supposed to know or use) I got a lot of critique saying that collections are not something to be sorted.  But “Collection” is just an interface and there is nothing preventing me from giving it another implementation when I would like to (allthough VB cannot implement the required interfaces - easily)
  • Collections provide no means whatsoever to get to the keys.  If you swap two items in the collection, you’ll lose the keys.  Again, there is absolutely nothing you can do about it (allthough in general it can be fixed, as we’ll see later)

Apart from these remarks, my problem was solved and I was a happy man (I was before too off course). Since I had the intention of writing a reusable component, I figured it made sense to define some standard Comparers.

Standard Comparers

I started by defining a comparer for every built-in type : cBooleanComparer, CCurrencyComparer, CStringComparer....

Now the only thing that is of interest to the user of the sorting component is the iComparer interface of these classes.  This is why I decided to make all these classes private and provide factory methods for each of them.  These factory methods go like this :

' CreateStringComparer :
' ----------------------
'  Creates a string comparer that is configured according to the
'  parameter.
'  This utility function reduces the need to know about comparers
Public Function CreateStringComparer(Optional ByVal eCompareMethod As VbCompareMethod = vbTextCompare) As iComparer
   Dim objResult   As cCompStr
  
   Set objResult = New cCompStr
   objResult.CompareMethod = eCompareMethod
   Set CreateStringComparer = objResult
End Function
' CreateIntegerComparer :
' ----------------------
'  Creates an integer comparer.
'  This utility function reduces the need to know about comparers
' Note : in principle this could be implemented as a singleton
Public Function CreateIntegerComparer() As iComparer
   Set CreateIntegerComparer = New cCompInt
End Function

Now, somebody wanting to sort a collection of Integers, wouldn’t have to write any comparer at all.  He could reuse the existing comparer and write code that goes like this :

QuickSortCollection myIntegerCollection, CreateIntegerComparer

The comparers for the built-in types are the most obvious candidates to be reused. However, there are other candidates.  Often you want to be able to sort ascending as well as descending.  Therefore I created an InverseComparer, which decorates an existing comparer.  I created the same factory method for this type of comparer.  Sorting an integer collection descending became :

QuickSortCollection myIntegerCollection, CreateIntegerComparer

Another comparer made it into the standard comparers.  It was a little less obvious to discover. Suppose you have a collection of custom objects, say instances of cCustomer. You have written a cCompCustomerByName and a cCompCustomerByAddress class.  Now you want to sort your collection by name and then by address.  How do you do that ?  Until now you had to write a new comparer for that. Therefore we create a Composite comparer cCascadingComparer which takes any number of other comparers and use their results in order.  If two items are ‘the same’ when using the first comparer, the second comparer is used to compare them, and so on for any number of subComparers.

I created a factory method for the cascading comparer so that I was able to write code like this :

QuickSortCollection myCustomerCollection,
                               CreateCascadingComparer (new cCompCustomerByName, new cCompCustomerByAddress)

One final standard comparer made it through. It is another decorating comparer but this time used for debugging only.  cTimedComparer will ‘wrap’ another comparer and tell you how many times the comparsion was called and how much time it spend inside it.

QuickSortCollection myCustomerCollection, CreateTimedComparer (new cCompCustomerByName)

Overview of comparers:

At this time, I was done with my solution, since it fullfilled the requirements.

Sorting arrays

Then one day, one of my collegues asked if I had a routine to sort an array in VB. Clearly, I did not.  What was wrong ? Nothing at all. My little routine perfectly fulfilled its requirements.  Only, the requirements where to sort a collection, not an array.

When I looked at my code, I immediately saw that it was very simple to change the QuickSortCollection in a QuickSortArray routine. However, I was very reluctant to do that.  It would probably have been much more performant to do so.  But I have always learned that the only reasonable numbers are zero, one and infinity. So it’s allright to have a sorting routine that can sort 1 type of container or all types of containers.  It doesn’t make me feel good to have a routine that is capable of sorting two types of container.

I was clearly missing something and therefore I started by reformulating my requirements as :

“Build something that is able to sort anything that is like a collection”.

SortableContainers

What does that mean ? “Like a collection”. Well, that’s a very hard question. You can come up with several solutions for this statement.  I’ll point out that they have a related but separated solution space. I’ll show you why I decided to keep my solution in a minute.

I came up with the concept of a sortableContainer, which had 3 requirements :

  • being able to tell how many elements it contains
  • being able to retrieve an element at a given position
  • being able to swap elements at two given distinct positions

In VB, this translates in the following interface :

interface ISortableContainer
Public Property Get Count () As Long
Public Property Get Item (ByVal nPos As Long) As Variant
Public Sub Swap (ByVal nPosFrom As Long, ByVal nPosTo as Long)

Now, the first thing you might notice is that this interface definition is assymmetric.  You are able to retrieve an item at a given index, but you are not able to set it at a given index.  As I said before, this is a design choice and other design choices work just as well but allow for other solutions. You will see shortly why I chose the interface with the Swap method.

OK, now that we have the definition of an ISortableContainer, we still have to create implementations for it.  I started out with the two most wanted solutions : cSortableCollection and cSortableArray. The code for the collection is here below :

' Usage : this class is a typical 'adapter class' which
' will adapt a collection to behave as a sortable container.
' A sortable container is something which can be sorted.
' By using this class you are able to sort the items
' in a collection (in place sorting).
Private mCollection     As Collection
Implements iSortableContainer
' Collection :
' ------------
' Use this property to initialize the collection before
' and retrieve the collection after the sort.
Public Property Get Collection() As Collection
   Set Collection = mCollection
End Property
Public
Property Set Collection(Byval objVal As Collection)
   Debug.Assert Not (objVal Is Nothing)
   Set mCollection = objVal
End Property

' iSortableContainer_Count :
' -----------------------------
' Return the number of elements in the container.
' For a collection this is just the count of the collection
Private Property Get iSortableContainer_Count() As Long
  Debug.Assert Not (mCollection Is Nothing)
 
  iSortableContainer_Count = mCollection.Count
End Property

' iSortableContainer_Item :
' ----------------------------
'  Return the (1 based) element at the given position.
'  For a collection this is just the item method
'  Note the use of the varassign routine since we don't know wether
'  we are dealing with objects or base types.
Private Property Get iSortableContainer_Item(ByVal nPos As Long) As Variant
  Debug.Assert Not (mCollection Is Nothing)
  Call VarAssign(iSortableContainer_Item, mCollection(nPos))
End Property

' iSortableContainer_Swap :
' ----------------------------
'  Swap the items at the given positions.
'  As you can see the implementation is somewhat cumbersome
Private Sub iSortableContainer_Swap(ByVal nPos1 As Long, ByVal nPos2 As Long)
   Dim vTemp As Variant
      
   With mCollection
       Call VarAssign(vTemp, .Item(nPos1))
       .Add .Item(nPos2), , , nPos1
       .Remove nPos1
       .Add vTemp, , , nPos2
       .Remove nPos2
   End With
End
Sub

The code for cSortableArray is very similar to the code for the collection.

With these container implementations, I was able to take the code for QuickSortCollection and rework it to operate on iSortableContainers. I created the routine QuickSortContainer.  The funny thing is I was able to keep my QuickSortCollection routine, but now as a façade for the ‘real sorting routine’.

public sub QuickSortCollection (byval objCollection as collection, byval objComparer as IComparer)
     QuickSortContainer CreateCollectionContainer(objCollection), objComparer
end sub

Standard SortableContainers

Note that these two iSortableContainers are applications of the Adapter design pattern.  Since we don’t have the source code for the Collection class, it is impossible to make it implement the ISortableContainer interface.  (It is possible, if you use a technique call COM aggregation.  This technique, however, requires some non-trivial under-the-hood techniques that I prefer not to use unless I’m forced to)  For the arrays, an adapter is the only possible option for implementing this solution.

As for the comparers, I didn’t stop with these two standard iSortableContainer implementations.  The most useful other implementation is cSortablePermutation which is a decorator for another ISortableContainer.  However, the decoree is never modified. Instead, the cSortablePermutation keeps an internal “array of indices” which it swaps instead of swapping the items in the original container. You might be wondering, why you need such a funny sortable container.  The answer is simple.  With such a container you never modify the original container and therefore it is the perfect addition for collections.  Remember that collections loose their keys when you sort them ? Well, if you use the resulting “array of indices” (a cPermutation object actually) to access the original collection, you circumvent this problem alltogether. Furthermore, this solution allows you to have many ”indexes” on a collection ( or an array for that matter).

    Note that this solution was not possible with a symmetric definition of ISortableContainer.  Thanks to the swap method we are able to work on the permutation instead of on the real container. If, instead there would have been a ‘Set Item’ method a permutation would be out of the question.

Two other related standard implementations for iSortableContainer are also decorators: cSortableFromEnd will reverse the underlying container. This gives you another way to sort a container in reverse order.  cSortableSubRange takes another container and only exposes a subrange of it.  This would allow you to sort the first five elements in a collection. The usage of these containers might seem a little bit far-fetched but they are absolutely necessary for the Searching sequel to this story.  (a little in advance : they let you search from back to front and let you search subranges)

In analogy with the comparers, I also wrote a cTimedContainer which decorates another Container.  It counts and times the number of accesses to each function of the underlying container .  It can be very instructive to see the number of swaps in your container.

For pure proof-of-concept purposes, I also implemented a cStringContainer, which will consider a string as a container of characters. You can therefore sort the characters in a string.

SortAlgorithms

Now you might think that I stopped right there.  Well, I didn’t. The reason is that the code required to sort a simple array or collection is not as simple as I would like it. See the code below and move your eyebrows :

QuickSortContainer CreateCollectionContainer(myCollection), CreateIntegerComparer
QuickSortContainer CreateArrayContainer(myArray),  CreateIntegerComparer

I therefore introduced some Façade methods (remember that I had one allready : QuickSortCollection).  These shorten the code above to :

QuickSortCollection myCollection, CreateIntegerComparer
QuickSortArray myArray, CreateIntegerComparer

Now even there I didn’t stop.  Up until now, I had been doing quicksorts.  Quicksort has some nice performance properties.  But it has some bad properties as well. The worst performance is O(n2), which is something that rarely occurs but it does. Furthermore quicksort is known to be unstable (I.e. you never know where to items that are the ‘same’ for the comparer might end up relatively to each other). For some cases, a quicksort is just plain overkill.  (Containers that are almost sorted come to mind)

I therefore took a dangerous step : I abstracted the sort algorithm as well. It was not difficult to come up with a defining interface :

Interface ISortAlgorithm
Public Sub SortContainer( ByVal objContainer As iSortableContainer, ByVal objComparer As iComparer)
' Preconditions : objContainer is not nothing
'                objComparer is not nothing
' PostCondition : objContainer is sorted according to objComparer
End Sub

People have been killed for lesser things. This is the point where most people will accuse you of overdesigning things.  They sure have a point there. I  did it because it allows you to define decorators for existing sorting algorithms.

After defining the interface for sorth algorithms, I implemented some obvious instances : cQuickSorthAlgorithm, cShellSortAlgorithm, cInsertionSortAlgorithm, cHeapSortAlgorithm, cSelectionSortAlgorithm and cBubbleSorthAlgorithm. I also did the obligatory decorator cTimedAlgorithm.

With the necessary Façade methods in place, you now pick your comparer, your container and your sorting algorithm. These 3 more-or-less orthogonal dimensions guarantee that any addition you do to any of the dimensions results in a plethora of possible working combinations.  I think that’s a good sign.

Final remarks

To make for a real proof of concept, I built a little testing application. I created a form with various vertical Line objects on it and implemented the iSortableContainer for it. That way you can visually see the sorting algorithms work on the form; much like the threaded sorting demo in Delphi (but without the threading).

Now for the final remarks :

  • Is this a final component ?  Yes it is.  It allows for extensions by anybody who uses the component. I has a façade for those who don’t want to dive into the gory details.
  • The implementation is not especially speedy .  Due to the many abstractions, you risk loosing an order of magnitude over a handcrafted solution. Most of the time, abstraction’s penalty is performance.
    However, there is little inherently non-performant in the concept.  One could imagine implementations that do some address juggling to speed things up. (read all about my little
    benchmarks)
  • This implementation is highly portable. I have ported it to Delphi, where it also works fine.  There should be not problem porting it to C++, VB.NET, C# or Java.
  • This component was meant to do in-memory sorting.  It is not meant to do distributed of disk sorting. Furthermore, there are some sorting algorithms (like merge sort) that don’t fit in very well.  These sorting algorithms generally don’t sort in-place and need extra temporary space.

I you liked this story, then you can read on with the searching story on the same site. I you would like to have a look at the code, you can dowload it here.

related :

[BenchMarks]
[
Sorting Remarks]
[
James Barbetti Implementations]

Subitems :

[BenchMarks]
[
Sorting Remarks]
[
James Barbetti Implementations]

 

Site updated : Monday, February 17, 2003