By Willy Van den Driessche It started with “Sorting collections” ...This is a story about the birth of my sorting routines. Since I have a verbose writing style, It will be a long story. One day I noticed that in our project we hard al lot of sorting to do. We had many custom collection classes and they all needed a way to sort them. Up till then we used the storage classes to perform the searching (ORDER BY in SQL clause). It was my idea that once you have a collection in memory, it is a terrible waste to throw it away only to retrieve it later in an another order. I therefore took off to write some classes for the following requirement : “Build something that is able to sort any collection”. Since sorting is such a common thing, I took a look at far cousin C, where I knew there is a predefined function to perform a QuickSort. The function prototype goes like this : void qsort(void *, size_t, size_t, int (*) For those of us able to read C : there are mainly 2 parts in this declaration : the first part defines the array to be sorted. At which address does it start, how large are its items, how many items are there ? The second part defines a function that is used to compare two elements in the array. If we compare this to our requirements, we see that we have the same two parts. In our case, the first argument will always be a collection. After all, the requirement is to sort a collection. The second argument is a function pointer. While there are ways to simulate passing functions around, VB actually offers a much better approach. All we need to do is define an interface for such functions. (technically, its an interface for an object, not a function, but since these objects are actually functors, we choose not to be too sharp in terminology). Let’s have a close look at the prototype of the C function : int (*) (const void *, const void *) ComparersYou should read this as “ a function that takes 2 arguments of any type and returns an integer”. With this in mind, the definition of our comparer interface is actually quite simple : Interface iComparer So, a comparer takes two variants, compares them and returns :
Having the comparer return a long is debateable, but it has the advantage of having the relative order of two items in one function call. Having variant arguments is even more debateable. I use variants because in VB6 a variant means ‘anything’. In .NET this will probably change to Object. After defining the iComparer interface, I was able to write the sorting routine (it was a QuickSort) Public Sub QuickSortCollection(ByVal objCollection As Collection, ByVal objComparer As iComparer) I was pleased with this implementation. However, there are some important remarks to be made :
Apart from these remarks, my problem was solved and I was a happy man (I was before too off course). Since I had the intention of writing a reusable component, I figured it made sense to define some standard Comparers. Standard ComparersI started by defining a comparer for every built-in type : cBooleanComparer, CCurrencyComparer, CStringComparer.... Now the only thing that is of interest to the user of the sorting component is the iComparer interface of these classes. This is why I decided to make all these classes private and provide factory methods for each of them. These factory methods go like this : ' CreateStringComparer : Now, somebody wanting to sort a collection of Integers, wouldn’t have to write any comparer at all. He could reuse the existing comparer and write code that goes like this : QuickSortCollection myIntegerCollection, CreateIntegerComparer The comparers for the built-in types are the most obvious candidates to be reused. However, there are other candidates. Often you want to be able to sort ascending as well as descending. Therefore I created an InverseComparer, which decorates an existing comparer. I created the same factory method for this type of comparer. Sorting an integer collection descending became : QuickSortCollection myIntegerCollection, CreateIntegerComparer Another comparer made it into the standard comparers. It was a little less obvious to discover. Suppose you have a collection of custom objects, say instances of cCustomer. You have written a cCompCustomerByName and a cCompCustomerByAddress class. Now you want to sort your collection by name and then by address. How do you do that ? Until now you had to write a new comparer for that. Therefore we create a Composite comparer cCascadingComparer which takes any number of other comparers and use their results in order. If two items are ‘the same’ when using the first comparer, the second comparer is used to compare them, and so on for any number of subComparers. I created a factory method for the cascading comparer so that I was able to write code like this : QuickSortCollection myCustomerCollection, One final standard comparer made it through. It is another decorating comparer but this time used for debugging only. cTimedComparer will ‘wrap’ another comparer and tell you how many times the comparsion was called and how much time it spend inside it. QuickSortCollection myCustomerCollection, CreateTimedComparer (new cCompCustomerByName) Overview of comparers:
At this time, I was done with my solution, since it fullfilled the requirements. Sorting arraysThen one day, one of my collegues asked if I had a routine to sort an array in VB. Clearly, I did not. What was wrong ? Nothing at all. My little routine perfectly fulfilled its requirements. Only, the requirements where to sort a collection, not an array. When I looked at my code, I immediately saw that it was very simple to change the QuickSortCollection in a QuickSortArray routine. However, I was very reluctant to do that. It would probably have been much more performant to do so. But I have always learned that the only reasonable numbers are zero, one and infinity. So it’s allright to have a sorting routine that can sort 1 type of container or all types of containers. It doesn’t make me feel good to have a routine that is capable of sorting two types of container. I was clearly missing something and therefore I started by reformulating my requirements as : “Build something that is able to sort anything that is like a collection”. SortableContainersWhat does that mean ? “Like a collection”. Well, that’s a very hard question. You can come up with several solutions for this statement. I’ll point out that they have a related but separated solution space. I’ll show you why I decided to keep my solution in a minute. I came up with the concept of a sortableContainer, which had 3 requirements :
In VB, this translates in the following interface : interface ISortableContainer Now, the first thing you might notice is that this interface definition is assymmetric. You are able to retrieve an item at a given index, but you are not able to set it at a given index. As I said before, this is a design choice and other design choices work just as well but allow for other solutions. You will see shortly why I chose the interface with the Swap method. OK, now that we have the definition of an ISortableContainer, we still have to create implementations for it. I started out with the two most wanted solutions : cSortableCollection and cSortableArray. The code for the collection is here below : ' Usage : this class is a typical 'adapter class' which The code for cSortableArray is very similar to the code for the collection. With these container implementations, I was able to take the code for QuickSortCollection and rework it to operate on iSortableContainers. I created the routine QuickSortContainer. The funny thing is I was able to keep my QuickSortCollection routine, but now as a façade for the ‘real sorting routine’. public sub QuickSortCollection (byval objCollection as collection, byval objComparer as IComparer) Standard SortableContainersNote that these two iSortableContainers are applications of the Adapter design pattern. Since we don’t have the source code for the Collection class, it is impossible to make it implement the ISortableContainer interface. (It is possible, if you use a technique call COM aggregation. This technique, however, requires some non-trivial under-the-hood techniques that I prefer not to use unless I’m forced to) For the arrays, an adapter is the only possible option for implementing this solution. As for the comparers, I didn’t stop with these two standard iSortableContainer implementations. The most useful other implementation is cSortablePermutation which is a decorator for another ISortableContainer. However, the decoree is never modified. Instead, the cSortablePermutation keeps an internal “array of indices” which it swaps instead of swapping the items in the original container. You might be wondering, why you need such a funny sortable container. The answer is simple. With such a container you never modify the original container and therefore it is the perfect addition for collections. Remember that collections loose their keys when you sort them ? Well, if you use the resulting “array of indices” (a cPermutation object actually) to access the original collection, you circumvent this problem alltogether. Furthermore, this solution allows you to have many ”indexes” on a collection ( or an array for that matter). Note that this solution was not possible with a symmetric definition of ISortableContainer. Thanks to the swap method we are able to work on the permutation instead of on the real container. If, instead there would have been a ‘Set Item’ method a permutation would be out of the question. Two other related standard implementations for iSortableContainer are also decorators: cSortableFromEnd will reverse the underlying container. This gives you another way to sort a container in reverse order. cSortableSubRange takes another container and only exposes a subrange of it. This would allow you to sort the first five elements in a collection. The usage of these containers might seem a little bit far-fetched but they are absolutely necessary for the Searching sequel to this story. (a little in advance : they let you search from back to front and let you search subranges) In analogy with the comparers, I also wrote a cTimedContainer which decorates another Container. It counts and times the number of accesses to each function of the underlying container . It can be very instructive to see the number of swaps in your container. For pure proof-of-concept purposes, I also implemented a cStringContainer, which will consider a string as a container of characters. You can therefore sort the characters in a string.
SortAlgorithmsNow you might think that I stopped right there. Well, I didn’t. The reason is that the code required to sort a simple array or collection is not as simple as I would like it. See the code below and move your eyebrows : QuickSortContainer CreateCollectionContainer(myCollection), CreateIntegerComparer I therefore introduced some Façade methods (remember that I had one allready : QuickSortCollection). These shorten the code above to : QuickSortCollection myCollection, CreateIntegerComparer Now even there I didn’t stop. Up until now, I had been doing quicksorts. Quicksort has some nice performance properties. But it has some bad properties as well. The worst performance is O(n2), which is something that rarely occurs but it does. Furthermore quicksort is known to be unstable (I.e. you never know where to items that are the ‘same’ for the comparer might end up relatively to each other). For some cases, a quicksort is just plain overkill. (Containers that are almost sorted come to mind) I therefore took a dangerous step : I abstracted the sort algorithm as well. It was not difficult to come up with a defining interface : Interface ISortAlgorithm
People have been killed for lesser things. This is the point where most people will accuse you of overdesigning things. They sure have a point there. I did it because it allows you to define decorators for existing sorting algorithms. After defining the interface for sorth algorithms, I implemented some obvious instances : cQuickSorthAlgorithm, cShellSortAlgorithm, cInsertionSortAlgorithm, cHeapSortAlgorithm, cSelectionSortAlgorithm and cBubbleSorthAlgorithm. I also did the obligatory decorator cTimedAlgorithm.
With the necessary Façade methods in place, you now pick your comparer, your container and your sorting algorithm. These 3 more-or-less orthogonal dimensions guarantee that any addition you do to any of the dimensions results in a plethora of possible working combinations. I think that’s a good sign. Final remarksTo make for a real proof of concept, I built a little testing application. I created a form with various vertical Line objects on it and implemented the iSortableContainer for it. That way you can visually see the sorting algorithms work on the form; much like the threaded sorting demo in Delphi (but without the threading). Now for the final remarks :
I you liked this story, then you can read on with the searching story on the same site. I you would like to have a look at the code, you can dowload it here. related :
| |||||||||||||||||||||||||||||||||||||
Subitems : | |||||||||||||||||||||||||||||||||||||
[BenchMarks] | |||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||
| Site updated : Monday, February 17, 2003
| |||||||||||||||||||||||||||||||||||||
|
| |||||||||||||||||||||||||||||||||||||