File Searching

The Story

Last week, one of my fellow programmers rightly suggested that we could make our application more robust with the following feature. Whenever the application needs a file (or directory) and doesn’t find it, it could pop up a screen allowing the user to search for the file manually. Better still, it could go and search for it automatically. In the latter case, we would know what we were looking for and we could scan the disks the user wanted us to scan. If after all this we were still not able to find the file, we would do as we do now: dump a log message and exit painfully.

In the weeks before, we had been correcting some reported bugs, and for me it was time to start something new. Our application is quite big and it needs a lot of files and directories: various databases, configuration files, import and export paths, help file paths, and so on. It was clear to me from the start that there was a reusable component in the making.

First of all, there are some APIs available for searching for files. I know these APIs, but I wanted more. I wanted a component that searches for files in a uniform way. Whether you search for files with some extension, some size, some structure or some contents, the component should be called the same way for all of them.

IFileCriterium

Whatever our component would look like, it would need to separate the good from the bad (and the ugly). This leads us to a very simple interface :

Public Function FileOK(ByVal aFile As File) As Boolean
   ' see if this file is OK
End Function

For those of you who wonder, the ‘file’ datatype comes from the Microsoft Scripting Runtime (aka scrrun.dll) (a reference you can normally add to your project).  The interface looks very simple and of course, that was the intention.  We’ll see later that it is not yet complete, but for the moment it will do.

Our first job now is to create some “stock” implementations of this interface. They will, to a great extent, make the component usable out of the box. Without them, a user of our component would be forced to implement them herself.

The most trivial of them is the cFileCritNameLike class. It takes a name pattern and returns True for all files whose name matches the pattern. It looks like this:

Private mNamePattern As String

Implements iFileCriterium

Friend Sub Initialize(ByVal strNamePattern As String)
   mNamePattern = strNamePattern
   Debug.Assert (mNamePattern <> "")
   If mNamePattern = "" Then mNamePattern = "*"
End Sub

Private Function iFileCriterium_FileOK(ByVal aFile As Scripting.IFile) As Boolean
   Debug.Assert Not (aFile Is Nothing)
   iFileCriterium_FileOK = (aFile.Name Like mNamePattern)
End Function

None of this code is spectacular. I do have some comments to add:

  • The code is filled with Debug.Assert statements. These make it easier to debug the component while you develop it.
  • The Initialize method has code to rectify anomalies. When the search pattern is empty, it is automatically set to "*", which matches every name.
  • The Initialize method is a Friend method. I use a Friend “Initialize” method for private classes that implement a public interface. This “Initialize” method obviates the need for constructors with parameters. If I need more than one, I just call the second one Initialize2, or use InitializeForXXX and InitializeForYYY to clarify the names.
  • This class holds no references to other objects and thus doesn’t need a Class_Terminate method.

Since performance is of high (but not highest) importance in this component, I have written a separate cFileCritNameExactly class. For symmetry, I have also written cFileCritExtensionLike and cFileCritExtensionExactly. The only benefit of separating these classes is that you pay the pattern-matching price only when you really have to.

In the good tradition of my other components, none of these classes needs to be public. We only need some factory methods to create them with the correct parameters; after that, we are interested only in the interface they implement.
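
Such a factory method could look roughly like this. It is only a minimal sketch: the function name NewFileCritNameLike and its placement in a public class of the component are my assumptions for illustration, not necessarily the names used in the component.

```vb
' Hypothetical factory method, placed in a public class of the component.
' Callers only ever see the iFileCriterium interface, never the private class.
Public Function NewFileCritNameLike(ByVal strNamePattern As String) As iFileCriterium
   Dim objCrit As cFileCritNameLike

   Set objCrit = New cFileCritNameLike
   objCrit.Initialize strNamePattern     ' Friend method: callable inside the project only
   Set NewFileCritNameLike = objCrit     ' handed out through the interface
End Function
```

Because Initialize is a Friend method, code outside the component cannot call it, so the factory is the only way for a client to obtain a correctly initialized criterium.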

Another obvious file criterium is cFileCritAttributes. It takes a set of attributes to match and only lets the files that match them pass the “gate”.
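
To make the idea concrete, here is a sketch of how such an attribute criterium might be written. It assumes that “match” means that every requested attribute bit must be set on the file; the real class may well use a different rule.

```vb
' Sketch of an attribute criterium (an assumption, not the original source).
Private mAttributes As Scripting.FileAttribute

Implements iFileCriterium

Friend Sub Initialize(ByVal lngAttributes As Scripting.FileAttribute)
   mAttributes = lngAttributes
End Sub

Private Function iFileCriterium_FileOK(ByVal aFile As Scripting.IFile) As Boolean
   Debug.Assert Not (aFile Is Nothing)
   ' True only when all requested attribute bits are set on the file
   iFileCriterium_FileOK = ((aFile.Attributes And mAttributes) = mAttributes)
End Function
```

Initialized with ReadOnly Or Hidden, for example, this version would accept only files that are both read-only and hidden.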

I could have written more boilerplate implementations, but for the moment I did not do this. Suffice it to say that the Scripting.File is a very rich object and you can easily write your own implementations and use them with the component.
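
As an illustration of how little work a custom implementation takes, here is a sketch of a criterium that accepts files of at least a given size. The class name cFileCritSizeAtLeast is hypothetical; it is not part of the component.

```vb
' Hypothetical class cFileCritSizeAtLeast (not part of the original component).
Private mMinSize As Double

Implements iFileCriterium

Friend Sub Initialize(ByVal dblMinSize As Double)
   Debug.Assert dblMinSize >= 0
   mMinSize = dblMinSize
End Sub

Private Function iFileCriterium_FileOK(ByVal aFile As Scripting.IFile) As Boolean
   Debug.Assert Not (aFile Is Nothing)
   ' File.Size is a Variant holding the size in bytes
   iFileCriterium_FileOK = (aFile.Size >= mMinSize)
End Function
```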

Recursing the directories

Then it was time to start writing the real code that did the search. My first attempt was :

Sub FindFiles(ByVal strStartPath As String, ByVal objFilter As iFileFilter)

It didn’t take me long to see that this was a little too unambitious. You could only search for files under a given directory. If you wanted to search for files on multiple drives or in multiple folders, you had a problem with this limited method signature. Furthermore, why should you limit yourself to searching for files? Why not folders? Or drives?

Hm. Finding an answer is easy if you ask the right questions. Searching for folders and drives implied an IFolderCriterium and an iDriveCriterium, and that is exactly what I wrote. The definitions of these interfaces are nearly identical to that of IFileCriterium. The following table summarizes the classes.

Class name                   Purpose
cFoldCritNameLike            Search for folders that match the name with a pattern
cFolderCritNameExact         Search for folders that match the name exactly
cFolderCritExtensionLike     Search for folders that match the extension with a pattern
cFolderCritExtensionExact    Search for folders that match the extension exactly
cFolderCritAll               Matches all folders (NullObject pattern)
cDriveCritLetter             Search for drives with a given letter
cDriveCritType               Search for drives of a given type (CD-ROM, fixed, removable, network, ...)
cDriveCritReady              Search for drives that are ready
cDriveCritAll                Matches all drives (NullObject pattern)

Equipped with file, folder and drive criteria, it became trivial to come up with the following events and methods:

Event FileFound(ByVal whichFile As File)
Event FolderFound(ByVal whichFolder As Folder)
Event DriveFound(ByVal whichDrive As Drive)

Sub FindFilesIn(ByVal strStartPath As String, ByVal objFilter As iFileCriterium)
Sub FindDrives(ByVal aDriveCrit As iDriveCriterium)
Sub FindFolders(ByVal aDriveCrit As iDriveCriterium, ByVal aFolderCrit As IFolderCriterium)
Sub FindFiles(ByVal aDriveCrit As iDriveCriterium, _
              ByVal aFolderCrit As IFolderCriterium, _
              ByVal aFileCrit As iFileCriterium)
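
The recursion itself can be sketched as follows for the simplest of these methods, FindFilesIn. This is not the component's actual code: the helper ScanFolder is a hypothetical name, the code is assumed to live in the class that declares the FileFound event, and error handling for folders that cannot be read is left out.

```vb
' Sketch only: requires a reference to the Microsoft Scripting Runtime.
Public Sub FindFilesIn(ByVal strStartPath As String, ByVal objFilter As iFileCriterium)
   Dim objFSO As Scripting.FileSystemObject

   Debug.Assert Not (objFilter Is Nothing)
   Set objFSO = New Scripting.FileSystemObject
   If objFSO.FolderExists(strStartPath) Then
      ScanFolder objFSO.GetFolder(strStartPath), objFilter
   End If
End Sub

' Hypothetical helper: tests the files of one folder, then recurses
' into each of its subfolders.
Private Sub ScanFolder(ByVal aFolder As Scripting.Folder, ByVal objFilter As iFileCriterium)
   Dim objFile As Scripting.File
   Dim objSubFolder As Scripting.Folder

   For Each objFile In aFolder.Files
      If objFilter.FileOK(objFile) Then RaiseEvent FileFound(objFile)
   Next objFile

   For Each objSubFolder In aFolder.SubFolders
      ScanFolder objSubFolder, objFilter
   Next objSubFolder
End Sub
```

The search never builds a result list; every match is reported through the event, so the caller decides what to do with each file as it is found.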

Higher-order criteria

The benefit of an article is that you usually write it after writing a component.  This means that it looks like you knew everything in advance and it was just the typing of the code that took time. As most of you know, it’s never like that.

I started writing higher-order criteria as soon as I had finished the base file criteria. These allowed me to combine the file criteria in the usual ways: and, or, not.
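
An “and” criterium, for instance, might look like the sketch below. The class name cFileCritAnd and the member names are assumptions on my part.

```vb
' Sketch of an "and" criterium; the class name cFileCritAnd is an assumption.
Private mFirst As iFileCriterium
Private mSecond As iFileCriterium

Implements iFileCriterium

Friend Sub Initialize(ByVal objFirst As iFileCriterium, ByVal objSecond As iFileCriterium)
   Set mFirst = objFirst
   Set mSecond = objSecond
   Debug.Assert Not (mFirst Is Nothing)
   Debug.Assert Not (mSecond Is Nothing)
End Sub

Private Sub Class_Terminate()
   Set mFirst = Nothing
   Set mSecond = Nothing
End Sub

Private Function iFileCriterium_FileOK(ByVal aFile As Scripting.IFile) As Boolean
   ' VB6's And operator does not short-circuit, so the tests are nested:
   ' the second criterium is skipped when the first one rejects the file.
   If mFirst.FileOK(aFile) Then
      iFileCriterium_FileOK = mSecond.FileOK(aFile)
   End If
End Function
```

Putting the cheapest criterium first (an exact name test before a content test, say) keeps the expensive one from running on files that were never candidates.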

I had a very bad feeling when I wrote the same things for Folders and Drives.  Why ?  It’s easier with an example :

' The "not" criterium for files
Private mDecoree As iFileCriterium
Implements iFileCriterium

Friend Sub Initialize(ByVal objCriterium As iFileCriterium)
   Set mDecoree = objCriterium
   Debug.Assert Not (mDecoree Is Nothing)
End Sub

Private Sub Class_Terminate()
   Set mDecoree = Nothing
End Sub

Private Function iFileCriterium_FileOK(ByVal aFile As Scripting.IFile) As Boolean
   iFileCriterium_FileOK = Not mDecoree.FileOK(aFile)
End Function


' The "not" criterium for folders
Private mDecoree As IFolderCriterium
Implements IFolderCriterium

Friend Sub Initialize(ByVal objCriterium As IFolderCriterium)
   Set mDecoree = objCriterium
   Debug.Assert Not (mDecoree Is Nothing)
End Sub

Private Sub Class_Terminate()
   Set mDecoree = Nothing
End Sub

Private Function iFolderCriterium_FolderOK(ByVal aFolder As Scripting.IFolder) As Boolean
   iFolderCriterium_FolderOK = Not mDecoree.FolderOK(aFolder)
End Function

The similarity is so striking (you can produce one class from the other with a simple search and replace) that it hurt. How could I create two classes that were so similar without attempting reuse? In fact, I wrote a version built around an iObjectCriterium. It worked fine, but I feared that the performance penalty of the generic classes would be too high. I had received some critiques of my sorting component, and most of them were of the kind: “nice, but how does it perform?”. It’s so infuriating. People are willing to use behemoth components for grids, or XML for data storage, but not for anything else. Anyway, I took the side of performance this time and created a bunch of very similar classes.
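
For the curious, the generic variant could be sketched as below. The member name ObjectOK and the class name cObjectCritNot are hypothetical, and the two parts belong in two separate class modules.

```vb
' --- Interface class iObjectCriterium (sketch; the member name ObjectOK is assumed) ---
Public Function ObjectOK(ByVal anObject As Object) As Boolean
End Function

' --- Generic "not" class (hypothetical name cObjectCritNot) ---
Private mDecoree As iObjectCriterium
Implements iObjectCriterium

Friend Sub Initialize(ByVal objCriterium As iObjectCriterium)
   Set mDecoree = objCriterium
   Debug.Assert Not (mDecoree Is Nothing)
End Sub

Private Sub Class_Terminate()
   Set mDecoree = Nothing
End Sub

Private Function iObjectCriterium_ObjectOK(ByVal anObject As Object) As Boolean
   ' One class now serves files, folders and drives alike
   iObjectCriterium_ObjectOK = Not mDecoree.ObjectOK(anObject)
End Function
```

One plausible source of the penalty is that a leaf criterium receiving a plain Object must either call its members late-bound (anObject.Name) or first assign it to a typed variable, and both cost more than the early-bound call on Scripting.IFile.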

How to use it

So much for the component. Now how do you use it? Currently the class is written such that it should be used in a form. In your form, you could write the following code:

 

Site updated : Monday, February 17, 2003