|
| P | E | R | E |
| D | K | C | P |
| V | O | G | R |
| T | I | O | A |
The most simple type of algorithm does not use information about the search space. We therefore call them “blind” methods. They are often very useful, especially when the problem is not too complicated.
Figure 1.2 Search mask for the chess horse puzzle. X is the start cell, O’s can be reached from X by jumping like the chess horse.
| O | O | |||
| O | O | |||
| X | ||||
| O | O | |||
| O | O |
Consider, for example, a well-known puzzle, as described in Figure 1.1. Beginning with the letter in bold, you have to find a word, by jumping to the following letter like the horse in chess. Since at each stage at most 7 jumps are possible (8 minus the previous one), and the maximal number of nodes is 16 (since there are only 16 letters and each letter is used at most once) the tree does grow too much. Such a problem is easily solved with a blind search algorithm, as will be demonstrated in the following paragraphs.
Searching for a solution
One way to look for a solution is the following. You start with the letter P (as instructed) and you jump to the next possible letter. To make things easy, we will agree that we first look upwards, then right, down and left. The search mask (Figure 1.2) provides a scheme for which cell are reachable from a certain start cell.
For the first cell, we can exclude all but two cells, because they fall outside the table. Only two possibilities remain: We can reach letters C and O. When we order this result in such a way that we connect to the letter P all letters that can be reached from it, to those letters all letters we can reach from them, and so on, we get a search tree like in Figure 1.3. The process of adding child nodes to the existing partial solutions is called “expanding”. In the following paragraphs, I shall rely on the discussion on search methods by Russel and Norvig (1995) and Winston (1993).
Figure 1.3 Search tree for the chess horse puzzle (redundant branches not shown)
| P | ||||||
| C | O | |||||
| I | A | R | A | |||
| … | … | … | … | … | … | … |
Depth-first Search
Suppose now we build the search tree of Figure 1.3 in such a way that we first explore one path from the top downwards, until we get stuck, and then return to the last not fully explored choice point. Until, of course, we reach the solution: a word that describes an edible substance. We then explore the search tree in a depth-first manner. Indeed, we dive into the tree, only returning when we explored the whole path downwards. A drawback of this search strategy is that it gets stuck in branches that never reach a dead end nor lead to a solution.
Breadth-first Search
Another way to navigate through this tree is to enumerate at each level, for all nodes reached so far, all their possible child nodes, until one of the child nodes is the last letter of the solution. We then obtain a breadth-first search strategy. This strategy is not prone to getting stuck like depth-first, but the memory and time requirements for a breadth-first search are enormous for real-life problems.
These two basic algorithms have a lot of variants, often meant to avoid disadvantages of breadth- or depth-first search. A simple one is nondeterministic search, in which you expand a randomly chosen node. This type of search will prove useful when one cannot be sure the search space does not contain infinite branches or nodes with high branching factor. Several other techniques, however, allow to cope with this uncertainty in a more efficient way.
Uniform-cost Search
For example, breadth-first may be suboptimal if the path cost is not uniform throughout the search space. Uniform-cost search is an alternative. This strategy takes into account individual costs for each step. It is therefore guaranteed to find the path with lowest cost. For our toy problem, however, it is not very useful since all steps have an equal cost.
Depth-limited Search and Iterative Deepening
Depth-first search fails if it encounters infinite branches in the search tree. It will dive into the branch and never return. A modification, named depth-limited search, brings solution. Depth-limited search only expands partial paths on the stack if their length is smaller than a certain threshold.
There are two main possibilities for determining the threshold:
Choose a fixed number (eventually by a useful criterion, for example depending on the size of the state space)
Try out the algorithm with limit 1, then 2, then 3 and so on.
This last way results in iterative deepening search. For most problems, and if no informed method is desired or possible, iterative deepening is the best way, because it combines the advantages of depth-first search (modest memory requirements) and those of breadth-first search (optimality and completeness).
Informed Methods
In the search tree as described in the first paragraph, an English speaker immediately remarks that the first branch sprouting out of P can be cut off, for no English word begins with the letter combination “PC”.
Strategies that make use of knowledge about the problem space are called “informed methods”. When the problem is very complex, the search tree will be large and informed methods can help in choosing branches that presumably lead to a solution. Indeed, the simple knowledge of the fact that words beginning with PC are very unlikely saves a great deal of work: half of the search tree doesn’t have to be considered, and still the solution is found.
As heuristic for this example may serve a “wordiness” index: the number of words in my English dictionary starting with the given letter combination. The search tree combinations PC, PO, PCI, PCA, POR, and POA get 0, 121, 0, 0, 22 and 2, respectively. Needless to say, a letter string cannot get a higher value than any of its parents. For some informed strategies, the toy example of the chess horse does not really apply: therefore, I will abandon the example at this point.
Many informed strategies exist due to the virtually unlimited amount of criteria one might use for “information about the search space”. In this small introduction, I shall only describe three important classes of informed search:
Best-first search: greedy search and A* searchMemory-bounded search: IDA* search and SMA* searchIterative-Improvement search: Hill-climbing search and simulated annealing Best-first Search
If a search strategy takes into account the estimated cost to reach the goal in order to sort the alternative partial paths to the goal, we obtain a best-first search method. In fact, best-first is a bad name; “apparently-best - first” would be a better one, since real best-first search would proceed directly to the goal.
The simplest type of best-first search is called “greedy search”. It is greedy in that it always expands the node whose estimated distance from the goal is smallest. It uses an evaluation function h(n), which is the estimated cost of the shortest path from the current node to the goal. The only requirement for h(n) is that its value be 0 for n is the goal. Greedy search seems an attractive method because it is simple, but it is neither optimal neither complete.
A variant of this method is beam search. Beam search expands the w best partial paths (with w the beam width) and forgets the rest. Beam search is simple too, but is like greedy search neither optimal neither complete.
A second type of best-first search, A*, tries to minimise the distance of the total path. A* combines the informed method of greedy search and the optimality and completeness of uniform-cost search. It uses as heuristic function the sum of the distance between start and current node and the estimated distance from the current node to the goal. It can be proven that, provided h(n) is admissible (that is, it never overestimates the actual distance to travel) A* is complete and optimal. A drawback, however, is the amount of memory A* needs. Therefore, a second class of heuristic methods has been developed that avoids exponential growth of memory requirements.
Memory-bounded Search
There are two main problems for search strategies in general: time requirements and memory requirements. The latter usually shows up first, so algorithms have been developed to cope with this problem.
A first way to combine informed search with memory-reducing techniques is IDA*, or iterative deepening A*. Basically, the algorithm does what iterative deepening does. There is one difference: while Iterative Deepening draws step contours, IDA* draws cost contours. In each iteration it searches in a depth-first way all paths with less than a given cost. The algorithm works pretty good if there are only a small number of cost values, hence a small number of iterations. But in complex domains, where there are a lot of different values, IDA* performs badly. A possible solution is to determine the contour not by exact value, but by giving a range of costs. But like A* search, IDA* repeats its history.
A second way one could perform memory bounded search is SMA*, or Simplified-memory bounded search. SMA* does not repeat its history, for it uses a part of the available memory to retain it. In fact, SMA* uses all available memory. Its stack will always be filled (at least after a stage of filling it up). When SMA* wants to expand a node, and no more memory is left, it will drop bad partial solutions from the stack, only retaining some quality information about them. In this way, it will only retrieve those bad solutions after having explored all the better ones.
The implementation of SMA* is rather technical and not necessary for our purposes (See Russel & Norvig, 1995). It is the most difficult algorithm described here, and it performs well in many applied domains. But when the problem is really hard, it may lose time by switching between partial paths. The simple A*, provided it has enough memory, may perform better then.
Iterative-improvement Search
If the only relevant thing is reaching the solution, no matter how, Iterative-improvement search algorithms may be a good idea. These algorithms basically start with a complete solution (possibly a very bad one) and try to improve its quality by taking certain steps. At each new step, the quality of the solution is assessed. When it fails, the algorithm will continue searching. When it succeeds, the algorithm announces success.
One way of doing is the well-known hill-climbing, with its well known problems of local maxima, plateaux, and ridges. The general idea of hill climbing is to always go in the direction of the highest improvement (steepest slope). Random-restart hill-climbing is sometimes used to avoid the local maxima problem. It comes in two flavours:
Do a fixed number of iterations, with in each iteration a hill-climb. Take the best solution at the end.
Do the iterative-hill climbing until no more improvements of the best solution are made for a fixed number of iterations.
The second type of algorithm, simulated annealing, has been constructed explicitly to avoid the local maxima problem, and partially the plateaux problem, of hill-climbing. The idea is to allow steps that lead to worse solutions, with a probability <1 that decreases in time. The algorithm derives its name from the process of annealing, the process of cooling down slowly a fluid to obtain an ordered solution. This can be proven to happen, at least if cooling is slow enough.
Quality of Search Strategies
Search quality depends on the strategy used
The discussion above makes clear that the choice of an appropriate search strategy is crucial for an effective search. Following five criteria are used to assess the quality of a search strategy:
Completeness: Does the strategy always find existing solutions?
Time complexity: Does the strategy use the least amount of time?
Memory complexity: Does the strategy use the least amount of memory?
Optimality: Does the strategy find the best possible solution?
Information use: Does the strategy use the smallest amount of information possible to find the solution?
Remark that there is usually a trade-off between 1-4 and 5. For example, “blind” methods, provided they find a solution, will require only little information, but not as good as informed methods on criteria 2 and 3. Algorithms that do not require great amounts of information are appealing for two reasons: they are simpler to implement, and they require less calculations. On the other hand, information may be used to keep the effective branching factor limited, such that an informed algorithm, although slower within each loop, may find the solution quicker than does a less informed algorithm.
Following table gives an overview of the search strategies discussed above, together with evaluation on the five criteria completeness, time and memory complexity, optimality and IU (Information Use).
Table 1.1 Evaluation of search strategies on five criteria. See text for discussion. Information use (IU) indicates how much information the strategy requires: 0 for uninformed strategies, 1 for strategies that only use an estimation of remaining distance or partial path cost, 2 for strategies that use both and 3 for SMA*, which uses in addition quality information about “forgotten” partial paths.
Search quality depends on the network topology The search success does not only depend on which strategy you use, but also on the topology of the network. Some graphs are easily crossed by breadth-first type algorithms, while others profit from depth-first exploration. In this paragraph, I describe some topological factors that make certain approaches more fruitful than others. Readers not familiar with graph theory are encouraged to read chapter II (p. 15).
Infinite branches may occur for two reasons: cycles in the graph, or infinite branches with all nodes different. The first problem is usually tackled by refusing successors that are also ancestors of a node, or by avoiding generating a state ever generated before. This strategy results in very high memory requirements, but it might pay when the graph has many cycles. This can be determined by calculating the cycle rank of the graph: the difference between the size of the graph and its spanning tree.
All depth-first strategies suffer severely from the problem of infinite branches. When the cycle rank g (G) of a graph is high, breadth-first search and its derivatives might be a better choice. Of course, determining the cycle rank of a graph implies that it has been traversed already. The only way out is to assume that the graph locally resembles the graph as a whole, such that an estimation of the cycle rank can be provided while the search algorithm is still running.
All search algorithms have difficulties with graphs that have a high branching factor (the average number of nodes that can be reached from a certain node). Using information in fact comes down to pruning the search tree in an optimal way, that is, to search first those branches that have best chances to lead to a solution.
In any case, machine search algorithms seem to cover the whole domain of searching in graphs such that study of these algorithms provides essential knowledge when dealing with human search behaviour in organised structures. The next chapter discusses mathematical graph theory, for this too might be necessary for the current purpose: Identifying the nature of human search behaviour.
Chapter II: Network Topologies
This second introductory chapter presents an overview of mathematical graph theory, which I find is a useful way to describe network structures. Readers already familiar with graph theory will not encounter many new ideas, so they can skip this chapter and proceed with chapter III.
Graph Theory
Graph theory is a mathematical tool that helps us representing various problems in science and daily life. Chemistry, linguistics, artificial intelligence, sociology, geography and electrical engineering all benefit from the development of graph theory. This chapter will draw upon a book by Wilson (1996).
Several well-known puzzles, as the Königsberg bridges problem, the four colour problem and the travelling salesman problem, can be viewed as typical graph problems. While the first one is already solved (in 1736 by Euler), the second and the third one still wait for a solution. It may even be possible that somebody comes up with a proof that these problems cannot be solved in general (Poundstone, 1989).
It is the last problem that is most relevant for our purposes. Finding the optimal path in a graph (i.e. the shortest distance covered) such that all cities are visited precisely once needs a search tree to be built. Exhaustive search becomes impossible for a sufficient large number of cities and roads. Therefore, blind methods will fail.
What we need are search methods that use information, thus pruning the search tree. Such methods are already described in the first chapter (p. 6). In this chapter, we will continue with a more rigid perspective on topology. This mathematical basis can help us state very clearly problems and their solutions when considering human search in graphs.
Definitions and Properties of Graphs
A graph is defined as a structure with two components: nodes and edges, or connections between these nodes. From a mathematical point of view, a graph G consists of a non-empty finite set of vertices (nodes) V(G) and a finite set E(G) of unordered pairs of elements of V(G) called edges. An edge is denoted as vu, which means that it connects vertices u and v.
Usually, we restrict to simple graphs, which means that no loops (edges of the form vv) neither multiple edges (two edges of the form vw) are allowed.
Furthermore, we can distinguish between directed and undirected graphs. Most network representations are directed graphs: an edge from u to v does not imply an edge from v to u. Directed graphs are also called digraphs. Arrows are used in figures to denote the edge’s direction.
Two graphs G1 and G2 are isomorphic iff there is a one-to-one correspondence between their vertices such that the number of edges joining any two vertices of G1 is equal to the number of edges joining the corresponding vertices in G2. Stated differently, G2 is a renaming of G1. Often, we do not care about the labels of the vertices and thus drop them: an unlabelled graph results.
The union of two graphs G1 and G2 is G if and only if (iff) V(G)=V(G1) È V(G2) and E(G)=E(G1) È E(G2) with V(G1) and V(G2) disjoint sets.
An important property of graphs is their connectedness. A graph is said to be connected iff it is not possible to express it as the union of two graphs, and it is disconnected otherwise. A disconnected graph has k components, the union of which constitutes the graph.
Several important theorems exist about connectedness. The most known is that any simple graph with n vertices and more than (n-1)(n-2)/2 edges is connected. Many other things can be said about connectedness, but I will not discuss these themes here because they are mostly irrelevant for our purpose. The reader can find excellent discussions in Wilson (1996), Andrasfai (1977), Harary (1969), Busacker and Saaty (1965) and Berge (1973).
We can ask whether it is possible to travel from vertex v to a vertex w in graph G. This is possible iff there is some vertex z, adjacent to v, from which it is possible to travel to w. This gives a recursive definition. And we still need to define the property of adjacency. We say that two vertices v and w are adjacent iff there is an edge of the form vw in E(G), while v and w are said to be incident on vw.
The degree of a vertex is equal to the number of edges it is incident with. In directed graphs, this number can be split up into indegree and outdegree (number of incoming c.q. outgoing edges). The so-called handshaking lemma states that in any graph the sum of the degrees of all vertices is even. The degree sequence of G is the set of degrees of the vertices of G written in increasing order.
A useful representation of adjacency and incidence of vertices consists in writing the matrices A and M. The adjacency matrix A is an n ´ n matrix whose ij-entry denotes the number of edges joining i and j, while the incidence matrix M is an n ´ m matrix whose ij-entry is 1 iff vertex i is incident to edge j and 0 otherwise. A third important matrix connected to graphs is the circuit matrix C, an n ´ m matrix whose ij-entry is 1 iff edge j lies on circuit (closed trail) i. A nice theorem (Busacker & Saaty, 1965, p. 105) combines A and C: A.C t =0.
In Eulerian graphs, a closed trail exists that contains every edge precisely once. This type of graph is named after the great mathematician Leonhard Euler. He proved in 1736 that a connected graph is Eulerian iff the degree of each vertex is even. There is even an algorithm (called “Fleury’s algorithm”) for constructing the Eulerian trail. You start at any vertex v of an Eulerian graph, and you traverse it in a random way, restricted only by following rules:
Erase any edge when it has been traversed, and each isolated vertex too.
Use only a bridge if there is no alternative. A bridge is defined as an edge without which the graph would be disconnected.
Hamiltonian graphs are important too for our purposes. Where Eulerian graphs have a trail containing each edge, a Hamiltonian graph has a cycle that contains every vertex precisely once. Unfortunately, a theorem comparable with the Euler Theorem for Eulerian graphs has not been found yet. Most existing theorems are of the form “If G has enough edges, then G is Hamiltonian”. Most famous is the Dirac Theorem (1952): if G is a simple graph with n>2 vertices, and if deg(v)>n/2 for each vertex, then G is Hamiltonian.
Developers of network structures can use the Hamiltonian property to design a structure in which one visits every page once, afterwards returning to the starting point. A special button could be reserved to follow the Hamiltonian trail, which is useful tool for novices. But I shall stick to Eulerian graphs, because an Eulerian walk also visits all nodes.
Among the most interesting properties of graphs is their “thickness”; namely, how many layers one needs to avoid crossings of edges. The relevance for network searching lies in the observation that planar graphs (graphs with thickness 1) are simpler, perhaps because it is easy to “map” such graphs: a planar graph is always euclidean, thus complying to the triangle inequality. The proof is based on Wagner’s Theorem (formulated in 1936), that every planar graph can be drawn with straight edges.
The centre of a graph (Wilson, p. 46) is defined as those vertices for which holds that the maximum of the distances between such a centre and the other vertices is as small as possible. An interesting property is that each tree (see p. 19) has either one centre or two adjacent centres. The notion of centre may be important for hypertext developers: if the starting page is also the centre of the graph, effort to reach each of the child nodes is minimised. Experimental data about this hypothesis can be found in the next chapter (p. 31). A related property is called the diameter of a graph, which is defined as the maximum distance from one node to another. In general, the more connections per node, the smaller the diameter.
Examples of Graphs and their specific Properties
Null graphs and complete graphs Both null graphs and complete graphs are uninteresting: null graphs have no edges, hence no search possibilities, while complete graphs do not impose structure on the information nodes.
A complete graph, however, may have applications in the domain of developing hypertext pages: when there are only few pages (at most five), the number of connections is smaller than ten, so it is not too difficult to keep an overview of the material presented. An advantage of such fully connected structures is that the distance between any two nodes is one.
Path graphs, cycle graphs and wheels
A path graph implies a serial topology. A good example are books: one can read from page one to the end, but no tools for jumping are available. The observation (Wright, 1993) that experienced readers usually avoid serial reading indicates that it might be an inferior strategy to read in a serial way.
A cycle is described formally as a graph in which all nodes have degree 2, such that there are n nodes and n edges. A path graph is then a cycle graph in which one edge is removed, such that n-1 edges remain.
A wheel is a cycle graph with one extra vertex, to which all other vertices are connected. If the number of vertices is n, the number of edges of a wheel graph is 2n-2.
Regular graphs Regular graphs are graphs in which every node has the same degree k. A thus formed graph is said to be of the k--th degree, or k-regular. Trivial examples are the null graph and the complete graph. More interesting examples include rings or cycles (degree 2) and the graph of the cube (4-regular graph with 8 nodes).
The main advantage of such graphs, when applied to the domain of hypertext developing, is that regular graphs are cognitively simple. A disadvantage is that it may be difficult to represent information in such a structured way.
Trees define a hierarchical topology with one node at the top called the root. Forests have multiple roots: therefore, they are unconnected. Each root with its descendants is said to be a component of the forest. A tree can thus be considered a forest with one component.
Trees have drawn special attention of several graph theorists because of their unique properties. For example: the number of edges in a forest is n-k-, with n the number of vertices and k the number of components. Or, maybe more interesting for a hypertext development context, the property that any two vertices are either unconnected (iff they belong to different components) either connected by precisely one path.
This observation will come back in the chapter on human network search, because it has been observed (Doomen et al., 1996) that people have difficulties finding information in such structures. The main reason is precisely this unique-path property: if one makes a wrong decision at a previous node, it becomes very difficult to return to the correct path.
A third important notion connected with trees is that of a spanning tree. A spanning tree of a connected graph is derived from the original graph by deleting edges in such a way that the resulting graph does not contain any cycles. The number of edges removed in this way is called the cycle rank of G or g (G). The cutset rank of G is the number of edges in a spanning tree, or x (G), which is equal to n-1 with n the number of vertices of G. Remark that g (G) = m - x (G), with m the number of edges of the original graph.
Randomly traceable graphs A randomly traceable graph from a certain vertex x is easy to search in: you can simply choose, starting from x, where to go next, provided the edge chosen is not crossed yet. You will then cross all edges precisely once. RTG’s, as we will abbreviate this special type of graphs, derive this property from the fact that at no stage of Fleury’s algorithm (see p. 16 and further) bridges occur, so that the graph might become disconnected. In what follows, we will assume that we deal with an RTG that is randomly traceable from a certain vertex x and we therefore omit “from vertex x”. Some RTG’s are randomly traceable from multiple vertices, but we do not need that special kind of graph here. See Andrasfai (1977, p. 78-82) for more details.
Which properties define a RTG? First, it is important to remark that RTG’s are always Eulerian graphs. Thus, each vertex of a RTG has an even degree. Second, the vertex x our RTG is randomly traceable from must be contained in every cycle of the graph. These two properties are both necessary and sufficient conditions for a graph to be an RTG (see Andrasfai, 1977, p. 79 for a proof).
Planar graphs Graphs that can be drawn in the plane such that their edges only cross at endpoints are called “planar graphs”. They have thickness 1.
Many theorems about planarity exist, the most famous of which is Kuratowski’s theorem. It states that a necessary and sufficient condition for a graph to be planar is that it does not contain subgraphs of the type K5 or K3,3. K5 is the complete graph with five vertices (and therefore 10 edges) and K3,3 is the bipartite graph with 3 ´ 3 vertices (3 vertices all connected with 3 other vertices, thus having 9 edges). This last graph is equivalent with the buildings in the well-known “three houses-three utilities” puzzle. Therefore, the puzzle (to connect all three houses with all three utilities without crossing of the conduits) is insoluble (at least when conduits within each other or going up and under a bridge are excluded).
Other theorems, like that of Whitney or that of MacLane) also characterise necessary and sufficient conditions for planar graphs (though not as elegant as Kuratowski’s), but their description falls outside the scope of this chapter. The interested reader can find them in Busacker & Saaty, 1965, p.73).
Arbitrary graphs Arbitrary graphs have no special property, unless the fact that they do not resort under the former graph types. Large parts of the Internet are arbitrarily structured. An advantage of building networks with arbitrary topology is that it is simple to add or delete nodes or change connections. Other graph types lose their special properties when changed, unless special attention is paid. Another advantage is their chameleon-like property: every piece of information can be represented with an arbitrary graph.
A disadvantage, however, is that they are cognitively demanding. Users get easily lost, because it is very difficult to maintain a mental map of the network. Empirical evidence for this hypothesis will follow in the third chapter (p. 24).
Search Algorithms in Graphs
A problem that almost always shows up when working with graphs is that of search. A search is defined as a procedure that gets as an input a starting point, a goal, a graph, and possibly some information about the graph such as distances, and that returns a list of nodes, serially connected, from start to goal.
More stringent requirements can be made. One could for example demand not to give any solution, but the shortest path. Or a solution in the shortest time possible, or with the least amount of memory required. The previous chapter dealt with the problem of searching (p. 7) and criteria for the quality of different search algorithms (p. 12).
In the literature on graphs (Poundstone, 1989, but see also Wilson, 1996, or Andrasfai, 1977), so-called “labyrinth rules” are described. These are specific search algorithms, developed especially to deal with planar graphs or labyrinths. Presumably the most famous rule is the right-hand rule. You simply enter a labyrinth (that can be of course represented as a graph) and you keep your right (or left, that does not matter, as long as you do not change hands while searching) against the wall. Always keep on walking, and you will find the exit. In fact, you will encounter every place in the labyrinth.
For two reasons, this approach, which comes down to a depth-first search, is not useful. First, it may take a long time in large labyrinths. And second, there exists a type of labyrinth, the so-called Chevening-labyrinth, named after the place were a mathematician built such a construction, that makes this rule fail. In fact, you will cycle in this type of labyrinth.
Another algorithm, the Trémaux-algorithm, avoids cycles by marking every visited edge, such that you can avoid passing any edge more than two times. Although this algorithm is guaranteed to find a solution (compare it with a depth-first cycle-detecting search rule) it can be very inefficient in finding the exit, namely when the labyrinth is very large (the “infinite branches” problem).
Still another algorithm, the Oré-algorithm, does a better job when the labyrinth is very large. This rule in fact mimics breadth-first search, and is therefore inefficient too. The problem with labyrinths is that you can never be sure that a not-explored path is not the correct one. It is difficult to define a useful heuristic for pruning the search tree.
Luckily, such abstract structures are not what we meet mostly in everyday life. Normally, straight-line distance provides a good estimation of remaining distance in euclidean space. In electronic environments, however, things are more complex. For more information about human search behaviour in electronic spaces, refer to chapter III (p. 24).
Cognitive Complexity of Graphs
The observation that navigation in our everyday environment involves mental maps (representations of the environment with help of concepts such as “here”, “up”, “right”, and “north”) has been made ever since Tolman’s work on the subject in 1930 (see Sternberg, 1996, p. 182). Some observations point out that more or less the same happens in electronic environments. Although an euclidean structure is not necessary in such environments, people often metaphorically use terms as “there”, “back” and “far away” when navigating through e-space. Therefore, the study of graph theory is a good starting point when dealing with human search in such structures.
Not all graphs are equally difficult to get a mental map of. Some are very easy: In a path graph, for example, one cannot get lost. Arbitrary graphs are most difficult, for the local structure of arbitrary graphs does not necessarily resemble the global structure. Other graph types are somewhere in between.
Which factors make some graphs distinctively simple or more difficult for users? In the first place, regular graphs are simpler than other ones, because it is more easy to imagine what will come if the local structure of a graph tends to resemble the global structure. We therefore expect that non-arbitrary graphs are more simple than arbitrary ones. Also, planarity (see p. 16) may be important, since every planar graph is euclidean and it may be assumed that we feel more comfortable in euclidean spaces such as our daily environment than in strange, euclidean-violating spaces such as those shown in science-fiction movies.
In the second place, the content of the nodes plays a role. If nodes are linked such that edges exist where people expect them to be, the graph will be easier to traverse. Furthermore, people expect that taking a link will bring them closer to where they want to be. Thus, semantically linked graphs are easier.
In sum, we expect to be the easiest graphs those who have a rigid and planar structure, though preserving semantical information through the links. Furthermore, an RTG seems a good idea, for it is very easy to visit all nodes in such a graph (by simply avoiding to take any edge twice). In the next chapter, a method will be described to combine these three methods. The resulting graph will be (almost) randomly traceable, semantically linked and planarity-preserving structured. A side effect will be that the start page is also the centre, and that the average distance between the start page and the others is small.
Chapter III: Human Network Search: An Experimental Approach
In the preceding chapters, I introduced some general notions, mostly based on formal descriptions of network search. From an AI point of view, however, we cannot limit to artificial algorithms: human capabilities are interesting, both from an applied as from a theoretical viewpoint. An experiment is set up in which several persons searched information in different topologies. An attempt is made to describe their behaviour and performance in terms of formal concepts introduced in the previous chapters.
Introduction
Certainly, humans do not search blindly. In an Internet setting, for example, they select anchors by perceived interest. So one possible measure for sorting the alternatives is semantical distance. This leads to a “semantical greedy search” method (Winston, 1993).
On the other hand, human memory capabilities are limited. We therefore propose that a kind of beam search method is used too, with a beam width of £ 7 (Miller’s magical number). Due to memory reservation for other processes, this number may be significantly smaller (1 to 3).
Note that individual differences in search capabilities may be due to two sources:
information processing differs according to Short Term Memory span and cognitive load of the environment
expertise in the topic covered leads to better estimates of semantical distance.
We therefore predict that three no necessarily independent features lead to improved performance:
having a good memory, because the beam width will be higher (although we did not find a memory effect in previous Internet study, but this was maybe due to a ceiling effect).
being expert in the field, because experts tend to posses better information on semantical relations between concepts
being expert in computer use, because then, memory can be devoted to the instead of to the system.
The goal of this experiment was to obtain evidence for the use of greedy search (extended with beam search) and to provide data about human search behaviour in different network structures. The next chapter will describe an algorithm that mimics this kind of network search.
Greedy Search in Humans
Human network search deviates from machine search with respect to the following. A person who traverses a network is physically present at one and only one node, while a computer algorithm can build up a stack (which size is limited by available memory only). Furthermore, the length of the path taken so far is not of interest for human network search, while computer algorithms are usually designed to find a short solution, which may only be possible by taking into account the total path cost. A* and its derivatives seem therefore inadequate to describe what people do when they try to find information in an electronic space.
Hill climbing and simulated annealing, since they try to optimise a given solution, are inadequate too. One thing, however, seems to be sure: that people use an informed strategy. The only remaining search strategy is greedy search, so it is at least a plausible candidate.
The heuristic used is to be determined then, which is not a difficult task: semantical similarity of the nodes is plausible here. Not only personal observations (reported in Doomen, d’Ydewalle and Van Rensbergen, 1996) but also the research of Johan Bollen (1995) points to this. People tend to choose links based on the perceived similarity between their goal and the nodes’ description.
An important qualitative finding in human network search experiment is the existence of “saw tooth” (Doomen et al., 1996) or “corona” (Raüterberg, 1996) search behaviour. This behaviour occurs in a setting were none of the nodes pointed at seems to provide a link to the goal. People accordingly expand all nodes in a breadth-first way, until they meet a link description with higher similarity to the goal. This behaviour is also desirable for an artificial algorithm that tries to mimic human behaviour.
This greedy search explains why the success of human network search is strongly dependent on the accuracy of the description of the nodes. An example (Doomen et al., 1996) is people’s failure to find “Digitale stad Leuven” starting from the K.U.Leuven’s home page (at least in the time this experiment was run: in September 1996 the universities’ webmaster changed the page structure). This link was hidden in a node pointed to as “Welcome to the K.U.Leuven”. A second example is that almost all Internet users chose the link to the library when the task was to find out about a famous writer (which pages were actually at the Faculty of Arts).
Beam Search in Humans
Ever since George Miller’s 1956 article, which served as a foundation for a computational approach to human cognition, the “Magical number seven” has been a cornerstone for computational models in cognitive science. The theory states that our short term memory, which works in a time scale of seconds, can hold 7±2 items. Delay or interference causes this capacity to drop to about three items (see Sternberg, 1996).
This short term store limits our processing capabilities, a fact that also holds in a network navigation context. The navigation model proposed here assumes that the greedy search is extended with a beam search (see p. 10). An estimation of the beam width is 1-3, depending on several factors among which
expertise with the electronic environment’s navigation toollength of the descriptions of the links (presumably the number of syllables is the crucial factor, since it has been demonstrated that information is mostly acoustically encoded)
chunking opportunities due to knowledge of the field
noise and interference out of the environment
The model reported in chapter IV (p. 35) therefore limits its stack to 1-3.
Experiment: How do people behave when searching information?
Assessing semantical distance in a group of related pages
I chose as subject for the network structure the topic “survival”, because
it is a subject about which I consider myself non-novice
many people find it an interesting topic
it is sufficiently compact, although it covers different subtopics
general knowledge about this topic is rather low
I could rely on an expert book on the topic, a book that is hierarchically organised. It is called “SAS survival guide” (Wiseman, 1993). The author is John Wiseman, who has extensive service with the Special Air Service, and may be thus considered an expert in the field.
First, I took out of the book 19 topics, divided into five categories: essentials, food, camp craft, reading the signs, and on the move. Each topic contains about half a page normal text (see Appendix B (p. 46).
Figure 3.1: Hierarchical structure. Numbers refer to page numbers; text can be found in Appendix B.
Second, a pairwise-comparison task was offered to 9 participants (friends of the author). They had to give a full pairwise-comparison for the 19 pages, thus obtaining 18.19/2 = 171 comparisons for each participant, each on a scale from 0 (no relationship) to 10 (fully comparable). Each protocol took about 2 h to fill out.
Third, these data were analysed in the following ways:
Standardisation over persons, to be sure each participant got an equal weight in the final analyses.
Item analysis on the standardised scores, to assess the reliability of the data. A Cronbach alpha of 0.90 resulted. Only for 1 subject, deletion would result in a higher alpha (0.91). This result confirmed that all nine participants used more or less the same criteria for estimating the similarity of the texts.
Principal component analysis, to get the main structure for the RTG network (see p. 29). Five factors were extracted. The rotation method uses was varimax normalized, chosen for its characteristic that it makes factor loadings extreme, such that each text would belong to one and only one group. This was indeed the result. Furthermore, in all but one case, all loadings in each group had the same sign, such that no bipolar groups resulted. The most typical page for each group was taken as guide page.
Dichotomisation, to get the main structure for the semantical network (see p. 29). Cut-off point was 0 (mean standard similarity higher than average causes pages to be linked; lower than average not to be linked). A total of 57 out of 171 links was strictly positive (precisely one third).
Multidimensional scaling (two dimensions) to visualise the topology of the semantical network (Figure 3.3 is based on this MDS). Only one page, “Organising the camp” is found near the origin of the graph, and this page had connections to all other groups. It was accordingly taken as start page for the semantical network.
Development of three network structures
Figure 3.2: RTG topology. Numbers refer to page numbers; text can be found in Appendix B.
A common start page was developed for both the hierarchical and the RTG network structures. Its contents was the title “Survival Handbook” by John Wiseman, together with a list of related topics. As explained before, the semantical network got as start page “Organising the camp”.
The network links were provided according to the topology by including a list of “related topics” at the end of each page. This was done to avoid the problem of how to link pages that did not share concepts.
The first network structure, the hierarchical one, was obtained by maintaining the author’s division into five categories. “Related topics” included first the parent page (if one existed) and then the list of child pages. Figure 3.1 shows the network.
A second network structure, the randomly traceable graph (RTG), was created with help of the PCA on the semantical distances. For each group, the most typical page was used as guide. The other ones were connected with both this guide page and the start page. Figure 3.2 shows the resulting graph.
The third network results out of a dichotomisation of the semantical distance matrix. The cut-off point was set at 0, thus obtaining an arbitrary graph with 57 out of 171 possible connections. Each page was related to at least one other page. The resulting graph is shown in Figure 3.3.
Figure 3.3: Semantical topology. Numbers refer to page numbers; text can be found in Appendix B.
We expect that those users who get the hierarchically organised pages will face most difficulties finding information; those who get the RTG pages least difficulties, and the semantical pages group in between. The reason for the RTG group to be the most effective is that the RTG topology is a compromise between a rigid structure (the hierarchical) and meaningfulness (the semantically organised pages). Two reasons for the semantical group to be better than the hierarchical group are that
the unique path property of hierarchical structures makes traps likely
semantical information has proven to be important in Internet search (Bollen, 1995).
Development of ten questions Ten questions were constructed as follows. First, ten out of the 19 pages were selected randomly. Then, for each text, one paragraph was selected randomly. A question about this paragraph was then constructed such that there would be no doubt on the correctness of the answer, once this would have been found by the participants.
A list of the ten questions can be found on disk (Appendix C, p. 48).
Comparison of human search effectiveness in three network structures
Twelve friends of the author participated individually in this experiment. They ranged 19-24 years old and all of them had moderate computer expertise. A PC with a recent Internet browser was used. After a short introduction of the goal and the design of the study, the experiment started. Before each question, the browser showed the start page. The experimenter asked a question, and the participant had to find the answer somewhere in the 20 pages on the browser. They were allowed to use forward and backward buttons on the browser’s toolbar, as well as links provided under the heading “Related topics”. The experimenter wrote down the pages they visited, and read the next question each time they were able to show the answer on the screen.
Participants were randomly assigned to one of the three conditions (H, S and R, for “hierarchical pages”, “semantical pages” and “randomly traceable graph”), with two restrictions: each condition had four participants assigned to it, and the temporal order was constructed in four blocks of three conditions each (thus obtaining four blocks with each one R, one S and one H condition).
Questions were shuffled randomly, with one restriction: each sequence was used in exactly one condition, thus having four different sequences, each used three times.
Table 3.1. Descriptive statistics on total length of action sequences for three groups of participants.. First table part indicates raw scores; second part the scores corrected for distance (1.2, 1.5 and 1.9 for R, S and H respectively).
| topology | Mean | Std. Deviation | confid. -95% | confid. 95% | |
| raw scores | R | 23.50 | 4.36 | 16.56 | 30.44 |
| H | 66.50 | 30.56 | 17.88 | 115.12 | |
| S | 40.50 | 15.29 | 16.18 | 64.82 | |
| corrected scores | R | 19.59 | 3.63 | 13.81 | 25.36 |
| H | 35.00 | 16.08 | 9.41 | 60.59 | |
| S | 27.00 | 10.19 | 10.79 | 43.21 |
In the first place, we are interested in the relative performance of the three groups. The main results are given in Table 3.1, separately for each group (H, S and R denoting the hierarchic, the semantical and the RTG group respectively). A one-way ANOVA shows a main effect of the topology factor (F=4.74; df=2; p=.04), indicating that indeed R did better than S and S better than H. Only the difference between H and R is significant (p=.01); both other pairwise comparisons failed to show significance (between H and S: p=.1 ; between S and R: p=.26 ).
One explanation for the effect is that the mean distance to cross for the H group is larger than the distance for the S group, which is in turn larger than that for the R group. When corrected for this factor (H divided by 1.9, S by 1.5 and R by 1.2), the global data pattern remains although the significance disappears (F=1.89, df=2, p=.2). This indicates that indeed the average distance from centre to all other nodes is a great advantage for the RTG (though it may be only a part of the story, since the data pattern remains unchanged). Needless to say, no pairwise comparisons reached significance (between H and R: p=.08 ; between H and S: p=.34 ; between S and R: p=.37 ).
In the second place, we are also interested in the learning effect for all three groups. We may assume that participants will have less difficulties finding the tenth answer than the first one, for at least two reasons:
Experience with the browser and the page structure
Content knowledge obtained while searching
To measure the learning effect, I did the following analysis. First, I standardised the (uncorrected) scores per person in the order of interrogation. Second, I averaged within each group over ten variables, obtaining three series of 10 values. Third, I correlated these series with the series 1, 2, ..., 10. The correlations for the groups were .006 (S group, p=.97), -.137 (H group, p=.70) and -.653 (R group, p=.41), respectively. Remark that only the R group did learn something about the structure of the pages, the H and the S group did not (though for the H group, this may be due to restricted power).
Since only twelve subjects participated in this small pilot study, I want to stress that the conclusions are nothing more than an indication of which factors may influence human network search. Two main, tentative, conclusions can be drawn:
About the role of topology: Clearly, topology is important when looking up information in the electronic space. An easy structure, such as the RTG, is better at two points. First, it seems to be more easy for simple lookup and second, it is easier to learn.
About the human search algorithm: It seems that indeed semantical distance, the perceived similarity between the node’s description and the goal, determines which nodes are chosen by a user at a certain time. The quantitative results may be too meagre evidence that this is the case; several remarks made by participants indicate that they indeed perform a kind of “greedy search” (“there has to be a link to that page on this one” and “strange, one would not expect ‘cooking’ to be missing here”). Furthermore, memory restrictions show up (“where was this link to ‘fire’ again”), indicating that also beam search is a good candidate for human network search.
Implications for a theoretical account of human network search
However limited this experiment may be, I think the original hypothesis of “greedy search” extended with “beam search” still holds. Therefore, the simulation study that is the main subject for the next chapter (p. 35) will concentrate upon these observations.
The algorithm needs two kinds of information to successfully reproduce people’s behaviour:
A measure of the beam width: actually, it may be that this is not a constant in people’s mind (people might adapt their memory use depending on the “intrestingness” of the topic) but for sake of clarity, I shall limit to a constant between 1 and 3.
A semantical distance matrix (that is available out of the preliminary study of this chapter, p. 27). Because working with the distance matrix of the preliminary study is not a good idea (it requires adaptation of all others when adding a node), I shall use a technique, called multidimensional scaling, for developing a multidimensional space that represents these distances as accurately as possible. Adding a node requires fixing its co-ordinates in this space, and the (extended) Pythagorean rule can be used for calculating distances.
If an algorithm, performing greedy search with a beam restriction, and using semantical distance as a heuristic, succeeds in mimicking people’s search activities, I shall consider this study as a provisional success. The next chapter outlines the algorithm and runs a simulation of the data obtained in the experiment reported of in this chapter.
Chapter IV: An Algorithm that mimics Human Network Search
Where the third chapter described human search in networks from a cognitive point of view, this chapter tries to outline an algorithm that mimics the behaviour of the participants in the experiment mentioned above. I consider it important to mimic the relative performances on the three topologies studied so far, as well as qualitative findings such as the saw tooth effect.
Description of the algorithm
The purpose of this algorithm will be to mimic the relative performance and the search behaviour of the participants in the experiment of the previous chapter; learning effects will be left out for simplicity.
The algorithm gets as input following variables:
A graph consisting of nodes and links. As in the experiment mentioned in chapter III, three such graphs are considered: the semantically linked, the hierarchically linked and the randomly traceable graph (abbreviated as S, H and R respectively).
A distance matrix, simplified to a co-ordinate system in nine dimensions, in which each node has its place (9-tuple created by MDS and set at (0, 0, 0, 0, 0, 0, 0, 0, 0) for node 0, for which quite logically no comparisons were available. The nine-dimensional MDS solution had a recovery error of less than 1 per cent).
A start node, which is fixes to node 0 for the R and the H graph, and 7 for the S graph.
A finish node, which depends on the question (finish nodes are 8, 10, 14, 11, 12, 2, 7, 18, 6 and 17 for questions 1 to ten, respectively).
A constant for the beam width, which will be set at 1, 2 and 3 successively.
The basic algorithm is a best-first (greedy) search, with an extension for beam search. The performance of this algorithm will be compared with the data for the human participants of the previous chapter.
Implementation
To avoid inventing the wheel twice, I adopted the algorithm by Winston (1992). Following changes to his beam-search algorithm were made:
The distance heuristic was changed to deal with nine-dimensional distances
The place definitions were changed accordingly
The beam width variable was varied from 1 to 3
Three data files were created (one for each topology)
How well does the algorithm perform?
The resulting algorithm and data files are given in Appendix A (p. 46) and on disk (p. 48). For each topology and for each beam width the algorithm was run ten times (once per finish node), thus obtaining 3×3×10=90 data cells. For each of these trials, the time (in clock ticks) and the extended paths were recorded. The output files are available on disk (p. 48). From the extended paths, an action sequence can be determined. The corresponding action sequence length is used as dependent variable in the analysis of the results, as was the case with the human participants.
In all but four cases, the algorithm found the finish node. The four missing cases were three H-1 trials (hierarchical group with beam width 1) and one H-2 trial. This finding indicates that indeed the H structure requires more memory than both other topologies.
Table 4.1. Results of the simulation study. Cells show mean action sequence length per trial for different beam widths (columns) and topologies (rows). First table part indicates raw scores; second part the scores corrected for distance (1.2, 1.5 and 1.9 for R, S and H respectively). Last column repeats mean scores for humans on the search task. Values between parentheses indicate that not all finish nodes were found such that the value is an underestimate.
| topology | Beam width 1 | Beam width 2 | Beam width 3 | Human scores | |
| raw scores | R | 2.2 | 2.5 | 3.0 | 2.35 |
| H | (2.6) | (5.5) | 8.1 | 6.65 | |
| S | 2.6 | 3.9 | 6.1 | 4.05 | |
| corrected scores | R | 1.83 | 2.08 | 2.5 | 1.96 |
| H | (1.37) | (2.89) | 4.26 | 3.5 | |
| S | 1.73 | 2.6 | 4.06 | 2.7 |
Table 4.1 gives the main results of the analysis. Shown are both the raw and the corrected mean action sequence lengths per trial for three beam widths and all three topologies. For easy comparison, the results of the human group are included in the last column of the table.
From Table 4.1, it is clear that the results of the machine and the human searches are fairly comparable, at least for a beam width of 2 or 3. The correlations between the machine data and the human data are .80 (beam width 1), .998 (beam width 2) and .97 (beam width 3) for the uncorrected data and -.96 (beam width 1), .98 (beam width 2) and .90 (beam width 3) for the corrected scores. The beam width 2 model provides the best fit with the human data; we can tentatively postulate that humans remember about two link descriptions at the same time while navigating through e-space.
Not only the qualitative aspect of human data is represented quite accurately by this computer algorithm. The saw tooth effect, too, (Doomen et al., 1996) occurs in the algorithm’s action sequences. As can be easily verified, the phenomenon mostly shows up when several nodes have an almost equal distance to the finish node. A good example is the model’s behaviour in most cells of the beam-3 model of the hierarchical group. For instance: cell 2 shows action sequence (0, 3, 0, 5, 0, 13, 0, 3, 10), where nodes 3, 5, 13 and 3 have almost the same (large) distance to the finish node 10.
A remark concerning the use of memory is on its place here. Except for the hierarchical pages (in which it is difficult to search) it does not pay to have more memory available, quite on the contrary. Comparing columns 2 and 3 in Table 4.1 shows that in all cases, the beam width 2 performs better. In hierarchical structures, however, memory restrictions make the probability of success lower. This is another reason why hierarchical structures are unfit for information retrieval purposes: the memory requirements are mostly higher.
That the computer results resemble the human data is fairly remarkable. One would not expect that a simple algorithm like the beam search with a semantical distance measure performs so well. However, some limitations have to be pointed out here. They concern the difference in cost of backtracking of humans versus the algorithm, the task requirements and the fixedness of the beam width.
The cost of backtracking differs: this is 1 for the computer but not for humans: Backtracking requires two steps (one back, one further). Therefore, the difference in performance for H versus S and R groups might be underestimated by the model.
The algorithm’s task is to find the finish node directly, while human participants had to perform an additional task (finding the required information within the page). In addition, the questions may suggest an answer can be found at page A while it is at page B. The computer does not have this problem. On the other hand, human participants might use more information than only semantical distance, such that both effects cancel each other.
Last but not least, humans might use a variable amount of memory depending on the “interestingness” of the link description. In almost all domains of cognitive science, people have proven to be very versatile in using strategies to cope with different problems. Therefore, I suspect the fixed beam strategy to be an approximation of a more complicated process.
But after all, the model seems to be valid for the experimental material presented here. Not only quantitatively (relative performance) but also qualitatively (saw tooth effect), it captures essential aspects of people’s behaviour when working in electronic environments. People seem to use a kind of “semantical greedy search” with a limited amount of memory, the “beam width”, which value is about two.
Chapter V: Conclusion
A simple search method, greedy search with as heuristic function semantical distance and extended with beam search (width 2) was able to capture human search behaviour in a quantitative as well as in a qualitative way. This conclusion is only tentative, though, for only an extensive test of the model will be able to corroborate the findings. A second type of conclusion concerns the quality of network topologies for information retrieval: it seems that hierarchical structures are inferior to semantical structures, which are in turn somewhat worse than a randomly traceable graph.
Comparison of human and machine search The first chapter, on search algorithms (p. 6), provided an essential base for studying human search. Several candidates for a “human search algorithm” were proposed, only two of which seemed meaningful. Uninformed strategies are not a good idea, for it is clear that people use some kind of information. Memory bounded strategies are technical solutions that do not correspond to human cognitive capabilities. Iterative improvement search starts with a complete solution, which is clearly not the case in human search.
The remaining, best-first type strategies, require a heuristic function to be determined, with only one condition: that its value is 0 at the finish. It was argued that people use semantical distance as a heuristic function, which seems intuitively clear (the condition is satisfied: it is generally agreed upon that the semantical distance between A and A is 0). Distance travelled so far does not play a role when finding a path to the solution, such that greedy search might be the machine equivalent of the human search strategy.
An extension of this greedy search, beam search, limits the number of partial paths to be considered. It was developed as a computer search strategy to cope with the memory problem that shows up when the branching factor of the search tree is high. In humans, a limitation of available memory is suggested by the “Miller’s seven”: The observation that our short term memory can hold about seven items. This capacity may drop to 1-3 when facing cognitive requirements of the environment.
The resulting model thus comes down to a semantical greedy search with a limited beam width. Evidence for this model was obtained by simulating data of human search (chapter III, p. 24) by an implementation of this algorithm (chapter IV, p. 35). It came out that both quantitative (average distance travelled per trial) and qualitative (the saw tooth effect) data were quite alike for human and computer, at least for the model with beam width 2. This suggests that people can retain about two items (node descriptions) while retrieving information in electronic space. However, the simulation deviated from the assumed cognitive processes of the user in at least three ways: the cost of backtracking, the task requirements (finding a specific piece of text versus finding the goal node) and the beam width’s fixedness.
Human search effectiveness in different topologies
Even nowadays, developers of hypertext stick to hierarchical topologies when developing materials for information retrieval, although this type of structure has clear disadvantages:
The semantical trap: people easily dive into the tree, towards the leaf nodes, and never return to the higher pages. While this is especially the case in deep structures, also in the experiment reported of in chapter III (p. 24) some people had difficulties leaving the branch just chosen. Usually, they explored it fully, only thereafter returning to the top nodes. This results in a bad average for these users. A lot depends on the classification made: if the user’s category structure deviates from the developer’s (which is often the case, since developers are expected to be experts, while most users are laymen) users will be caught in the semantical trap. This is a well-known problem in AI: how do you classify a football, a banana, a watermelon and a baseball bat?
Memory requirements: a lot of things have to be remembered in a hierarchical structure (“where do I come from?”, “did I already had the sub-branches of that one?”). Especially when the tree is deep, full exploration requires a careful registration of many facts (an uninformed strategy may perform best when full exploration is necessary, quite a weird strategy for humans). The algorithm too did perform badly in a hierarchical structure; in fact, the incompleteness of the beam search only showed up here.
Some researchers, like Bollen (1995), therefore turned to semantical structures. His approach to developing hypertext structures that are “easy to use” is as follows. Let people navigate through the structure, and build an algorithm that learns, based on the user’s behaviour. The link’s strength (priority in the “related topics” list) is determined by a Hebbian learning rule. The resulting graph is arbitrarily structured. In fact, this training procedure performs a cluster analysis on the topics included. Although this seems a good way of developing hypertext pages that do not lead to semantical traps, I have argued that it is possible to do still better. Not only one-trial lookup has to be considered; also learning processes are important. People want to get used to the structure they consult often, while it can be assumed that it is very difficult to build a mental map of such arbitrary structures. In fact, the learning results of chapter III (p. 31) indicate that the semantical structure, though developed via simple judgement of semantical distance, is far inferior to the RTG, and somewhat worse than the hierarchical structure.
What I have argued in this thesis, is that a compromise between the regularity of the topology and the meaningfulness of the semantical structure is a better base for constructing hypertext. The experimental results, though provisionally, support this hypothesis. People quicker find what they aim for, and also show that they internalise the structure of the RTG. The simulation study only corroborates the search results: the algorithm performed better on the RTG, regardless of the beam width. That it does better on the RTG than on the semantically structured pages, is remarkable, since the latter’s co-ordinate structure lies closer to the algorithm’s distance heuristic.
Which factors make the RTG both easier to internalise and better to search in? The experiment nor the simulation are able to provide a sufficient answer on this question, although some of these features of the RTG definitely play a role:
Shallow structure: The home page links lead to many different topics, which in turn lead to only a few. This is the reverse of the hierarchical structure, in which a few topics are connected to the home page on the one hand and to all remaining topics on the other. Therefore, the RTG causes the average distance from the centre to all other nodes to be very small. The observation that the experimental effect disappears when controlled for distance (although the general pattern remains) strengthens the conclusion that this is an important feature.
Centre of the graph: the home page, presumably the most important page in a navigation context, is also the node with smallest total distance to all other nodes. Although this seems an interesting feature, it does not explain the experimental data of the third chapter, nor the findings of the simulation study. Indeed, the home page was also the centre in both other structures.
Randomly traceable: the simple fact that not taking any link twice is sufficient to visit all nodes may explain why the RTG is better than other topologies, for people tend to avoid to take a link again after observing that this behaviour leads to failure (unless they are not sure having read everything on the destination page). It cannot explain why the simulation performed better on the RTG, since the algorithm avoids cycles.
Planarity: The planarity of the graph may be crucial for explaining the strong learning effect. Indeed, planarity makes a graph comparable to everyday euclidean space, such that the construction of a mental map is easier for planar graphs.
What I hope readers remember about this study
People use semantical distance between what they see (the node description) and what they want (their goal node) when searching information in electronic environments. At each stage, they remember about two node descriptions (necessary for navigation).
A hierarchical structure imposed on the information you want to present is unfit for different reasons: “semantical traps” are likely, more memory is needed to navigate in hierarchical topologies, and categorisation may be arbitrary.
A purely semantical topology does a better job, but since the corresponding graph is arbitrarily structured, it is not easy to build a mental map of such structures. Consequently, a semantically structured graph is difficult to get a mental representation of.
A combination of both a rigid structure and a meaningful organisation may be the best solution. People have less difficulties both with finding information and with building a mental map, such that future lookup is facilitated.
In addition, the randomly traceable graph is a good idea because one cannot “get lost” in such space. One will always return to the centre of the graph, while visiting all nodes (provided each time a fresh link is chosen).
Practical applications of what is described here, are ubiquitous in the domain of hypertext developing. If attention is paid both to the meaningfulness of the nodes and to the memory limitations people face, several computer related problems can be avoided.
References
Appendices
Appendix A: LISP Source Code for the Algorithm Proposed
The search procedure
(defun generalized-search
(start finish method &optional (queue (list (list start))))
"Purpose: General-purpose search procedure.
Arguments: Start, finish, and method-dependent queuing function.
Returns: Path from start to finish.
Remarks: Implements many search methods via various queuing methods."
(cond
;;If the queue is empty, lose; no path can be found:
((endp queue) nil)
;;If the first element terminates at the finish, win:
((eq finish (first (first queue))) ;Finish found; done.
(format t "~%The successful path is ~a." (reverse (first queue)))
(reverse (first queue)))
;;Otherwise, try again with a new, method-specific queue:
(t (generalized-search start finish method
(funcall method (extend (first queue)) (rest queue) finish)))))
(defun extend (path)
(format t "~%Extending the path ~a." (reverse path))
;;Produce a list of one-place partial-path extensions:
(mapcar #'(lambda (new-node) (cons new-node path))
;;Get rid of circular paths:
(remove-if #'(lambda (neighbor) (member neighbor path))
;;Get the neighbors of the terminal place:
(read-place-neighbors (first path)))))
(defmacro defplace (place neighbors coordinates)
`(progn
;;A place is an atom, which evaluates to itself:
(setf ,place ',place)
;;It has a list of neighbors on its property list:
(setf (get ',place 'neighbors) ',neighbors)
;;And it has a list of x-y coordinates on its property list:
(setf (get ',place 'coordinates) ',coordinates)
',place))
(defun read-place-neighbors (place)
(get place 'neighbors))
(defun read-place-coordinates (place)
(get place 'coordinates))
(defun closerp (path-1 path-2 finish)
(< (straight-line-distance (first path-1) finish)
(straight-line-distance (first path-2) finish)))
(defun straight-line-distance (node-1 node-2)
(let ((coordinates-1 (read-place-coordinates node-1))
(coordinates-2 (read-place-coordinates node-2)))
(sqrt (+ (expt (- (first coordinates-1) (first coordinates-2)) 2)
(expt (- (second coordinates-1) (second coordinates-2)) 2)
(expt (- (third coordinates-1) (third coordinates-2)) 2)
(expt (- (fourth coordinates-1) (fourth coordinates-2)) 2)
(expt (- (fifth coordinates-1) (fifth coordinates-2)) 2)
(expt (- (sixth coordinates-1) (sixth coordinates-2)) 2)
(expt (- (seventh coordinates-1) (seventh coordinates-2)) 2)
(expt (- (eighth coordinates-1) (eighth coordinates-2)) 2)
(expt (- (ninth coordinates-1) (ninth coordinates-2)) 2)))))
(defvar *beam-width* 3)
(defun beam (start finish &optional (queue (list (list start))))
"Purpose: Beam search.
Returns: Path from start to finish.
Remarks: Keeps *BEAM-WIDTH* best partial paths."
(cond ((endp queue) nil)
((eq finish (first (first queue)))
(format t "~%The successful path is ~a." (reverse (first queue)))
(reverse (first queue)))
;;Differs from generalized search; all partial paths extended:
(t (beam start finish (queue-beam (mapcan #'extend queue) finish)))))
(defun queue-beam (paths finish)
;;Sort all paths by distance to goal; keep only best:
(setf paths
(or (filter-out-losers paths finish)
(filter-out-dead-ends paths)))
(let* ((n (length paths)))
(butlast (sort paths
#'(lambda (p1 p2) (closerp p1 p2 finish)))
(max 0 (- n *beam-width*)))))
(defun filter-out-losers (paths finish)
(remove-if-not #'(lambda (path) (eq finish (first path))) paths))
(defun filter-out-dead-ends (paths)
(remove-if-not #'extendable-p paths))
(defun extendable-p (path)
;;Eliminate if circular:
(remove-if #'(lambda (neighbor) (member neighbor path))
;;Get the neighbors of the terminal place:
(read-place-neighbors (first path))))
Distances for the hierarchical topology
(defplace n00 (n03 n05 n09 n12 n13)
(0 0 0 0 0 0 0 0 0))
(defplace n01 (n05)
(8 -15 50 -49 -1 17 -12 26 -32))
(defplace n02 (n05)
(20 -1 54 -52 -31 14 -30 -17 21))
(defplace n03 (n00 n10 n19)
(-72 -36 -19 -7 29 -16 -42 33 -5))
(defplace n04 (n13)
(87 -47 -10 16 -15 10 18 -5 -24))
(defplace n05 (n00 n01 n02 n15)
(30 -12 57 -44 14 -25 14 -24 -1))
(defplace n06 (n09)
(16 82 -57 -44 16 -12 1 6 -21))
(defplace n07 (n09)
(-1 33 31 60 -30 -39 8 -14 -19))
(defplace n08 (n09)
(-87 -25 10 12 -33 -15 16 -2 8))
(defplace n09 (n00 n06 n07 n08 n14 n16 n17 n18)
(19 78 -4 33 13 9 -23 -21 -10))
(defplace n10 (n03)
(-62 -36 -46 -18 5 -16 26 -21 13))
(defplace n11 (n12)
(68 -53 -50 5 -5 -10 -6 19 21))
(defplace n12 (n00 n11)
(88 -37 -22 -1 -47 -12 3 11 12))
(defplace n13 (n00 n04)
(58 -30 -11 46 33 41 -20 -14 -9))
(defplace n14 (n09)
(-75 -23 -2 26 -9 8 -46 -20 13))
(defplace n15 (n05)
(19 -19 32 9 85 -28 22 -1 17))
(defplace n16 (n09)
(10 92 2 12 -16 -32 -6 26 14))
(defplace n17 (n09)
(-2 64 -25 -22 3 51 22 -7 30))
(defplace n18 (n09)
(-48 6 48 39 -4 42 35 36 5))
(defplace n19 (n03)
(-77 -20 -37 -19 -10 13 21 -12 -32))
Distances for the RTG topology
(defplace n00
(n02 n03 n05 n06 n09 n10 n11 n12 n13 n14 n16 n17 n18 n19)
(0 0 0 0 0 0 0 0 0))
(defplace n01 (n02 n05)
(8 -15 50 -49 -1 17 -12 26 -32))
(defplace n02 (n00 n01)
(20 -1 54 -52 -31 14 -30 -17 21))
(defplace n03 (n00 n08)
(-72 -36 -19 -7 29 -16 -42 33 -5))
(defplace n04 (n11 n12 n13)
(87 -47 -10 16 -15 10 18 -5 -24))
(defplace n05 (n00 n01)
(30 -12 57 -44 14 -25 14 -24 -1))
(defplace n06 (n00 n15)
(16 82 -57 -44 16 -12 1 6 -21))
(defplace n07 (n09 n16 n18)
(-1 33 31 60 -30 -39 8 -14 -19))
(defplace n08 (n03 n10 n14 n19)
(-87 -25 10 12 -33 -15 16 -2 8))
(defplace n09 (n00 n07)
(19 78 -4 33 13 9 -23 -21 -10))
(defplace n10 (n00 n08)
(-62 -36 -46 -18 5 -16 26 -21 13))
(defplace n11 (n00 n04)
(68 -53 -50 5 -5 -10 -6 19 21))
(defplace n12 (n00 n04)
(88 -37 -22 -1 -47 -12 3 11 12))
(defplace n13 (n00 n04)
(58 -30 -11 46 33 41 -20 -14 -9))
(defplace n14 (n00 n08)
(-75 -23 -2 26 -9 8 -46 -20 13))
(defplace n15 (n06 n17)
(19 -19 32 9 85 -28 22 -1 17))
(defplace n16 (n00 n07)
(10 92 2 12 -16 -32 -6 26 14))
(defplace n17 (n00 n15)
(-2 64 -25 -22 3 51 22 -7 30))
(defplace n18 (n00 n07)
(-48 6 48 39 -4 42 35 36 5))
(defplace n19 (n00 n08)
(-77 -20 -37 -19 -10 13 21 -12 -32))
Distances for the semantical topology
(defplace n01 (n02 n03 n04 n05 n12 n13 n15 n18 n19)
(8 -15 50 -49 -1 17 -12 26 -32))
(defplace n02 (n01 n05 n08 n12 n14 n17)
(20 -1 54 -52 -31 14 -30 -17 21))
(defplace n03 (n01 n08 n10 n14 n15 n18 n19)
(-72 -36 -19 -7 29 -16 -42 33 -5))
(defplace n04 (n01 n05 n07 n11 n12 n15)
(87 -47 -10 16 -15 10 18 -5 -24))
(defplace n05 (n07 n12 n15)
(30 -12 57 -44 14 -25 14 -24 -1))
(defplace n06 (n09 n16 n17)
(16 82 -57 -44 16 -12 1 6 -21))
(defplace n07 (n04 n05 n08 n09 n14 n15 n16 n18)
(-1 33 31 60 -30 -39 8 -14 -19))
(defplace n08 (n06 n07 n14 n18 n19)
(-87 -25 10 12 -33 -15 16 -2 8))
(defplace n09 (n06 n07 n13 n16 n17)
(19 78 -4 33 13 9 -23 -21 -10))
(defplace n10 (n03 n08 n14 n18 n19)
(-62 -36 -46 -18 5 -16 26 -21 13))
(defplace n11 (n04 n12 n13)
(68 -53 -50 5 -5 -10 -6 19 21))
(defplace n12 (n01 n02 n04 n05 n11 n13)
(88 -37 -22 -1 -47 -12 3 11 12))
(defplace n13 (n01 n04 n09 n11 n12 n15)
(58 -30 -11 46 33 41 -20 -14 -9))
(defplace n14 (n02 n03 n07 n08 n10 n15)
(-75 -23 -2 26 -9 8 -46 -20 13))
(defplace n15 (n01 n03 n05 n07 n13)
(19 -19 32 9 85 -28 22 -1 17))
(defplace n16 (n06 n07 n09 n17)
(10 92 2 12 -16 -32 -6 26 14))
(defplace n17 (n02 n06 n09 n16)
(-2 64 -25 -22 3 51 22 -7 30))
(defplace n18 (n01 n03 n07 n08 n10 n13)
(-48 6 48 39 -4 42 35 36 5))
(defplace n19 (n01 n03 n08 n10 n14 n17 n18)
(-77 -20 -37 -19 -10 13 21 -12 -32))
Appendix B: Hypertext Pages for the Experiment
1. Survival kit
Items you can use in a survival kit are:
matches: preferably waterproof (can be achieved by dipping the head into candle fat).
candle, flint and saw, and magnifying glass for making fire.
needles with thread. Wrap thread around needles
fish hook and line (as much line as possible)
compass: liquid-filled type with luminous button is best. Make sure it works and you know how.
small medical kit with pain reliever, anti-diarrhoea, anti-infections, anti-insect bites, water sterilising tablets, two scalpel blades, plasters and a condom (water container)
a sharp knife, a torch and food (butter, sugar, salt, meat and chocolate)
2. General equipment
Clothes should be well-fitting but non-restrictive, giving protection from cold and rain while keeping the body ventilated. Carry waterproofs, a change of clothes and extra warm garments. For cold climates layers are best. Break in new boots gradually; two weeks before depart start hardening your skin with surgical spirit.
Sleeping bags out of natural materials give better insulation than man-made fibres. When wet, however, they lose their insulating properties and are difficult to dry out.
A backpack must be strong and waterproof. A backpack with external frames is best; they can take heavy weights and even an injured person.
Pack so you know where everything is and the first things you need are on top. Keep damageable food in containers.
Ensure vehicles are in working order. Adjust them to deal with high altitudes and extreme conditions, carry spares and tanks for extra fuel and water.
3. Food plants
A balanced diet is as important as having enough to eat. Vary your diet: it must contain the right proportions of fat, protein, carbohydrates, minerals and vitamins. Therefore, plants are often a good choice.
Do not assume that because animals eat a plant, it is edible for you too. Take following test (one person, never take shortcuts!):
inspect: try to identify. Avoid slimy or worm-eaten plants, and old or withered plants.
smell: crush and smell a portion. Discard if it smells of bitter almonds or peaches.
Skin irritation: squeeze and rub on tender skin. Discard if discomfort, rash or swelling.
Lips, mouth, tongue: place a piece on these three, in given order, waiting 15 seconds before next step. Discard if discomfort (soreness of throat, irritation, burning, stinging).
Swallow: swallow small piece and wait five hours. Eat or drink nothing else.
Eating: if no reactions, plant may be considered safe. Drink plenty of hot water in case of stomach trouble. Induce vomiting if severe, or eat charcoal. White wood ash mixed with water relieves stomach pain.
Edible plants are white mustard, shepherd’s purse, primrose, dandelion, chicory, wild sorrel, buckwheat, curled dock, dead-nettles and stinging nettles. Fruit and nuts are edible in general. Some can be dried: lay them outside direct sunlight, protected from wet during 10 days.
4. Planning the route
Visibility will often be restricted and you must guess what lies ahead. Things you can see may be misleading. If you have them, use binoculars. Climbing a tree can help, but do not risk a fall.
Watercourses offer a route to civilisation and a life-support system on the way. Follow water unless difficult—then, move to higher grounds and cut off the bends.
Choose a distant landmark and head towards it. Try to skirt dense vegetation: orientation is difficult. A compass becomes a valuable asset.
In featureless terrain, if in a group of three or more, to maintain a straight line separate and follow at intervals in each other’s tracks. Look back frequently and move in relay, that is, the one who went ahead can rest while everyone else moves up from the rear. If travelling alone, place landmarks or align yourself at your own tracks.
Once on high grounds, stick to it until certain that you have found the spur down which allow you to make the best progress in the desired direction.
5. Be prepared
Make sure you are physically and mentally prepared before you set out and pack the appropriate gear for what you plan to do.
Ask yourself:
How long will I be away? How much food and water do I need to carry?
Have I the right clothing/footwear for the climate?
What special equipment do I need for the terrain? What medical kit is appropriate?
Have a thorough medical check-up and pack a medical kit to cover all your likely needs and those of each member of the group.
Consider the ability of each group member to deal with the challenges ahead. Nominate a medic, cook, mechanic, driver, navigator etc.
Carry out a research of terrain and people beforehand; study your maps and the climate, weather conditions, river directions, mountains and vegetation.
Be prepared if anything goes wrong. What if a vehicle breaks down, or the group is separated, or somebody falls ill?
6. Knots
Match the type of rope to the use you make of it. Vines, rushes, barks, palms and animal hairs can be used to weave a rope. It is important to select the right knots for the task in hand:
Reef knots can be used to tie ropes of the same thickness. Holds firm under strain, yet is easily untied. Disadvantage: is not reliable. Pass right end over left and then under it. Take left over right and under it. The two loops should slide on each other.
Overhand knot: make a loop and pass the live end back through it.
Overhand loop: fixed loop for throwing aver a projection. Double the end of the rope and tie an overhand knot with the loop.
Figure-of-eight: An end-stop. Make a loop. Carry live end first behind, then round, standing part. Bring it forward through the loop.
7. Organising the camp
If no command structure exists between a group of survivors, establish an organising committee with particular responsibilities. A roster is essential for daily chores. Everyone who is fit should take their turn at unpleasant tasks, unless their skills are in such demand that it would be a waste of their abilities. Keep busy to avoid boredom. Do not venture from the camp less than in pairs. A nightly gathering provides discipline, and is an opportunity to debrief and to discuss new strategies.
Strict hygiene should be practised in the camp. Latrines must be downhill of the camp and away from the water supply. Establish a collection point for drinking water. Ensure no-one washes upstream of it.
Do not prepare game in camp: bleed, gut and skin on the trap line to attract game to traps, not camp. Keep food covered and off the ground. Replace lids on containers immediately after use. Stow clothing and equipment where it cannot get wet or burned. Keep things tidy; hook mess tins and cooking utensils on twigs and branches. Never leave the fire unattended.
8. Cooking
Cooking makes food more appetising and easy to digest. It destroys bacteria and parasites that may be present, and neutralises poisons. But when heated, food loses nutritional value. Never cook longer than necessary.
Use the fire to boil water then let the flames die down and use embers and hot ash for cooking. Never let your fire unattended when cooking. Having lit a fire, always have water boiling—unless in short supply—for drinks, sterilising wounds etc. Do not just balance a can on the fire. Support vessels on rocks or suspend them over the fire with wooden sticks.
Boiling conserves natural juices—always drink the liquid unless boiling out toxic substances.
Meat can be roasted too. Skewer the meat on a spit and turn it over hot embers or beside a blazing fire. Continually turn to keep the fat moving over the surface.
Grilling wastes fat; only use when food is plentiful.
Baking requires an oven. Cook the meat on a dish and baste it with its own fat. Slow cooking on a steady heat tenderises meat. Baking is also ideal for root vegetables.
Steaming is a good way to cook fish and vegetables. Punch holes inside a larger can, or put something in the bottom of the large can to keep the inner one above water. Cover the outer can, but do not seal (risks explosion).
9. Shelter and making camp
In a survival situation it is vital to know where to set up camp and how to build a shelter from the materials available. An accident, sudden fog or exhaustion may leave you stranded. Local conditions and materials will determine the type of shelter you build. While there is still daylight, search for a good place, a natural shelter from wind and rain. Do not choose exposed hilltops, valley bottoms, deep hollows, places too close to water, near solitary trees, near bees’ or hornets nests.
There are different types of shelters:
Hasty shelter: make use of a natural cover. Turn your back to the wind and pile up any equipment as a windbreak.
Bough shelter: Branches that sweep down to the ground or partly broken boughs can provide shelter. Make sure they will not fall off the three.
Root shelter: the spreading roots and trapped earth at the base of a fallen tree make a good windbreak. Fill in the sides between extended roots for added shelter.
Natural hollows or fallen trunks also provide a good base for a shelter.
Shelter sheet: if you have a shelter sheet, you can make a triangular shelter with apex pointing to the wind. Do not lie on damp ground!
A more elaborated shelter can be built from wood. Build walls by piling sticks between uprights driven into the ground and tied at the top. If two stacks of sticks are used, space between them can be filled with earth to make a very sturdy wall.
Coverings can be made by weaving springy saplings, plant stems, grasses and long leaves. First make a framework from less pliable materials. If no rope is available, stick wood into the ground and weave between these sticks.
10. Animal hunting
If you can read the subtle signs that animals leave, you will know what hunting methods to use. Only large animals venture out by day; small animals wait until night. Trails to watering or feeding places are clearest on wet ground, snow and damp sand. Determine size of animal beforehand by looking for tunnels through undergrowth. Also droppings can indicate size, and even diet.
Small mammals, like weasels and wild dogs or cats, can be trapped with spring snares, bait bars and deadfalls. Minimise human scent.
Look out for rabbit starvation: rabbit meat does not provide certain minerals and vitamins you eat, so look out for vegetation too.
Reptiles can be eaten unless venomous. Birds are always edible, but do not always taste very good. Birds of prey must be boiled thoroughly. Cage traps, deadfalls and spring snares can be used to catch them.
Insects are rich in fat, protein and carbohydrates. Look in moist shady spots, and collect live specimens that do not look sick or have a bad smell. Most are edible raw, but taste better when cooked or roasted. Remove legs and wings of large insects, as well as armour casing of beetles. Squeeze innards of caterpillar, do not eat skin. Do not eat insects feeding on carrion, refuse or dung—they may carry infection. Brightly coloured insects are usually poisonous, and also grubs on underside of leaves.
11. Direction finding
The sun rises the east and sets in the west, roughly speaking. In the northern hemisphere, at noon, the sun will be due south, in the southern hemisphere it will be due north. The hemisphere is indicated by the way the shadows move: clockwise in the north, anticlockwise in the south.
In the northern hemisphere, hold your watch horizontal. Point hour hand at the sun. Bisect the angle between hour hand and 12 mark to give north-south line. In the southern hemisphere, point 12 towards the sun. The mid-point between 12 and the hour hand will give a north-south line.
An improvised compass can be made with a needle. Magnetise by stroking it with a magnet or with silk, in one direction only. Suspend the needle in a loop of thread without knots or twists. A needle can be made floating by putting it on a floating piece of paper or bark.
Plants can also give an indication of north and south. They tend to grow towards the sun. Moss on tree will be greener and more profuse on the sun side. If trees have been felled, the pattern of rings is more widely spaced on the side towards the equator.
If the moon rises before the sun has set, the illuminated side will be on the west. If it rises after midnight, the illuminated side will be the east.
12. Interpreting maps
Choose maps carefully, and make sure you understand the information given.
Altitude is represented by contour lines, which give a series of points at the same distance above sea level. But they do not record what happens between them! Closely grouped contours indicate steep slope, greater spaces indicate gentler inclines.
The typical scale of a walker’s map is 1:50.000. Not all features can be shown on scale: roads, paths, streams and rivers are usually given in standard width. Study the key and master the way information is presented (swamps, woodlands, buildings).
Map grids are based either on degrees of latitude and longitude or on ground measurements. The map reference is usually expressed as 6 digits: 155628 means 15.5x62.8 on the map.
Unless they are lines of longitude, grid lines do not indicate north and south. Remember that a compass points not to the true north but to the magnetic north—the difference varies according to where you are in the world. If your map does not indicate magnetic north, you can find it from the Pole star.
As-the-crow-flies distances can be measured using a straight edge, which is then matched against the scale. A gradient of 45° will add 8.2 m to a map distance of 200 m.
If you do not have a map, make your own. Find the best vantage point and study the terrain. Note the number and direction of ridges—you won’t be able to see what lies between them, so leave gaps to be filled in as you gain information from other vantage points. Mark anything of interest on your maps—do not rely on your memory.
13. The decision to move
In the short term, unless local dangers or a lack of food and water make it imperative to leave the site of your accident, stay close in the hope of rescue. If you have injured persons and limited resources, send a party to contact help while the others stay to take care for the sick.
In the long term, if no rescue comes, resources may become exhausted and there is an increased risk of disease from staying too long in one place. Such factors will make a move more advisable.
Where to go next will be determined by the information you have been able to gather, by the fitness and endurance of the group and the nature of the terrain. Remember: the most direct route may not be the easiest to travel.
If you have a clear idea of your location, make for the nearest settlement. If you have no idea where you are, follow waterways downstream—they generally lead to populated areas. Move at least three days ‘ journey from the old camp so that fuel, flora and fauna will be undisturbed in the new location.
Before you abandon the camp, leave signs to show you have moved on, and where you are heading. Rescuers then can follow you.
14. Preserving food
If food is not plentiful or is limited by season, ensure that stores keep safely. Do not store food in direct sunlight, near excessive warmth or moisture, nor where scavengers may ruin it. Wrap in airtight and waterproof materials—or store in containers with a good seal. Label stores and different foods to avoid cross-flavouring. Check occasionally to see all is well.
Wind and sun can dry food but it is easier to force-dry over a fire. Dried foods are less vulnerable to moulds and maggots. Cut off the fat and rub salt into the flesh. Hang the salted meat in a cool airy place.
Smoking dehydrates meat and coats it with a protective layer. Smoking can be carried out in a smoke teepee. Get a fire going to produce hot embers. Have a pile of green leaves ready. Drive 3 sticks into the ground and tie the top together. Build a platform between them and set a fire beneath.
Nuts and cereals can be dried on hot rocks from the fire. Turn them frequently. After drying, store them in damp-proof containers. Fruit and berries can be dried whole or cut into slices and dried by sun, smoke or heat. Funghi also dry easily. Fruit can be eaten dry. Add funghi to soups or soak in water for several hours.
15. Facing disaster
It is no use giving up. Only positive action can save you. People can survive seemingly impossible situations if they have the will to.
The survival situation will put you under physical and mental pressure. You will have to overcome some or all of the following stresses:
fear and anxiety
pain, illness and injury
cold and/or heat
thirst, hunger and fatigue
sleep deprivation
boredom, loneliness and isolation.
Can you? You will have to! Self-confidence is a product of good training and sound knowledge. Pain and fever call attention to an injured part and prevent you using it. It is important to treat any injury as soon as possible, but pain may have to be overcome and controlled in order to seek help and avoid the risk of further injury or death.
16. Furnishing the camp
Avoid lying on cold, damp ground. On dry ground, stones heated in the fire and then buried under a thin layer of soil beneath the bedding will keep you warm.
To make an A-frame bed, drive two pairs of posts into the ground at an angle, leaving a little more than your height between the pairs. Lash tops together. On hard ground cross-members will be needed between the feet of each A-frame and between the two A-frames.
To make a tube bed, make a tube of strong material. Make an A-frame support, stick two pieces of wood through the tube, lay it over the A-frames such that the tube becomes a horizontal plane.
Never sit on damp ground. Make a seat as follows. Use a log, or lash together a couple of low A-frame supports and rest a bough across them.
To make a ladder, lash cross-pieces to two long poles set at an angle not parallel, so rungs won’t slip down.
A travois (sled) can be made by making a ladder with extra cross-pieces. Pull the load on its runners like a sled.
17. Tools
Several tools can make life easier. Stone tools can be made by splitting a cobble with a blow from a hard, smooth pebble to form a flat face. Bones, antlers and horns make useful digging implements, gougers and hammers. Cut them with stone tools or grind with coarse stones.
To improvise a handle for an axehead use any straight, knot-free hardwood. The flukes of a buttress tree are ideal. Fit the head by whittling the handle into shape with one end cut to fit the hole in the axehead, cutting a notch in that end. Soak the axe to fix he head. Always check tightness before use. To use an axe, swing it in an arc that feels natural with a firm grip and always away from your body. Never throw an axe on the ground. Sheath it or bury the blade in a log.
Before felling a tree, check overhead for dead branches and hornets’ nests. Clear creepers and branches which could deflect blows. Cut branches off from the outside of the join. Cut from both sides of the tree, first chopping out a notch at an angle of 45° and another in the opposite side at a lower level, on the falling side. A tree with most of its branches on one side will fall in that direction regardless of the placing of the cuts.
18. Fire
Fire is essential to survival. It provides warmth, protection and a means of signalling. It boils water, cooks and preserves food; it heats metal to make tools and bake pots.
Remember the fire triangle: air, heat and fuel. If any of these is removed, the triangle collapses and the fire goes out. You can make fire in this way:
Prepare: Ensure adequate ventilation. Collect sufficient amounts of tinder, kindling and fuel. Prepare a fireplace so that you can control the fire.
Use tinder to make fire: tinder is any material that takes only a sparkle to ignite. Best is birch bark, dried grass, wood shavings, bird down, waxed paper, cotton fluff, fir cones, pine needles and dried funghi. Make teepee of dry wood and lay kindling round tinder bed. Then, ignite tinder.
Use kindling for raising the flames from tinder. Kindling is e.g. small dry twigs, resinous and softer woods. Shave wet kindling until it is dry.
Use dry wood to get a fire going. Once established, mix green and dried-out damp wood. Hard woods burn well, are long lasting and give off great heat.
Other fuels are animal droppings, peat, coal, shales, combustibles and animal fats.
19. Fish and fishing
Fish contain protein, vitamins and fats. All can be attracted and caught with appropriate bait. Angling is not the most efficient method to catch fish—the night line and gill net will give better results.
If it is hot and the water is low, fish retreat to deep, shaded waters. In cold weather, they seek shallow spots where the sun warms the water. At any time, fish like to shelter under banks and rocks. When a river is in flood, fish in slack water.
Leave lines out overnight and check them just before it breaks. Fishing is poor after heavy rain. Fish can see more on the bank than you think because of image refraction, so sit or kneel when fishing. Keep back from the edge and try to keep your shadow off the water.
Hooks can be improvised from all kinds of materials: a pin, a thorn, nails, bone, wood. Best bait is found in the fishes’ habitat: insects that breed in it, berries that overhang it.
All freshwater fish are edible. Those under 5 cm do not need preparation, larger ones can be prepared in the following way:
Immediately after catching, cut fish throat to bleed it, and remove gills.
To gut it, slit from anal orifice to the throat. Remove offal (use for bait) and keep roe. Fish skin can be eaten, so do not remove it.
Eat as soon as possible, unless you want to dry it.
Appendix C: Disc with Results, Demos and Source Code
The disk included offers various materials that were used in this study, such as experimental results, the stimuli of the experiment and the complete text of this thesis. All materials on disk are offered “As is”, without any warranty from the author or promotor.
You may use all materials offered on this disk, provided that
you include in your work a complete reference to this dissertation
the promotor of this thesis, Prof. Dr. G. van Outryve d’Ydewalle, gives his permission.
A complete overview of the disk’s content is offered in the file “readme.txt”.
Index
A
A* search, 9, 10
actual distance, 10
adjacency, 17
adjacency matrix, 17
Andrasfai, 17, 20, 21, 45
ANOVA, 32
arbitrary graphs, 22, 23
average distance, 32, 41, 43
B
beam search, 5, 10, 25, 26, 33, 36, 38, 40, 41, 42
beam width, 10, 25, 26, 33, 36, 37, 38, 41, 42
Berge, 17, 45
best possible solution, 12
bipartite graph, 20
blind methods, 16
Bollen, 26, 30, 42, 45
branching factor, 8, 12, 13, 41
bridge, 18, 21
Busacker, 17, 21, 45
C
camp craft, 27
centre of a graph, 18
chess horse puzzle, 7, 8
circuit matrix, 17
closed trail, 17
cognitive load of the environment, 25
complete graphs, 18
completeness, 9, 10, 12
component, 19, 28
computational approach, 26
connectedness, 17
Conrad, 2
corona, 26
cost contours, 10
Cronbach alpha, 28
cutset rank, 20
cycle graphs, 19
cycle rank, 13, 20
cycles, 13, 19, 20, 22, 43
D
d’Ydewalle, 1, 5, 26, 45, 55
data pattern, 32
degree, 1, 17, 19
degree sequence, 17
diameter, 18
dichotomisation, 30
digraphs, 16
directed graphs, 16, 17
distances, 18, 21, 30, 34
Doomen, 1, 20, 26, 38, 45
E
E(G), 17
edges, 16, 17, 18, 19, 20, 23
effective branching factor, 12
electronic environments, 22, 39, 43
essentials, 27
estimated cost, 10
estimation of remaining distance, 13, 22
euclidean space, 23, 43
Euler, 16, 17, 18
Eulerian graphs, 17, 18, 20
Eulerian trail, 17
evaluation function, 10
expanding, 7
experienced readers, 19
experiment, 5, 24, 26, 31, 33, 35, 36, 42, 43, 55
expert in computer use, 25
expert in the field, 25
expertise, 25, 27, 31
exponential growth of memory, 10
F
Fleury’s algorithm, 17, 20
food, 27
four colour problem, 16
G
goal, 10, 21, 26, 31, 33, 41, 43
graph of the cube, 19
graph theory, 5, 13, 15, 16, 22, 45
graphs, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 36, 43, 45
greedy search, 5, 9, 10, 25, 26, 33, 34, 39, 40, 41
guide page, 30
H
Hamiltonian graphs, 18
Hamiltonian property, 18
Hamiltonian trail, 18
handshaking lemma, 17
Harary, 17, 45
heuristic, 9, 10, 13, 22, 26, 34, 40, 41, 43
heuristic methods, 10
hierarchical topology, 19
human search behaviour, 14, 40
human search effectiveness, 31
hypertext, 5, 19, 41, 42, 44
I
IDA* search, 9
indegree, 17
infinite branches, 8, 13, 22
information processing, 25
informed method, 9, 10, 12
input, 36
Internet, 5, 21, 25, 30, 31, 45
isolated vertex, 18
isomorphic, 16
iterative deepening search, 9
IU, 12, 13
K
K3,3, 20
K5, 20
Königsberg bridges problem, 16
Kuratowski’s theorem, 20
L
labyrinth rules, 21
learning effect, 32, 36
LISP source code, 5
local maxima, 11
M
machine search, 5, 14, 25, 41
MacLane, 21
mathematical graph theory, 5, 15
MDS, 29, 36
memory requirements, 9, 10, 38
mental map, 21, 22, 42, 43, 44
Miller’s magical number, 25
multidimensional scaling, 33
multiple roots, 19
N
navigation tool, 27
network navigation, 26
network topology, 13
nodes, 7, 8, 13, 16, 18, 19, 21, 23, 26, 32, 33, 36, 37, 38, 41, 43, 44
nondeterministic search, 8
Norvig, 7, 11, 45
null graphs, 18
O
on the move, 27
optimal path, 16
optimality, 9, 10, 12
outdegree, 17
P
partial path cost, 13
partial solutions, 7, 11
Parunak, 5, 45
PCA, 9, 30
performance, 24, 25, 32, 36, 38, 39
planar graphs, 18, 20, 21, 43
planarity, 20, 23, 43
plateaux, 11
Poundstone, 16, 21, 45
Principal component analysis, 28
pruning the search tree, 13, 16, 22
Pythagorean rule, 34
Q
quality information, 11, 13
R
random, 17
Raüterberg, 26, 45
reading the signs, 27
recursive definition, 17
regular graphs, 23
relative performance, 32, 35, 36, 39
reliability, 28
ridges, 11
rings, 19
role of topology, 33
root, 19
RTG, 20, 23, 28, 29, 30, 32, 33, 42, 43
Russel, 7, 11, 45
S
Saaty, 17, 21, 45
SAS survival guide, 27
saw tooth, 26, 35, 38, 39
search behaviour, 14, 26, 36, 40
search methods, 6, 7, 16
search space, 7, 8
search tree, 7, 8, 9, 13, 16, 22, 41
semantical greedy search, 25, 39, 41
semantical information, 23, 30
semantical network, 28, 29
semantically linked graphs, 23
serial topology, 19
Short Term Memory, 25
shortest path, 10, 21
shortest time, 21
simulated annealing, 9, 11, 26
simulation of the data, 34
SMA* search, 9
spanning tree, 13, 20
stack, 8, 11, 25, 27
standardised scores, 28
starting point, 18, 22
state space, 9
steepest slope, 11
step contours, 10
Sternberg, 22, 26, 45
survival, 27
Survival Handbook, 29
T
ten questions, 31
thickness, 18, 20
time requirements, 8
Tolman, 22
trail, 17, 18
traps, 30, 42, 43
travelling salesman problem, 16
trees, 19, 20
triangle inequality, 18
U
undirected graphs, 16
uninformed strategies, 13
union, 17
unlabelled graph, 17
V
V(G), 17
Van Rensbergen, 26, 45
vertex, 17, 18, 19, 20
vertices, 16, 17, 18, 19, 20
W
wheel graph, 19
wheels, 19
Whitney, 21
Wilson, 16, 17, 18, 21, 45
Wiseman, 29
Wright, 19, 45
wrong decision, 20
1
Tip: the left pane works like windows explorer!
Please send your comments to Peter Doomen.
This document was updated 27/05/00.