Free synthesis of the book "Knowledge Representation" by Han Reichgelt. You may use it, print it and distribute it provided you leave the text as a whole.
An AI Perspective
H. Reichgelt, 1991
Synthesis by Peter Doomen, 1997
Knowledge Representation (KR) is a major concern in AI, but no textbook exists. This book attempts to fill the gap.
This book discusses KR languages: logic, production rules, semantic networks and frames.
With the advent of connectionism, a new style of computing has emerged. A separate chapter discusses the connectionist approach.
Chapter 1: What is KR?
1. Artificial Intelligence and KR
KR is important for AI, in which one agrees upon only one thing; that AI programs involve symbolic information processing with aid of represented pieces of information.
The KR Hypothesis (Smith, 1982) states that any intelligent process contains structural ingredients which observers naturally take to represent the knowledge that the process exhibits. Thus, any intelligent system has a knowledge base, which knowledge is manipulated by an inference engine. Example: experte systems (most famous: GPS of Newell and Simon).
This KR hypothesis was challenged by the connectionist approach. But this is deferred to chapter 8.
While deep philosophical questions about KR spring to mind, they are not the subject of this book. We discuss strong and weak points of KR systems. An important question, that of what knowledge is (true justified belief), makes it more appropriate to talk about information representation systems, but we stick to the term KR systems.
2. Two aspects of a KR language
Two aspects of every KR language are the syntactic and the inferential aspect.
A KR is a formal language, therefore, syntax is important. But apart from this formal aspect, every KR language has an inferential aspect, which is done by the interpreter.
Two remarks must be made here. First, inference rules are independent of the actual knowledge in the knowledge base. Second, inference rules are not restricted to logical-based KR’s (see chapter 6).
Both aspects are closely related. The primitives of the language are those things that the interpreter is programmed in advance to understand (“CAR” in Lisp). But the distinction remains important: two arguments have been used to defend one KR above another: the simplicity with which to represent information, and the power of the inference mechanism.
3. Four levels of KR
(Note: this is based on Brachman, 1979. His fifth level, the linguistic level, is left out.)
1. The implementational level: How is the KR build into the computer? Which indexing mechanisms are used?
2. The logical level: What are the meanings of expressions in the formalism? Which expressive power do the inference rules have?
3. The epistemological level: Which type of primitives are needed for this KR? Which types of inference rules?
4. The conceptual level: Which primitives are actually needed?
Confusion can be avoided by separating these four levels.
4. Requirements of KR languages
1. At the implementational level, the main concern is efficiency (space and time).
2. At the logical level, one is concerned with two things: the syntax (Frege’s compositionality principle, which states that the meaning of a compound expression should be derivable from the meanings of its parts) and the soundness of the inference rules.
3. At the epistemological level, there are four main concerns: If there exists a natural way of organising knowledge, the KR language has to respect this; the KR language has to be modular; attention to the granularity (=size of the information chunks) is important; and the language should support the actual primitives of the conceptual level.
4. At the conceptual level, one is concerned with how concisely one can represent particular pieces of knowledge.
5. Outline of the book
See p. 11.
Chapter 2: Some General Problems in KR
1. The declarative versus procedural controversy
Procedural=knowing how (Winograd).
Arguments against procedural view:
· Requires specification of how a piece of information is used
· No separation between information and the procedures to use it.
· Difficult to change (nonmodularity)
Arguments pro procedural view:
· Seems naturally for many types of knowledge (e.g., adding of two numbers)
· Metaknowledge is more easily represented procedurally
· Procedural implementations work, declarative systems not (1975).
But these arguments are somewhat weakened by their age:
· Declarative systems now work too (expert systems).
· Epistemic logic challenges the view that metaknowledge is more easily represented procedurally
· Economy of storage is important; declarative knowledge is better here (idea of structured programming). However, also procedural knowledge can be represented in a modular way, at least in principle (OO-programming).
Procedural representations maintain a computational advantage: they are faster than the declarative counterparts. Conclusion: there is a trade-off between flexibility and speed.
Summarising, procedural representations have two advantages: the naturalness of expression for some types of knowledge, and their speed. Declarative representations, however, are much more flexible.
2. Types of knowledge
The first type we distinguish is domain knowledge. This can be split up in two types:
· Structural knowledge (type and structure of domain entities)
· Relational knowledge (relationships between facts or between domain entities).
The second type is strategic knowledge (cf. traineeship: putting textbook knowledge into practice).
This distinction is somewhat abstract. Often, the types are intertwingled (knowledge elicitation phase in development of expert system proves this).
3. The problem of control
Problem of control: what inference rules to use on what knowledge? Related to the problem of search in AI. Control regimes are important with respect to this.
3.1 Forward versus backward chaining
Forward chaining (normal way of doing): data-driven approach; bottom-up inference
Backward chaining: goal-driven approach; top-down inference
Each of them has disadvantages, but also strong points of course.
3.2 Depth-first versus breadth-first control regimes
Possible both with top-down or bottom-up approach; discussion here with top-down (easier to explain).
Depth-first and breadth-first: see Winston.
Advantage of depth-first is that it quicker leads to a solution; the advantage of breadth-first is that an optimal solution will be found (shortest one).
3.3. Heuristic search
Sometimes impractical or unnecessary to explore all paths. How to get around this problem? Use heuristic search to prune the search tree. This can be used in three ways.
1. Choice of the operator: conflict resolution: generate subgoal that looks easiest to solve.
2. Conjunct ordering: with backward reasoning, ordering of the different antecedents to search through. Generate subgoal that seems most difficult to solve, because if it fails, you know that the whole enterprise will fail.
3. Pruning of the tree: resource-limited search. This type, unlike the other 2, may result in an incomplete exploration of the tree. Example: impose limit on number of inferences to make in each branch.
3.4 Explicit control knowledge
The problem of control is related to strategic knowledge. Declarative systems that embody control strategies have been developed (TEIRESIAS, MECHO).
Those representations have advantages over procedural approaches:
· Clearness from an epistemological point of view
· Domain-specific heuristics can be used.
· One can use this knowledge for more than one task.
· An explanation system can be developed.
Capacity to perform reflective tasks (needing metaknowledge) is useful, but still an area of research. Usually, reflective programs work slower than non-reflective ones. But hopefully, decisions made are smarter, such that it pays to build reflective programs.
Three points were raised in this chapter: The controversy declarative-procedural, the distinction between types of knowledge, and the problem of control. Now, we are going to discuss specific KR languages, starting with logic.
Chapter 3: Logic-based KR languages
1. What is a logic?
Logic can be defined as the study of correct inference. A correct inference must be truth preserving. Thus, logic is the study of truth preserving inference.
Further explanation (introduction to logic): see p. 32-33.
1.1 An example logic: classical first-order predicate calculus
1. Set theory
2. The syntax of FOPC
3. The semantics of FOPC
4. A proof theory for FOPC
5. Soundnes, completeness and decidability
2. Logic-based KR languages
Mid ‘60: rise of problem solvers based on logic, after the work of Robinson (resolution theorem prover). But it quickly turned out that the enterprise was not manageable.
Now, since the development of Prolog, the standing of logic for KR has improved. Logic is again a fundamental part of AI.
2.1 Building a logic-based inference engine
We are concerned here with a resolution refutation theorem prover for FOPC, because of three reasons:
· they were among the first ones developed
· many techniques used here are also used in other types
· they are even today the most widely used.
(introduction to refutation and resolution: p. 45)
(note: Reichgelt discusses here the full process of translating ordinary logic expressions to Horn Clauses)
4. The problem of control again
How to avoid search problems? One way of doing is to make the language less expressive, such as happens in Prolog. There, clausal form is restricted to Horn Clauses (which have at most one positive literal) because:
· the nonnegated literal is always first in a clause
· the goal to be solved is always expressed negatively (refutation)
· therefore, you only have to try one literal per clause
Another way of doing is to hardwire a control strategy, but this approach has several weaknesses:
· difficult to implement domain-specific heuristics
· different tasks require different control strategies
· it becomes less clear whether the same logical sentence can be used in more than one way
Besides explicit declarative control regimes, these problems can be tackled by the approach of Genesereth and Smith: in-built control can be overwritten for particular sentences in the knowledge base (is both flexible and efficient).
3. The arguments in favor
3.1 Logics have a semantics
Why is semantics important?
· you can determine exactly what an expression in the language means
· it enables you to check the soundness of the inference rules
Why should you prefer the formality of logics? There are alternatives:
· pretend-it’s-English: meets counterarguments! Vagueness and difficulty with determination of what follows from a given piece of knowledge.
· Procedural account of meaning: meets counterarguments too! Impossible to prove the soundness of inference (serious problem, since that was the main reason for having a semantics). A further problem is that the control strategy becomes part of the meaning of an expression.
Logic is an important tool for these reasons. Newell even claims that logic is the only appropriate language at the knowledge level, which he distinguishes from the symbolic level.
This argument has two parts: that logic can deal with incomplete knowledge, and that there are many logics to choose from.
An example of the first is Moore’s three-blocks problem. No other KR language can deal with such situations.
Concerning the second part of this argument, there are temporal logics, epistemic logics (logic and belief) and others.
Apart from the declarative nature of logics, there are two advantages of logics: its having a semantics, and its expressiveness.
4. Problems and solutions
4.1 Logic is too inefficient
This argument stems from the initial success of problem solvers and the failure thereafter. But it becomes clear that theorem provers in those days used very inefficient control strategies. Prolog, for example, does a better job (?).
4.2 The undecidability problem
It can be proven that FOPC is semidecidable: it might be unable to tell you that a query does not follow from the knowledge base.
In fact, the problem is more general. The semidecidability is the price you pay for expressive power; for example, simple propositional calculus is decidable. Furthermore, one could rely on heuristics for deciding when to stop searching (if the heuristic leaves the soundness intact).
4.3 Procedural knowledge
Some problems are easier expressed in procedural language. It seems impossible for logic KR languages to to this.
In Prolog, this problem was solved by embodying procedural attachment (Y is X+Z),But a drawback is of course that values for X and Z must exist, and that the semantics become less clear.
4.4 Default reasoning
Following McDermott, some important types of reasoning cannot be done with logics: default reasoning and abduction.
Default reasoning is the drawing of plausible inferences on the basis of less than conclusive evidence in the absence of information to the contrary.
The main problem with default reasoning is its nonmonotonicity: the truth value of some expressions can change when new information is added.
How to cope with the problem that normal logic does not capture default reasoning? Development of other types of knowledge is one way of doing.
1. Default and nonmonotonic logics
Mainly two approaches exist here: default logic and circumscription.
Default logic has defaults, which are inference rules: For any x, if it can be deduced tha x is a bird, and it is consistent with the theory that x can fly, then infer that x can fly.
Circumscription says: a bird can fly if it is not abnormal (thus defaults are sentences in tha language itself here). We assume that things circumscribed as abnormal are the only things that are abnormal.
Although these seem useful ways of doing, there are four problems with the approaches of default logic and circumscription, of which the first is a fact and the others are controversial:
1. Consistency is undecidable; the use of heuristics would cause loss of soundness in these cases because it is possible that some sentences are viewed as consistent when they are not.
2. These logics might not have an adequate semantics. But Moore has presented autoepistemic logic, which has a clear semantics and is equivalent to default logic. And Shoham proposed to use a preference relation < between interpretations. But this approach is new and needs further work.
3. Hank and McDermott’s Yale Shooting Problem, a special case of the frame problem, challenge the argument that default logic can capture commonsense reasoning. But this point is still controversial.
4. (not important: argument of Perlis)
2. Reason maintenance system
The problem of inconsistency arises when a knowledge base can change. At first sight, the problem would be avoided by making reasoning systems recollective (not storing derived sentences). This is correct in theory, but does not work in practice.
A second way to avoid the problem is by retracting contradictory sentences. Systems that act in this way are called Reason Maintenance Systems (RMS).
RMS’s include knowledge about why a certain sentence was derived, called a justification. Sentences which are assumed to be true are said to have premise justification. If a contradiction arises, and a sentence without premise justification is withdrawn, at least one of the justifying sentences will have to be withdrawn too (backward belief revision). In addition, forward belief revision is the recalculation of belief in sentences that depend on a retracted sentence.
A proposal to let the system decide which sentences to withdraw stems from McAllester. He divides sentences into likelihood classes, and when a contradiction arises, the sentence of the lowest likelihood class is retracted first.
However, there are some efficiency problems with the basic RMS. An example is the unouting problem: the problem of repeated recalculation if a sentence is asserted, retracted, reasserted and so on. Doyle solves this problem by giving each sentence a support status, which can be in or out (not the same as belief that it is not true). Retracting a sentence is simply changing this status. He also extends the basis RMS in the following way. Out-lists are also of importance for justification of default inferences. If the out-list is not empty, the justification is nonmonotonic.
3. Default reasoning and RMS’s
There are many difficulties with the simple RMS, which are dealt with in de Kleer’s assumption-based truth maintenance system. Following de Kleer, it is not entirely clear to what extent using an RMS is an essential departure from the use of logic alone for KR.
Abduction (term coined by Peirce) is inference to the best explanation. If you know that (1) if it rains, the streets are wet, and (2) the streets are wet, then you could conclude that it might be raining.
There is a distinction between abductive inference and backward chaining use of inference rules. In the abductive case, X is concluded true, while in backward chaining, it is set up as a new goal to prove.
Abduction is an important type of inferencing, although it is not itself a form of logical inference. But following Pople, abduction is essentially like deduction: in deduction, you try to determine whether a statement is true, while in abduction, you determine why it is true. And answering the why question implies answering the whether question.
The problem is solved by positing a particular component that will generate hypotheses, which task it is to find the most plausible ones.
(last few rules: according to Bern Martens: not true anymore).
Chapter 4: Production Rules
1. Production rule systems
Production rules are useful for modelling pattern-directed inference. Pattern-directed inference systems have three components:
1. Working memory elements (knowledge acquired about the problem)
2. Pattern-directed modules (each consisting of a triggering pattern and an instruction to be executed; the result of an action is a change of working memory elements; usually, they are called “rules”, and this component is the rule base).
3. Interpreter (to solve the control problem, by deciding which instruction to execute). The interpreter goes through three stages: retrieval (resulting in a conflict set), refinement (conflict resolution) and execution.
1.1 Working memory
The working memory has two sources:
1. The reasoning that the system has performed so far
2. The user (treated as an external database)
Most production languages store elements in the working memory as object-attribute-value triples (e.g., <car, color, red>) or attribute-relation-value triples (e.g., <temperature, greater-then, 20>.
1.2 The rule base
This type of memory stores permanent knowledge. Production rules are also called IF-THEN rules. The IF-part is called the antecedent, or left-hand side, the THEN-part the consequent or right-hand side.
There are three main differences between production rule systems:
· The actions allowed in the execution part of the rule
· The possibility of allowing variables in both sides
· The number of statements allowed in the right-hand side of a rule.
Concerning the first distinction, the most simple systems allow only to add or replace elements in working memory. More complicated systems allow deletion or even execution of an arbitrary piece of code.
[Remark: the claim that production rules are nothing more than implications in FOPC, is at least misleading: the action part may specify actions whose logical status is less than clear]
The second difference is the possibility of variables. Production rule systems that do not allow variables at both sides are not very expressive. Most systems, however, do allow variables at both sides. This requires matching before execution of a consequent.
The third difference concerns the number of statements allowed in the right-hand side of the rule. This is closely related to 1.3.
1.3 The interpreter
The interpreter has to solve the control problem. Two types of control can be distinguished: global and local control (Jackson, 1986).
Global control is whether the system tries to match the left-hand side or the right-hand side of a production rule against working memory.
With the right-hand side, this can be more easily done by systems that allow only one statement at the right-hand side (right-hand side because this can be seen as a goal statement, while the left-hand side is a set of subgoals to fulfill in order to prove the goal). The problem-solving behavior is then goal reduction, or backward chaining.
With the left-hand side, the problem-solving behavior is forward chaining, or data-driven. This is a more natural way of doing, since with backward chaining, the interpretation of entities in working memory becomes problematic.
Local control implies decisions about which rule to choose if more than one rule matches the content of working memory.
· A simple strategy is to rely on textual order: take the first rule that matches. An advantage of this approach is that it can speed up the system.
· Another strategy is called refractoriness, which states that the same rule should not be applied twice to the data (or the same rule instantiation, when working with rules embodying variables). This strategy prevents loops when rules are applied that do not change their own antecedents.
· Still another is recency: apply the rule that matches against the most recent working memory element. This causes the system to pursue one line of reasoning first, only backtracking when it fails.
· A final strategy is specificity: prefer the more specific rules (in instantiation or number of antecedents). This causes rules that apply to exceptions to be applied first, before the general cases.
2. The representation of uncertain knowledge
Experts do not use hard and fast rules, so we do not expect that from an expert system either. A lot of work has been done in the domain of uncertainty representation. In two respects, the way of uncertainty representation is different from the default reasoning uncertainty handling: it is approached in a quantitative way, and with a parallel uncertainty inference process.
An example is MYCIN, which uses the Bayesian theorem for calculating the probability of some proposition depending on other propositions. But probability theory has some disadvantages:
· One needs a lot of data [PD: yes, but if the used statistics are sufficient, you cannot do any better]
· The probability axiom A + not(A) = 1 is unnatural [PD: nonsense, this unnaturalness is caused by human reasoning biases]
Therefore, MYCIN used an uncertainty handling mechanism known as the certainty factor model. The CF(P)=MB(P) - MD(P), which yields a result between -1 and 1, because MB and MD both are assigned values between 0 and -1. Now each rule R has a tally called CF(R), which indicated how strong the system believes the consequent of R given evidence that R’s antecedent is true. If we conclude P on the base of rule R with antecedent A, CF(P) = CF(R)*CF(A), where CF(A) can be calculated as the minimum of the certainty factors associated with the conditions.
Another approach is the Dempster-Shafer theory. This starts with a frame of discernment F, which is a set of mutually exclusive and exhaustive hypotheses. Its power set P (the set of all subsets of F) has elements which are assigned a probability value. Now based on incoming evidence, these probability assignments change.
The model of Dampster-Shafer has advantages over the CF model:
· It has a sound mathematical foundation
· It allows to assign uncertainty measures at the right level.
· One is not forced to assign a precise certainty value to each hypothesis (allows meta-uncertainty)
Still another approach is fuzzy logic. But the numerical uncertainty calculi have come under attack because of these reasons:
· Uncertainty is in fact metaknowledge, which is obscured by the calculation approach
· Humans do not naturally use numerical values
· Humans are not good at dealing with probability measures (experiments of Tversky and Kahneman) [PD: so what? If you can do it with calculation, why give that up?]
Therefore, other theories have been developed:
· Fox qualitative uncertainty calculus (possible, probably,...)
· Cohen’s endorsment representation (reasons for disbelief and belief in statements)
3. An example
See text (based on work of Young and O’Shea, who were interested in errors that 10 year old children make when they subtract two two-digit numbers from each other)
4. The advantages of using production rules
4.1 Naturalness of expression
Real experts rely on rules of thumb rather than on textbook knowledge. Therefore, expert knowledge can be expresses as a set of production rules. Besides, individual rules divide the knowledge into natural and readily intellegible chunks (right grain size).
Several architectural properties of production rule systems lead to a very modular organisation of the knowledge base:
· Separation of permanent and temporary knowledge
· Rules are structurally independent
· The interpreter is independent from the knowledge
This modularity has advantages:
1. Facilitates construction and maintaining of rule bases
2. Makes incremental construction possible
3. Easy to debug: possibility to track down which rules lead to wrong answers (because of their contents of because of the timing of their triggering).
4.3 Restricted syntax
Although this has some disadvantages concerning expressibility, there are positive consequences too:
1. One can write a program that can read and change a set of production rules
2. One can write a program that helps in the acquisition of knowledge from a domain expert
3. It becomes easy to write a program that translates a production rule in natural language
4.4 The problem-solving process
Production rules determine what to do next on base of the present state of working memory. This has advantages for the overall problem-solving process:
1. Minimal changes to working memory can lead to important focus-shifts in the process (especially when using a recency heuristic)
2. PR systems can pursue a large number of different reasonings at any one time.
So in fact, PR systems give you the best of two worlds.
The system should be able to hold an interactive dialogue. MYCIN is an example of a system that can do so. MYCIN, when asked “why” its asks a certain piece of information, will do two things:
· print the rule whose consequent it is currently trying to prove
· print the preconditions of the already established rules.
5. Disadvantages of production rule systems
5.1 The explanation problem
GUIDON, a system based on MYCIN, constructed as a tutoring program, failed because two types of knowledge have been omitted in MYCIN:
· The diagnostic approach (a kind of strategic knowledge, which is hard-wired in MYCIN and therefore cannot be explained by the system)
· Understanding of the rules (support knowledge, such as that it is the electricity that makes the rule “IF car not start THEN battery dim” true). Besides, support knowledge can sometimes be used to determine limitations of individual rules.
5.2 Efficiency problems: Determining the conflict set
Although the creation of a conflict set makes the problem-solving process very flexible, it may cause computational overload: both determining the conflict set and resolving it are computationally expensive.
A solution proposed by Forgy is the Rete-approach, based on two observations:
· many rules have conditions in common
· working memory changes only a little in each cycle.
The Rete algorithm constructs a tree-structured sorting network or index for the conditions in various rules, which will be compiled when a rule base is loaded. Furthermore, it keeps a list of which working memory elements were changed, such that successive calculations can be avoided if their elements do not change.
The Rete algorithm has also two minor disadvantages: first, compilation of the conditions of the rules may take a while. Second, rules constructed during a session give rise to a sorting network of their own, which reduces efficiency.
5.3 Efficiency problems: Conflict resolution
In large knowledge bases, the obtained conflict set may be too large to execute it fully. This is called “saturation” (Davis, 1980). There are various ways to overcome saturation:
· Refinement of the conditions, to make the conflict set as small as possible. A disadvantage of this approach is that the time needed to compose the conflict set grows, but this can be lessened by the Rete algorithm. A more serious problem is that modularity gets lost (rules depend on each other).
· Develop a mechanism that enables the system to cope with the problem, by pruning and reordering of the conflict set itself. An example is Davis’ approach, which uses meta-rules, explicit representations of pieces of strategic knowledge. Potential inefficiency due to meta-level overhead can be avoided by the technique of localization. His approach has a further advantage that strategic knowledge becomes explainable.
· Partition the system into relatively small sets of rules. An example are blackboard systems. In blackboard systems, the rule base is divided in different knowledge sources, which can communicate via the blackboard. The precise execution of the different sources is done by a global interpreter. Another advantage of these blackboard systems are the abillity to use multiple sources of knowledge.
5.4 Restricted syntax and expressibility problems
The restricted syntax causes some types of knowledge to become difficult or impossible to express:
1. Structural knowledge: if a patient has not undergone surgery, then he has not undergone neurosurgery either. Maybe this can buried in the rules.
2. Incomplete knowledge, especially negative and disjunctive knowledge. Negative knowledge can take two forms: not(car, red) and not(x, red) if (car, x). In OPS5, this type of knowledge can be expressed in the antecedent, but not in the consequent of a rule. Clever ways to solve this problem result in obscure rules.
Disjuntive knowledge too can only be encoded in the antecedent (by simply formulating two rules), but not in the consequent of a rule.
5.5 Is incremental construction of a knowledge base really possible?
Production rules have often been written with a conflict resolution strategy in mind. Adding new rules may cause therefore all sorts of problems.
For example, modularity gets lost if rules become dependent on each other.
Production rules are still very popular for constructing expert systems. They have three large advantages: they are modular, provide natural expressions of knowledge, and have a structured problem-solving process. But disadvantages are the inefficiency problems in large rule bases, the restricted syntax and the doubt whether incremental construction is really possible.
Chapter 5: Semantic nets
An important feature of human memory is the high number of associations between pieces of information. The topics of chapter 5 and 6 deal with representations for knowledge that are based on this intuition.
2. What are semantic nets?
There are two types of primitives: nodes and links. Nodes correspond to objects, or classes of objects, in the world, whereas links correspond to relationships between these objects. They are often labeled to indicate their meaning. No information is stored in a node as such; this is done with the links associating nodes.
Quillians original work was taken up by psychologists, but also work done in linguistics was important for the development of the semantic net approach.
2.1 Representing information in semantic nets
Quillian distinguished between type and token nodes, corresponding to the two ways in which a word can occur in a dictionary (as entry or as part of an explanation).
Quillian distinguished furthermore between five link types:
1. A is subclass of B
2. B modifies A
3. Conjunction of a number of nodes
4. Disjunction of a number of nodes
5. B is subject of the relation A which has as object C
(see p. 118 for a figure with an example)
Quillian did not have problems with the circularity caused by the fact that he did not set up primitives. Others took this primitive word approach (see further).
A second important source was the development of Case Grammar (Fillmore). He distinguished between a surface structure and a deep structure in each sentence, related by a number of transformations. The deep structure has a modality (tense, mood) and a proposition (a verb with a number of cases, i.e., roles to be filled in by other parts of the sentence).
2.2 Inferencing in semantic nets
The basic inference mechanism consists of following links between nodes. There are two types of strategies:
· Spreading of activation (or intersection search)
The first type tries to find a concept that is related to the words to be related by the inference. Each node can have an activation tag during this process. When a new node N is reached following a link from A or from a certain C that can be reached from A, three things can happen:
1. There is already an activation tag mentioning word B associated with N: in this case, N is a basis for comparison.
2. There is already an activation tag mentioning word A: no new tag is created
3. There is no tag: an activation tag is created.
Also here, the problem of control returns. In Quillians’ approach, this is solved by bidirectional search (first from A, then B, then A and so on). But there are other possibilities.
Not all associative network representations use spreading of activation: in frames, for example, this is never used. But inheritance is. Inheritance consists in deriving properties of subclasses from superclasses.
Some researchers do not distinguish between Inst links between tokens and their types and the Sub link between classes and their superclasses. This is a mistake; you have to. Furthermore, inheritance seems a good way of dealing with default reasoning (but not without controversies, see further).
3. Conceptual dependency theory
Schank developed this theory to avoid the circularity problem of Quillians’ primitive-less approach. He made a distinction between six categories: PP (a real world object), ACT (a real world action), PA (attributes of objects), AA (attributes of actions), T (times) and LOC (locations). These categories can be related, causing a number of link types to come to play. This distinction is at the epistemological level (see chapter 1). The actual primitives are the bussiness of the conceptual level. An example are the 14 different types of ACT that Schank distinguishes.
The advantages of a logic-based KR language using a small set of primitives are:
· Fewer explicit inference rules are necessary.
· Some inferences are already in the representation (canonical form of a sentence used to decide whether two sentences are paraphrases of each other)
But of course, there are drawbacks too:
· Translating everything into primitives is a lot of work
· It is difficult/impossible to find the right set of primitives
· A lot of inferential work remains to be done (e.g., whether A logically follows from B)
· Primitive representations are often complex and need a lot of storage space.
4. The logical foundation of semantic nets
In 1975, Woods published an influential paper that criticised then-current semantic net formalisms. One of his points is that nobody has made clear what the semantics of semantic nets really are. Some nets (see figure p. 126) can be interpreted in three different ways: the concept of a black telephone, an actual black telephone, or an assertion of the relation between telephones and blackness. This causes us to distinguish between two kinds of information:
· Definitional information: this would remain true even if the world changed
· Assertional information: this may change as the world changes
The absence of semantics is also reflected in problems with ISA links. Fig. p. 127 can be interpreted in at least four ways.
· One relevant factor is the intended meaning (intensional or extensional, or description of objects versus standing for objects).
· A second important factor is whether ISA means sentence forming or concept forming.
· A third factor is the way inheritance is used (all or only those properties inherited who are not explicitly cancelled; usually the latter, but this can lead to problems with three-legged elephants whose three-leg property is cancelled).
· (Fourth factor not in text?)
Another shortcoming is the fact that semantic networks are unable to represent propositions without commitment to their truth value, which is necessary to deal with belief-sentences. Furthermore, they cannot represent things without stating that they exist in the world (Mike knows versus dials Pats telephone number; what if Pat an Mary have the same number?). Some proposals to deal with Woods’ criticisms are described below.
5. Extending the expressive power of networks
5.1 Partitioned networks
The basic idea is to partition the network into spaces, which can be considered nodes too and thus be related with links (hierarchical organisation).
Hendrix gives examples of their utility:
· Language processing (syntactic subparts)
· Belief of one person in one space
· Representation of logical connectives and quantifiers
Partitioned networks have their difficulties too:
· Extensionality versus intensionality of nodes, and the status of the links is not always clear (assertional or definitional)
· Representation of arbitrarily quantified propositions: some nodes correspond to propositions, others to entities. Also, partitioned networks become large structures, such that the naturalness of simpler networks is given up.
5.2 The SNePS networks
(not in course)
6. KL-one, separating assertional and definitional information
The second point of Wood is the status of the links: they represent ananalytic or synthetic information. KL-one adresses this problem.
The basic elements of KL-one are structured conceptual objects, or concepts for short. They are intensional entities. KL-one distinguishes between generic and individual concepts. A description of an object must account for its internal structure, for which task roles are used. Roles correspond to two-place relationships in logic. Structural descriptions are included because quite often roles are interdependent. Figure p. 136 attempts to clarify this. Arcs can get the label v/r (value restriction).
The inference mechanism is structured inheritance, indicated by a emboldened link. Since the definitional part deals with necessary properties only (the rest is left for the assertional part), inheritance is straightforward. Assertions are made relative to a context, so they can be used to reason about beliefs, desires and so on.
The existence of something can be asserted with a description wire to a nexus within a context. The advantage of KL-one is its clear epistemological basis. Because it distinguishes between a definitional and an assertional part, it avoids some problems Wood describes. But it does not solve all problems: the assertional part was not fully developed, and it is not clear how to represent arbitrary quantified sentences.
7. The semantics of semantic networks
One has tried to provide for model-theoretic semantics of semantic networks. The easiest way is to try to translate them to some logic.
Within KL-one, this can be done by stating that i corresponds to the logical constant i, whereas G corresponds to the one-place predicate G. Various links can be translated in this way. This translation is dependent upon two things:
· the nature of the originating concept
· the nature of the destination concept
(see p. 140 for examples).
There are two problems with this translation:
· If a concept represents an action, the resulting logic sentence is unnatural. This is due to the particular translation mechanism, other mechanisms do not have this problem.
· With use of inheritance, inconsistent logic sentences may result. This can be solved by translating to Reiters default logic, but the question is still open whether this is an acceptable representation.
Semantic nets have been superseded by frame-based representation languages.
Chapter 6: Frame-based Representation Languages
Intuition about human memory: structured chunks of knowledge, more than there is in logic (Minsky) and also the declarative and procedural aspects of a given chunk must be tightly interconnected. Minsky introduces the frame concept for this, which is based on the schema concept of Bartlett (1932). The type of reasoning in frame systems is often recognition, the comparison of new objects by known ones.
2. What are frames?
Frames are structures that represent knowledge about a limited aspect of the world. Information is stored by associating descriptions with it (called “slots” and consisting of two parts: a name and a value or restriction of values).
A natural way of describing an entity is by comparison with a known entity. Therefore, frames have a hierarchical structure.
Most frame systems distinguish between class-frames and instance frames. A class frame can have subclasses below it. Instance frames are instances of class-frames.
Some situations can be seen from more than one perspective (e.g., kissing as a physical act vs. as a social act). This can be represented by allowing super links to connect to more than one other frame. In Minsky’s original conception, some slots cannot be overridden by other values when inheriting. This property is mostly lost in other frame systems.
A special kind of frames are scripts, which are prototypical sequences of events. Recently, these scripts are said to consist of MOPs (Memory Organization Packets), because scripts did overlap in specific subsequences of actions.
There are different kinds of slots:
· Attribute-value pair: can have different kinds of filler, like a primitive of the language or a pointer to another frame in the knowledge base.
· Restrictions on the values of a slot, such as logical connectives, functions or predicates, an instance of another class-frame.
· A procedure, which can be a servant (if it is possible to compute the value when needed) or a demon (which is called whenever the slot value changes). This type of representation is called procedural attachment.
· A superclass slot (for class frames) or a member-of slot (for instance frames)
4. Reasoning with frames
In order to retrieve information, the frame system must perform two kinds of reasoning: matching and inheritance.
As a first step in the reasoning process is to discover which frames can be applied to the current situation. Matching in frame systems is more difficult than for example in production rule systems or in FOPC, because (1) frames are complicated structures; (2) slots can have default properties and (3) perfect matches are rare (whereas in the other systems, it is clear-cut when one has a match or not).
For an example of matching, see p. 152.
The system will try to return a value for a slot even when it is not provided. The superclasses of the frame are searched. This can happen with breadth-first or with depth-first or with other search methods. Depth-first is more efficient, breadth-first will always return the most specific value. Another strategy is depth-first up to join.
[PD: See Winston, chapter 9, for a better discussion on how inheritance can be achieved]
5. Advantages of frame-based knowledge representation languages
Many of the advantages are at the epistemological level.
A first advantage is that frame-based KR languages capture the way in which experts think about their domain.
A second advantage is that domain experts also use comparison with known things for describing new ones.
A third advantage is the taxonomic structure of the inference mechanism (inheritance). For many applications, frame systems strike the right balance between expressive power and efficiency.
A fourth advantage is default reasoning. Furthermore, contrary information, if it exists, is stored with the object and thus retrieved first. It can be decided that this information is present, so undecidability does not arise here, although we will sometimes have to conclude that we do not know the answer. [PD: not with Winston-algorithm, chapter 9]
6. Problems with frame-based KR languages
6.1 The lack of semantics
Hayes attempts to fill this gap by defining an algorithm to translate KRL expressions to FOPC. For an example how this happens, see p. 159. Mainly:
· an instance frame corresponds to a logical constant
· a class frame corresponds to a one-place predicate
· restrictions on attribute value slots are expressed with existential quantification
· slots of class frames are unversally quantified
This approach has three problems:
1. Procedural attachment: this could be tackled by providing a translation for the procedures into logics as well, or by giving up the enterprise and handle the procedural aspects in another way (like Prolog does).
2. It is not clear how one could distinguish between member and own slots (in KRL, there was no such distinction and therefore no such problem). Own slots in class frames cannot be dealt with in Hayes’ proposal. Nado and Fikes propose to introduce class constants, but this complicates the logic, and moreover they do not describe the model-theory of their system.
3. (?) Both systems go against intuition that knowledge is organised around prototypes. Both Hayes and Nado and Fikes propose solutions, but these face difficulties. The proposal of Hayes, to represent references to the state of the knowledge base explicitly, meets three problems: (1) the intuitive meaning of KB entails ‘P’ is not clear, (2) we need a self-explaining system as language and (3) the system can give wrong solutions when two default rules are present. Nado and Fikes translate to default logic, and this solution therefore depends on the quality of default logic, which is not clear as well.
6.2 The perils of inheritance
Brachman rises a number of questions concerning inheritance:
1. What is not being inherited when a class frame or instance frame does not inherit a slot of a superclass? Do you cancel the slot itself, or do you simply cancel the value? If one wants to cancel the slot, problems arise, because then you do not distinguish anymore with frames that did not posses the slot before.
2. Lack of definitional capability: Clyde could be a three-legged elephant with only two legs. why would one define things anyway? Following Putnam, most natural kinds cannot be defined. But: (1) in mathematics, definitions exist and are important, (2) even in real life, definitions can be useful or necessary.
Not all frame-based KR languages use only default inheritance. UNITS, for example, has four types of inheritance roles which can be attached to slots:
1. S-role: necessary for all subclasses
2. R-role: restrictions on values of subclasses
3. U-role: not inherited
4. O-role: optional role (M-role: inheritance)
This allows for the definition of new composite concepts.
6.3 Expressive limitations
· Disjunctions between values of different slots cannot be expressed in frame-based languages.
· existential knowledge not in the scope of a universal quantifier is a problem (there is a murderer in the detective story, but you do not know who it is. In frame-based systems, an instance frame for the murderer would be distinct from all suspects’ frames). UNITS has a solution to this problem: indefinite nodes, with an anchor slot (indicating which instance it corresponds to) and a co-reference slot (which can be filled with other indefinite nodes, to express that they are equal).
7. Object-oriented programming
(not in course)
Chapter 7: Mixed Representation Formalisms
It seems attractive to combine the advantages of different KR languages and to avoid their disadvantages. So-called mixed representation languages attempt to do so. They have certain advantages of their own:
· The system may be computationally more efficient
· The system may offer naturalness of expression
· The system may extend the expressive power of any other KR language
· (maninly KRYPTON) The system may enable you to represent different types of knowledge in different languages (epistemological clearness).
There are various hybrid systems. An example is Cake (mixture of programs and predicate calculus). Others are KRYPTON, NIKL and KL-TWO (all derived from KL-ONE), and ART, KnowledgeCraft and KEE (the so-called AI Tool Kits).
2. AI Tool Kits
These commercial systems offer mostly an OO-programming language and a rule-based representation language, and mostly a logic-based language. These are more or less integrated.
Most AI Tool Kits are Lisp implementations. Therefore, the user can change most aspects (not so in KRYPTON). These systems have the following advantages:
1. They give a lot of freedom and power
2. They provide a good development environment
This also causes problems: Much freedom causes large differences in programmin style, which makes maintenance a problem.
KRYPTON grew out of KL-one and has two main assumptions:
· a functional view of knowledge
· the distinction between assertional and definitional information
The functional view of knowledge representation stems from computer science (see Synthesis LISP course, about abstract data structures). For knowledge bases, abstract operations are TELL and ASK. TELL adds knowledge to the knowledge base, ASK can return three possible answers: yes, no or unknown (the situation is more complicated when “wh” questions are asked). This functional view goes against the spirit of some object-centered representation languages by making implementation accessible to user [PD: I thought this style was developed for avoiding to do so].
The second assumption states that one needs to distinguish between assertional and definitional information. KRYPTON has two separate components: ABox (Assertional) and TBox (Terminological). Like in KL-one, concepts and roles are formed by combining or restricting other concepts or roles. Special primitives, like ConGeneric, can be used for defining new primitives.
The ABox uses a language comparable to FOPC, for dealing with incomplete knowledge. But here, the TBox specifies what primitive expressions exist. Thus, one can distinguish between assertions and definitions.
The model-theoretic semantics of KRYPTON are defined by Brachman, Gilbert and Levesque. The TBox’ semantics are defined as follows.
· Primitive concepts: set of individuals
· Roles: two-place predicates
· Defined concepts: in terms of primitive concepts
For the ABox, it is relatively straightforward to define semantics: this rests on FOPC.
The theorem prover cannot be a standard one. There are two approaches to handle this:
· Translation of definitions in the TBox into ABox. But there are two reasons to leave this one: the distinction between definitions and assertions would vanish, and the translation will slow down the theorem prover.
· Develop a theorem prover (Stickel). The actual theorem prover is based upon the connection graph proof procedure of Kowalski. Normal resolution is extended in two ways: (1) extension of the unification algorithm and (2) conditional resolution (resolution with “a rest”, a piece of information to prove yet)
KRYPTON also embodies other useful techniques, such as inference during building.
A problem is that KRYPTON cannot deal with default reasoning. Only the language in the ABox could be extended to do so, but this would cause changes in the inference mechanism.
Chapter 8: Parallel Distributed Processing
(not in course)
Chapter 9: Conclusion
In this chapter, we discuss some open problems and research issues in KR.
2. Analogical representation
Most representations in this book are “Fregean”, as opposed to analogical representations (Sloman). Many psychologists have argued that human KR is analogical: Kosslyn, Johnson-Laird, Gelernter, Funt.
Analogical representations have various advantages:
· Problem solving is easier (example: route finding on a map)
· It makes certain features self-evident (example: closeness of two blocks)
· One can use commonsense knowledge without formalizing it.
But there are problems as well:
· They are often not general
· They are to a large extent conventional
· They need difficult an complicated support machinery
3. Single or multiple KR languages
One may regret the abundance of KR languages, but Sloman argues that this abundance is useful. However, there are difficulties when using multiple languages: for example, when to use which language and at the same time avoid ad hoc programming? KRYPTON tries to do this by distinguishing between definitional and assertional languages, but the language is not without its difficulties. Another way is Chandrasekaran’s work, which decomposes complex tasks into generic tasks. Wielinga and Breuker present more or less the samen with their KADS methodology.
An example is Clancey’s heuristic classification. He observes that experts go through three different stages:
· abstraction from the data
· heuristically matching to a class of solutions
· Refinement of the solution abstraction into a proper solution
This can be viewed as generic tasks.
A second problem is that it would be necessary to reimplement the knowledge each time it will be used in another way. There are two ways around here:
· Deny that this is the case; simply chance the control knowledge base (wich results nowadays in an inefficient search)
· Store the knowledge in an application-independent format. But this approach meets three problems: (1) it does not explain how intelligent beings develop new representations, (2) it still requires an answer on the question which representation to use when and (3) the design of such an application-independent language is very difficult (all languages up to now have expressive limitations).
Connectionism goes against the knowledge representation hypothesis.
Both connectionists and KRH’s disagree with behaviourists. But with each other, they disagree on two points:
1. The existence of a knowledge base
2. The difference between representation and processing
There are three ways one could view a PDP model:
· An implementation for a traditional symbol processing theory
· Alternatives to traditional theories
· Proposal of a number of primitive operations (most interesting following HR)
Even if one of both theories proves wrong as a description of human intelligence, it can remain interesting from a technological point of view.
Despite the success of AI in certain areas, humans are still the most advanced intelligent systems that we know of, and a better understanding of human intelligence and KR will provide important insights that can be used to build artificial intelligent systems.
[PD: Or: there remains work to do for cognitive scientists]
Tip: the left pane works like windows explorer!