Knowledge Representation
Free synthesis of the book "Knowledge Representation" by Han Reichgelt. You may use it, print
it and distribute it provided you leave the text as a whole.
Knowledge Representation:
An AI Perspective
H. Reichgelt, 1991
Synthesis by Peter Doomen, 1997
Preface
Knowledge Representation (KR) is a major concern in AI, but no textbook exists. This book attempts to
fill the gap.
This book discusses KR languages: logic, production rules, semantic networks and frames.
With the advent of connectionism, a new style of computing has emerged. A separate chapter discusses
the connectionist approach.
Chapter 1: What is KR?
1. Artificial Intelligence and KR
KR is important for AI, in which one agrees upon only one thing; that AI programs involve symbolic information
processing with aid of represented
pieces of information.
The KR Hypothesis (Smith, 1982) states that any intelligent process contains structural ingredients
which observers naturally take to represent the
knowledge that the process exhibits. Thus, any intelligent system has a knowledge base, which knowledge
is manipulated by an inference engine.
Example: experte systems (most famous: GPS of Newell and Simon).
This KR hypothesis was challenged by the connectionist approach. But this is deferred to chapter 8.
While deep philosophical questions about KR spring to mind, they are not the subject of this book. We
discuss strong and weak points of KR
systems. An important question, that of what knowledge is (true justified belief), makes it more appropriate
to talk about information representation
systems, but we stick to the term KR systems.
2. Two aspects of a KR language
Two aspects of every KR language are the syntactic and the inferential aspect.
A KR is a formal language, therefore, syntax is important. But apart from this formal aspect, every
KR language has an inferential aspect, which is
done by the interpreter.
Two remarks must be made here. First, inference rules are independent of the actual knowledge in the
knowledge base. Second, inference rules are
not restricted to logical-based KR’s (see chapter 6).
Both aspects are closely related. The primitives of the language are those things that the interpreter
is programmed in advance to understand (“CAR”
in Lisp). But the distinction remains important: two arguments have been used to defend one KR above
another: the simplicity with which to represent
information, and the power of the inference mechanism.
3. Four levels of KR
(Note: this is based on Brachman, 1979. His fifth level, the linguistic level, is left out.)
1. The implementational level: How is the KR build into the computer? Which indexing mechanisms are
used?
2. The logical level: What are the meanings of expressions in the formalism? Which expressive power
do the inference rules have?
3. The epistemological level: Which type of primitives are needed for this KR? Which types of inference
rules?
4. The conceptual level: Which primitives are actually needed?
Confusion can be avoided by separating these four levels.
4. Requirements of KR languages
1. At the implementational level, the main concern is efficiency (space and time).
2. At the logical level, one is concerned with two things: the syntax (Frege’s compositionality principle,
which states that the meaning of a compound
expression should be derivable from the meanings of its parts) and the soundness of the inference rules.
3. At the epistemological level, there are four main concerns: If there exists a natural way of organising
knowledge, the KR language has to respect
this; the KR language has to be modular; attention to the granularity (=size of the information chunks)
is important; and the language should support
the actual primitives of the conceptual level.
4. At the conceptual level, one is concerned with how concisely one can represent particular pieces
of knowledge.
5. Outline of the book
See p. 11.
Chapter 2: Some General Problems in KR
1. The declarative versus procedural controversy
Declarative=knowing that
Procedural=knowing how (Winograd).
Arguments against procedural view:
·
Requires specification of how a piece of information is used
·
No separation between information and the procedures to use it.
·
Difficult to change (nonmodularity)
Arguments pro procedural view:
·
Seems naturally for many types of knowledge (e.g., adding of two numbers)
·
Metaknowledge is more easily represented procedurally
·
Procedural implementations work, declarative systems not (1975).
But these arguments are somewhat weakened by their age:
·
Declarative systems now work too (expert systems).
·
Epistemic logic challenges the view that metaknowledge is more easily represented procedurally
·
Economy of storage is important; declarative knowledge is better here (idea of structured programming).
However, also procedural knowledge can
be represented in a modular way, at least in principle (OO-programming).
Procedural representations maintain a computational advantage: they are faster than the declarative
counterparts. Conclusion: there is a trade-off
between flexibility and speed.
Summarising, procedural representations have two advantages: the naturalness of expression for some
types of knowledge, and their speed.
Declarative representations, however, are much more flexible.
2. Types of knowledge
The first type we distinguish is domain knowledge. This can be split up in two types:
·
Structural knowledge (type and structure of domain entities)
·
Relational knowledge (relationships between facts or between domain entities).
The second type is strategic knowledge (cf. traineeship: putting textbook knowledge into practice).
This distinction is somewhat abstract. Often, the types are intertwingled (knowledge elicitation phase
in development of expert system proves this).
3. The problem of control
Problem of control: what inference rules to use on what knowledge? Related to the problem of search
in AI. Control regimes are important with
respect to this.
3.1 Forward versus backward chaining
Forward chaining (normal way of doing): data-driven approach; bottom-up inference
Backward chaining: goal-driven approach; top-down inference
Each of them has disadvantages, but also strong points of course.
3.2 Depth-first versus breadth-first control regimes
Possible both with top-down or bottom-up approach; discussion here with top-down (easier to explain).
Depth-first and breadth-first: see Winston.
Advantage of depth-first is that it quicker leads to a solution; the advantage of breadth-first is that
an optimal solution will be found (shortest one).
3.3. Heuristic search
Sometimes impractical or unnecessary to explore all paths. How to get around this problem? Use heuristic
search to prune the search tree. This can
be used in three ways.
1. Choice of the operator: conflict resolution: generate subgoal that looks easiest to solve.
2. Conjunct ordering: with backward reasoning, ordering of the different antecedents to search through.
Generate subgoal that seems most difficult to
solve, because if it fails, you know that the whole enterprise will fail.
3. Pruning of the tree: resource-limited search. This type, unlike the other 2, may result in an incomplete
exploration of the tree. Example: impose limit
on number of inferences to make in each branch.
3.4 Explicit control knowledge
The problem of control is related to strategic knowledge. Declarative systems that embody control strategies
have been developed (TEIRESIAS,
MECHO).
Those representations have advantages over procedural approaches:
·
Clearness from an epistemological point of view
·
Domain-specific heuristics can be used.
·
One can use this knowledge for more than one task.
·
An explanation system can be developed.
Capacity to perform reflective tasks (needing metaknowledge) is useful, but still an area of research.
Usually, reflective programs work slower than
non-reflective ones. But hopefully, decisions made are smarter, such that it pays to build reflective
programs.
4. Conclusion
Three points were raised in this chapter: The controversy declarative-procedural, the distinction between
types of knowledge, and the problem of
control. Now, we are going to discuss specific KR languages, starting with logic.
Chapter 3: Logic-based KR languages
1. What is a logic?
Logic can be defined as the study of correct inference. A correct inference must be truth preserving.
Thus, logic is the study of truth preserving
inference.
Further explanation (introduction to logic): see p. 32-33.
1.1 An example logic: classical first-order predicate calculus
1. Set theory
2. The syntax of FOPC
3. The semantics of FOPC
4. A proof theory for FOPC
5. Soundnes, completeness and decidability
1.2 Conclusion
2. Logic-based KR languages
Mid ‘60: rise of problem solvers based on logic, after the work of Robinson (resolution theorem prover).
But it quickly turned out that the enterprise was
not manageable.
Now, since the development of Prolog, the standing of logic for KR has improved. Logic is again a fundamental
part of AI.
2.1 Building a logic-based inference engine
We are concerned here with a resolution refutation theorem prover for FOPC, because of three reasons:
·
they were among the first ones developed
·
many techniques used here are also used in other types
·
they are even today the most widely used.
(introduction to refutation and resolution: p. 45)
1. Skolemisation
(note: Reichgelt discusses here the full process of translating ordinary logic expressions to Horn Clauses)
2. Unification
3. Resolution
4. The problem of control again
How to avoid search problems? One way of doing is to make the language less expressive, such as happens
in Prolog. There, clausal form is
restricted to Horn Clauses (which have at most one positive literal) because:
·
the nonnegated literal is always first in a clause
·
the goal to be solved is always expressed negatively (refutation)
·
therefore, you only have to try one literal per clause
Another way of doing is to hardwire a control strategy, but this approach has several weaknesses:
·
difficult to implement domain-specific heuristics
·
different tasks require different control strategies
·
it becomes less clear whether the same logical sentence can be used in more than one way
Besides explicit declarative control regimes, these problems can be tackled by the approach of Genesereth
and Smith: in-built control can be
overwritten for particular sentences in the knowledge base (is both flexible and efficient).
3. The arguments in favor
3.1 Logics have a semantics
Why is semantics important?
·
you can determine exactly what an expression in the language means
·
it enables you to check the soundness of the inference rules
Why should you prefer the formality of logics? There are alternatives:
·
pretend-it’s-English: meets counterarguments! Vagueness and difficulty with determination of what follows
from a given piece of knowledge.
·
Procedural account of meaning: meets counterarguments too! Impossible to prove the soundness of inference
(serious problem, since that was
the main reason for having a semantics). A further problem is that the control strategy becomes part
of the meaning of an expression.
Logic is an important tool for these reasons. Newell even claims that logic is the only appropriate
language at the knowledge level, which he
distinguishes from the symbolic level.
3.2 Expressiveness
This argument has two parts: that logic can deal with incomplete knowledge, and that there are many
logics to choose from.
An example of the first is Moore’s three-blocks problem. No other KR language can deal with such situations.
Concerning the second part of this argument, there are temporal logics, epistemic logics (logic and
belief) and others.
3.3 Conclusion
Apart from the declarative nature of logics, there are two advantages of logics: its having a semantics,
and its expressiveness.
4. Problems and solutions
4.1 Logic is too inefficient
This argument stems from the initial success of problem solvers and the failure thereafter. But it becomes
clear that theorem provers in those days
used very inefficient control strategies. Prolog, for example, does a better job (?).
4.2 The undecidability problem
It can be proven that FOPC is semidecidable: it might be unable to tell you that a query does not follow
from the knowledge base.
In fact, the problem is more general. The semidecidability is the price you pay for expressive power;
for example, simple propositional calculus is
decidable. Furthermore, one could rely on heuristics for deciding when to stop searching (if the heuristic
leaves the soundness intact).
4.3 Procedural knowledge
Some problems are easier expressed in procedural language. It seems impossible for logic KR languages
to to this.
In Prolog, this problem was solved by embodying procedural attachment (Y is X+Z),But a drawback is of
course that values for X and Z must exist,
and that the semantics become less clear.
4.4 Default reasoning
Following McDermott, some important types of reasoning cannot be done with logics: default reasoning
and abduction.
Default reasoning is the drawing of plausible inferences on the basis of less than conclusive evidence
in the absence of information to the contrary.
The main problem with default reasoning is its nonmonotonicity: the truth value of some expressions
can change when new information is added.
How to cope with the problem that normal logic does not capture default reasoning? Development of other
types of knowledge is one way of doing.
1. Default and nonmonotonic logics
Mainly two approaches exist here: default logic and circumscription.
Default logic has defaults, which are inference rules: For any x, if it can be deduced tha x is a bird,
and it is consistent with the theory that x can fly,
then infer that x can fly.
Circumscription says: a bird can fly if it is not abnormal (thus defaults are sentences in tha language
itself here). We assume that things
circumscribed as abnormal are the only things that are abnormal.
Although these seem useful ways of doing, there are four problems with the approaches of default logic
and circumscription, of which the first is a fact
and the others are controversial:
1. Consistency is undecidable; the use of heuristics would cause loss of soundness in these cases because
it is possible that some sentences are
viewed as consistent when they are not.
2. These logics might not have an adequate semantics. But Moore has presented autoepistemic logic, which
has a clear semantics and is equivalent
to default logic. And Shoham proposed to use a preference relation < between interpretations. But
this approach is new and needs further work.
3. Hank and McDermott’s Yale Shooting Problem, a special case of the frame problem, challenge the argument
that default logic can capture
commonsense reasoning. But this point is still controversial.
4. (not important: argument of Perlis)
2. Reason maintenance system
The problem of inconsistency arises when a knowledge base can change. At first sight, the problem would
be avoided by making reasoning systems
recollective (not storing derived sentences). This is correct in theory, but does not work in practice.
A second way to avoid the problem is by retracting contradictory sentences. Systems that act in this
way are called Reason Maintenance Systems
(RMS).
RMS’s include knowledge about why a certain sentence was derived, called a justification. Sentences
which are assumed to be true are said to have
premise justification. If a contradiction arises, and a sentence without premise justification is withdrawn,
at least one of the justifying sentences will
have to be withdrawn too (backward belief revision). In addition, forward belief revision is the recalculation
of belief in sentences that depend on a
retracted sentence.
A proposal to let the system decide which sentences to withdraw stems from McAllester. He divides sentences
into likelihood classes, and when a
contradiction arises, the sentence of the lowest likelihood class is retracted first.
However, there are some efficiency problems with the basic RMS. An example is the unouting problem:
the problem of repeated recalculation if a
sentence is asserted, retracted, reasserted and so on. Doyle solves this problem by giving each sentence
a support status, which can be in or out
(not the same as belief that it is not true). Retracting a sentence is simply changing this status.
He also extends the basis RMS in the following way.
Out-lists are also of importance for justification of default inferences. If the out-list is not empty,
the justification is nonmonotonic.
3. Default reasoning and RMS’s
There are many difficulties with the simple RMS, which are dealt with in de Kleer’s assumption-based
truth maintenance system. Following de Kleer,
it is not entirely clear to what extent using an RMS is an essential departure from the use of logic
alone for KR.
4.5 Abduction
Abduction (term coined by Peirce) is inference to the best explanation. If you know that (1) if it rains,
the streets are wet, and (2) the streets are wet,
then you could conclude that it might be raining.
There is a distinction between abductive inference and backward chaining use of inference rules. In
the abductive case, X is concluded true, while in
backward chaining, it is set up as a new goal to prove.
Abduction is an important type of inferencing, although it is not itself a form of logical inference.
But following Pople, abduction is essentially like
deduction: in deduction, you try to determine whether a statement is true, while in abduction, you determine
why it is true. And answering the why
question implies answering the whether question.
The problem is solved by positing a particular component that will generate hypotheses, which task it
is to find the most plausible ones.
5. Conclusion
(last few rules: according to Bern Martens: not true anymore).
Chapter 4: Production Rules
1. Production rule systems
Production rules are useful for modelling pattern-directed inference. Pattern-directed inference systems
have three components:
1. Working memory elements (knowledge acquired about the problem)
2. Pattern-directed modules (each consisting of a triggering pattern and an instruction to be executed;
the result of an action is a change of working
memory elements; usually, they are called “rules”, and this component is the rule base).
3. Interpreter (to solve the control problem, by deciding which instruction to execute). The interpreter
goes through three stages: retrieval (resulting in
a conflict set), refinement (conflict resolution) and execution.
1.1 Working memory
The working memory has two sources:
1. The reasoning that the system has performed so far
2. The user (treated as an external database)
Most production languages store elements in the working memory as object-attribute-value triples (e.g.,
<car, color, red>) or attribute-relation-value
triples (e.g., <temperature, greater-then, 20>.
1.2 The rule base
This type of memory stores permanent knowledge. Production rules are also called IF-THEN rules. The
IF-part is called the antecedent, or left-hand
side, the THEN-part the consequent or right-hand side.
There are three main differences between production rule systems:
·
The actions allowed in the execution part of the rule
·
The possibility of allowing variables in both sides
·
The number of statements allowed in the right-hand side of a rule.
Concerning the first distinction, the most simple systems allow only to add or replace elements in working
memory. More complicated systems allow
deletion or even execution of an arbitrary piece of code.
[Remark: the claim that production rules are nothing more than implications in FOPC, is at least misleading:
the action part may specify actions
whose logical status is less than clear]
The second difference is the possibility of variables. Production rule systems that do not allow variables
at both sides are not very expressive. Most
systems, however, do allow variables at both sides. This requires matching before execution of a consequent.
The third difference concerns the number of statements allowed in the right-hand side of the rule. This
is closely related to 1.3.
1.3 The interpreter
The interpreter has to solve the control problem. Two types of control can be distinguished: global
and local control (Jackson, 1986).
Global control is whether the system tries to match the left-hand side or the right-hand side of a production
rule against working memory.
With the right-hand side, this can be more easily done by systems that allow only one statement at the
right-hand side (right-hand side because this
can be seen as a goal statement, while the left-hand side is a set of subgoals to fulfill in order to
prove the goal). The problem-solving behavior is then
goal reduction, or backward chaining.
With the left-hand side, the problem-solving behavior is forward chaining, or data-driven. This is a
more natural way of doing, since with backward
chaining, the interpretation of entities in working memory becomes problematic.
Local control implies decisions about which rule to choose if more than one rule matches the content
of working memory.
·
A simple strategy is to rely on textual order: take the first rule that matches. An advantage of this
approach is that it can speed up the system.
·
Another strategy is called refractoriness, which states that the same rule should not be applied twice
to the data (or the same rule instantiation,
when working with rules embodying variables). This strategy prevents loops when rules are applied that
do not change their own antecedents.
·
Still another is recency: apply the rule that matches against the most recent working memory element.
This causes the system to pursue one line
of reasoning first, only backtracking when it fails.
·
A final strategy is specificity: prefer the more specific rules (in instantiation or number of antecedents).
This causes rules that apply to exceptions
to be applied first, before the general cases.
2. The representation of uncertain knowledge
Experts do not use hard and fast rules, so we do not expect that from an expert system either. A lot
of work has been done in the domain of
uncertainty representation. In two respects, the way of uncertainty representation is different from
the default reasoning uncertainty handling: it is
approached in a quantitative way, and with a parallel uncertainty inference process.
An example is MYCIN, which uses the Bayesian theorem for calculating the probability of some proposition
depending on other propositions. But
probability theory has some disadvantages:
·
One needs a lot of data [PD: yes, but if the used statistics are sufficient, you cannot do any better]
·
The probability axiom A + not(A) = 1 is unnatural [PD: nonsense, this unnaturalness is caused by human
reasoning biases]
Therefore, MYCIN used an uncertainty handling mechanism known as the certainty factor model. The CF(P)=MB(P)
- MD(P), which yields a result
between -1 and 1, because MB and MD both are assigned values between 0 and -1. Now each rule R has a
tally called CF(R), which indicated how
strong the system believes the consequent of R given evidence that R’s antecedent is true. If we conclude
P on the base of rule R with antecedent A,
CF(P) = CF(R)*CF(A), where CF(A) can be calculated as the minimum of the certainty factors associated
with the conditions.
Another approach is the Dempster-Shafer theory. This starts with a frame of discernment F, which is
a set of mutually exclusive and exhaustive
hypotheses. Its power set P (the set of all subsets of F) has elements which are assigned a probability
value. Now based on incoming evidence,
these probability assignments change.
The model of Dampster-Shafer has advantages over the CF model:
·
It has a sound mathematical foundation
·
It allows to assign uncertainty measures at the right level.
·
One is not forced to assign a precise certainty value to each hypothesis (allows meta-uncertainty)
Still another approach is fuzzy logic. But the numerical uncertainty calculi have come under attack
because of these reasons:
·
Uncertainty is in fact metaknowledge, which is obscured by the calculation approach
·
Humans do not naturally use numerical values
·
Humans are not good at dealing with probability measures (experiments of Tversky and Kahneman) [PD:
so what? If you can do it with calculation,
why give that up?]
Therefore, other theories have been developed:
·
Fox qualitative uncertainty calculus (possible, probably,...)
·
Cohen’s endorsment representation (reasons for disbelief and belief in statements)
3. An example
See text (based on work of Young and O’Shea, who were interested in errors that 10 year old children
make when they subtract two two-digit
numbers from each other)
4. The advantages of using production rules
4.1 Naturalness of expression
Real experts rely on rules of thumb rather than on textbook knowledge. Therefore, expert knowledge can
be expresses as a set of production rules.
Besides, individual rules divide the knowledge into natural and readily intellegible chunks (right grain
size).
4.2 Modularity
Several architectural properties of production rule systems lead to a very modular organisation of the
knowledge base:
·
Separation of permanent and temporary knowledge
·
Rules are structurally independent
·
The interpreter is independent from the knowledge
This modularity has advantages:
1. Facilitates construction and maintaining of rule bases
2. Makes incremental construction possible
3. Easy to debug: possibility to track down which rules lead to wrong answers (because of their contents
of because of the timing of their triggering).
4.3 Restricted syntax
Although this has some disadvantages concerning expressibility, there are positive consequences too:
1. One can write a program that can read and change a set of production rules
2. One can write a program that helps in the acquisition of knowledge from a domain expert
3. It becomes easy to write a program that translates a production rule in natural language
4.4 The problem-solving process
Production rules determine what to do next on base of the present state of working memory. This has
advantages for the overall problem-solving
process:
1. Minimal changes to working memory can lead to important focus-shifts in the process (especially when
using a recency heuristic)
2. PR systems can pursue a large number of different reasonings at any one time.
So in fact, PR systems give you the best of two worlds.
4.5 Explanation
The system should be able to hold an interactive dialogue. MYCIN is an example of a system that can
do so. MYCIN, when asked “why” its asks a
certain piece of information, will do two things:
·
print the rule whose consequent it is currently trying to prove
·
print the preconditions of the already established rules.
4.6 Summary
5. Disadvantages of production rule systems
5.1 The explanation problem
GUIDON, a system based on MYCIN, constructed as a tutoring program, failed because two types of knowledge
have been omitted in MYCIN:
·
The diagnostic approach (a kind of strategic knowledge, which is hard-wired in MYCIN and therefore cannot
be explained by the system)
·
Understanding of the rules (support knowledge, such as that it is the electricity that makes the rule
“IF car not start THEN battery dim” true).
Besides, support knowledge can sometimes be used to determine limitations of individual rules.
5.2 Efficiency problems: Determining the conflict set
Although the creation of a conflict set makes the problem-solving process very flexible, it may cause
computational overload: both determining the
conflict set and resolving it are computationally expensive.
A solution proposed by Forgy is the Rete-approach, based on two observations:
·
many rules have conditions in common
·
working memory changes only a little in each cycle.
The Rete algorithm constructs a tree-structured sorting network or index for the conditions in various
rules, which will be compiled when a rule base is
loaded. Furthermore, it keeps a list of which working memory elements were changed, such that successive
calculations can be avoided if their
elements do not change.
The Rete algorithm has also two minor disadvantages: first, compilation of the conditions of the rules
may take a while. Second, rules constructed
during a session give rise to a sorting network of their own, which reduces efficiency.
5.3 Efficiency problems: Conflict resolution
In large knowledge bases, the obtained conflict set may be too large to execute it fully. This is called
“saturation” (Davis, 1980). There are various
ways to overcome saturation:
·
Refinement of the conditions, to make the conflict set as small as possible. A disadvantage of this
approach is that the time needed to compose
the conflict set grows, but this can be lessened by the Rete algorithm. A more serious problem is that
modularity gets lost (rules depend on each
other).
·
Develop a mechanism that enables the system to cope with the problem, by pruning and reordering of the
conflict set itself. An example is Davis’
approach, which uses meta-rules, explicit representations of pieces of strategic knowledge. Potential
inefficiency due to meta-level overhead can be
avoided by the technique of localization. His approach has a further advantage that strategic knowledge
becomes explainable.
·
Partition the system into relatively small sets of rules. An example are blackboard systems. In blackboard
systems, the rule base is divided in
different knowledge sources, which can communicate via the blackboard. The precise execution of the
different sources is done by a global
interpreter. Another advantage of these blackboard systems are the abillity to use multiple sources
of knowledge.
5.4 Restricted syntax and expressibility problems
The restricted syntax causes some types of knowledge to become difficult or impossible to express:
1. Structural knowledge: if a patient has not undergone surgery, then he has not undergone neurosurgery
either. Maybe this can buried in the rules.
2. Incomplete knowledge, especially negative and disjunctive knowledge. Negative knowledge can take
two forms: not(car, red) and not(x, red) if (car,
x). In OPS5, this type of knowledge can be expressed in the antecedent, but not in the consequent of
a rule. Clever ways to solve this problem result
in obscure rules.
Disjuntive knowledge too can only be encoded in the antecedent (by simply formulating two rules), but
not in the consequent of a rule.
5.5 Is incremental construction of a knowledge base really possible?
Production rules have often been written with a conflict resolution strategy in mind. Adding new rules
may cause therefore all sorts of problems.
For example, modularity gets lost if rules become dependent on each other.
5.6 Summary
6. Conclusions
Production rules are still very popular for constructing expert systems. They have three large advantages:
they are modular, provide natural
expressions of knowledge, and have a structured problem-solving process. But disadvantages are the inefficiency
problems in large rule bases, the
restricted syntax and the doubt whether incremental construction is really possible.
Chapter 5: Semantic nets
1. Introduction
An important feature of human memory is the high number of associations between pieces of information.
The topics of chapter 5 and 6 deal with
representations for knowledge that are based on this intuition.
2. What are semantic nets?
There are two types of primitives: nodes and links. Nodes correspond to objects, or classes of objects,
in the world, whereas links correspond to
relationships between these objects. They are often labeled to indicate their meaning. No information
is stored in a node as such; this is done with the
links associating nodes.
Quillians original work was taken up by psychologists, but also work done in linguistics was important
for the development of the semantic net
approach.
2.1 Representing information in semantic nets
Quillian distinguished between type and token nodes, corresponding to the two ways in which a word can
occur in a dictionary (as entry or as part of
an explanation).
Quillian distinguished furthermore between five link types:
1. A is subclass of B
2. B modifies A
3. Conjunction of a number of nodes
4. Disjunction of a number of nodes
5. B is subject of the relation A which has as object C
(see p. 118 for a figure with an example)
Quillian did not have problems with the circularity caused by the fact that he did not set up primitives.
Others took this primitive word approach (see
further).
A second important source was the development of Case Grammar (Fillmore). He distinguished between a
surface structure and a deep structure in
each sentence, related by a number of transformations. The deep structure has a modality (tense, mood)
and a proposition (a verb with a number of
cases, i.e., roles to be filled in by other parts of the sentence).
2.2 Inferencing in semantic nets
The basic inference mechanism consists of following links between nodes. There are two types of strategies:
·
Spreading of activation (or intersection search)
·
Inheritance.
The first type tries to find a concept that is related to the words to be related by the inference.
Each node can have an activation tag during this
process. When a new node N is reached following a link from A or from a certain C that can be reached
from A, three things can happen:
1. There is already an activation tag mentioning word B associated with N: in this case, N is a basis
for comparison.
2. There is already an activation tag mentioning word A: no new tag is created
3. There is no tag: an activation tag is created.
Also here, the problem of control returns. In Quillians’ approach, this is solved by bidirectional search
(first from A, then B, then A and so on). But there
are other possibilities.
Not all associative network representations use spreading of activation: in frames, for example, this
is never used. But inheritance is. Inheritance
consists in deriving properties of subclasses from superclasses.
Some researchers do not distinguish between Inst links between tokens and their types and the Sub link
between classes and their superclasses.
This is a mistake; you have to. Furthermore, inheritance seems a good way of dealing with default reasoning
(but not without controversies, see
further).
3. Conceptual dependency theory
Schank developed this theory to avoid the circularity problem of Quillians’ primitive-less approach.
He made a distinction between six categories: PP
(a real world object), ACT (a real world action), PA (attributes of objects), AA (attributes of actions),
T (times) and LOC (locations). These categories
can be related, causing a number of link types to come to play. This distinction is at the epistemological
level (see chapter 1). The actual primitives
are the bussiness of the conceptual level. An example are the 14 different types of ACT that Schank
distinguishes.
The advantages of a logic-based KR language using a small set of primitives are:
·
Fewer explicit inference rules are necessary.
·
Some inferences are already in the representation (canonical form of a sentence used to decide whether
two sentences are paraphrases of each
other)
But of course, there are drawbacks too:
·
Translating everything into primitives is a lot of work
·
It is difficult/impossible to find the right set of primitives
·
A lot of inferential work remains to be done (e.g., whether A logically follows from B)
·
Primitive representations are often complex and need a lot of storage space.
4. The logical foundation of semantic nets
In 1975, Woods published an influential paper that criticised then-current semantic net formalisms.
One of his points is that nobody has made clear
what the semantics of semantic nets really are. Some nets (see figure p. 126) can be interpreted in
three different ways: the concept of a black
telephone, an actual black telephone, or an assertion of the relation between telephones and blackness.
This causes us to distinguish between two
kinds of information:
·
Definitional information: this would remain true even if the world changed
·
Assertional information: this may change as the world changes
The absence of semantics is also reflected in problems with ISA links. Fig. p. 127 can be interpreted
in at least four ways.
·
One relevant factor is the intended meaning (intensional or extensional, or description of objects versus
standing for objects).
·
A second important factor is whether ISA means sentence forming or concept forming.
·
A third factor is the way inheritance is used (all or only those properties inherited who are not explicitly
cancelled; usually the latter, but this can lead
to problems with three-legged elephants whose three-leg property is cancelled).
·
(Fourth factor not in text?)
Another shortcoming is the fact that semantic networks are unable to represent propositions without
commitment to their truth value, which is
necessary to deal with belief-sentences. Furthermore, they cannot represent things without stating
that they exist in the world (Mike knows versus
dials Pats telephone number; what if Pat an Mary have the same number?). Some proposals to deal with
Woods’ criticisms are described below.
5. Extending the expressive power of networks
5.1 Partitioned networks
The basic idea is to partition the network into spaces, which can be considered nodes too and thus be
related with links (hierarchical organisation).
Hendrix gives examples of their utility:
·
Language processing (syntactic subparts)
·
Belief of one person in one space
·
Representation of logical connectives and quantifiers
Partitioned networks have their difficulties too:
·
Extensionality versus intensionality of nodes, and the status of the links is not always clear (assertional
or definitional)
·
Representation of arbitrarily quantified propositions: some nodes correspond to propositions, others
to entities. Also, partitioned networks become
large structures, such that the naturalness of simpler networks is given up.
5.2 The SNePS networks
(not in course)
6. KL-one, separating assertional and definitional information
The second point of Wood is the status of the links: they represent ananalytic or synthetic information.
KL-one adresses this problem.
The basic elements of KL-one are structured conceptual objects, or concepts for short. They are intensional
entities. KL-one distinguishes between
generic and individual concepts. A description of an object must account for its internal structure,
for which task roles are used. Roles correspond to
two-place relationships in logic. Structural descriptions are included because quite often roles are
interdependent. Figure p. 136 attempts to clarify
this. Arcs can get the label v/r (value restriction).
The inference mechanism is structured inheritance, indicated by a emboldened link. Since the definitional
part deals with necessary properties only
(the rest is left for the assertional part), inheritance is straightforward. Assertions are made relative
to a context, so they can be used to reason about
beliefs, desires and so on.
The existence of something can be asserted with a description wire to a nexus within a context. The
advantage of KL-one is its clear epistemological
basis. Because it distinguishes between a definitional and an assertional part, it avoids some problems
Wood describes. But it does not solve all
problems: the assertional part was not fully developed, and it is not clear how to represent arbitrary
quantified sentences.
7. The semantics of semantic networks
One has tried to provide for model-theoretic semantics of semantic networks. The easiest way is to try
to translate them to some logic.
Within KL-one, this can be done by stating that i corresponds to the logical constant i, whereas G corresponds
to the one-place predicate G. Various
links can be translated in this way. This translation is dependent upon two things:
·
the nature of the originating concept
·
the nature of the destination concept
(see p. 140 for examples).
There are two problems with this translation:
·
If a concept represents an action, the resulting logic sentence is unnatural. This is due to the particular
translation mechanism, other mechanisms
do not have this problem.
·
With use of inheritance, inconsistent logic sentences may result. This can be solved by translating
to Reiters default logic, but the question is still
open whether this is an acceptable representation.
8. Conclusion
Semantic nets have been superseded by frame-based representation languages.
Chapter 6: Frame-based Representation Languages
1. Introduction
Intuition about human memory: structured chunks of knowledge, more than there is in logic (Minsky) and
also the declarative and procedural aspects
of a given chunk must be tightly interconnected. Minsky introduces the frame concept for this, which
is based on the schema concept of Bartlett
(1932). The type of reasoning in frame systems is often recognition, the comparison of new objects by
known ones.
2. What are frames?
Frames are structures that represent knowledge about a limited aspect of the world. Information is stored
by associating descriptions with it (called
“slots” and consisting of two parts: a name and a value or restriction of values).
A natural way of describing an entity is by comparison with a known entity. Therefore, frames have a
hierarchical structure.
Most frame systems distinguish between class-frames and instance frames. A class frame can have subclasses
below it. Instance frames are
instances of class-frames.
Some situations can be seen from more than one perspective (e.g., kissing as a physical act vs. as a
social act). This can be represented by allowing
super links to connect to more than one other frame. In Minsky’s original conception, some slots cannot
be overridden by other values when inheriting.
This property is mostly lost in other frame systems.
A special kind of frames are scripts, which are prototypical sequences of events. Recently, these scripts
are said to consist of MOPs (Memory
Organization Packets), because scripts did overlap in specific subsequences of actions.
3. Slots
There are different kinds of slots:
·
Attribute-value pair: can have different kinds of filler, like a primitive of the language or a pointer
to another frame in the knowledge base.
·
Restrictions on the values of a slot, such as logical connectives, functions or predicates, an instance
of another class-frame.
·
A procedure, which can be a servant (if it is possible to compute the value when needed) or a demon
(which is called whenever the slot value
changes). This type of representation is called procedural attachment.
·
A superclass slot (for class frames) or a member-of slot (for instance frames)
4. Reasoning with frames
In order to retrieve information, the frame system must perform two kinds of reasoning: matching and
inheritance.
4.1 Matching
As a first step in the reasoning process is to discover which frames can be applied to the current situation.
Matching in frame systems is more difficult
than for example in production rule systems or in FOPC, because (1) frames are complicated structures;
(2) slots can have default properties and (3)
perfect matches are rare (whereas in the other systems, it is clear-cut when one has a match or not).
For an example of matching, see p. 152.
4.2 Inheritance
The system will try to return a value for a slot even when it is not provided. The superclasses of the
frame are searched. This can happen with
breadth-first or with depth-first or with other search methods. Depth-first is more efficient, breadth-first
will always return the most specific value.
Another strategy is depth-first up to join.
[PD: See Winston, chapter 9, for a better discussion on how inheritance can be achieved]
5. Advantages of frame-based knowledge representation languages
Many of the advantages are at the epistemological level.
A first advantage is that frame-based KR languages capture the way in which experts think about their
domain.
A second advantage is that domain experts also use comparison with known things for describing new ones.
A third advantage is the taxonomic structure of the inference mechanism (inheritance). For many applications,
frame systems strike the right balance
between expressive power and efficiency.
A fourth advantage is default reasoning. Furthermore, contrary information, if it exists, is stored
with the object and thus retrieved first. It can be
decided that this information is present, so undecidability does not arise here, although we will sometimes
have to conclude that we do not know the
answer. [PD: not with Winston-algorithm, chapter 9]
6. Problems with frame-based KR languages
6.1 The lack of semantics
Hayes attempts to fill this gap by defining an algorithm to translate KRL expressions to FOPC. For an
example how this happens, see p. 159. Mainly:
·
an instance frame corresponds to a logical constant
·
a class frame corresponds to a one-place predicate
·
restrictions on attribute value slots are expressed with existential quantification
·
slots of class frames are unversally quantified
This approach has three problems:
1. Procedural attachment: this could be tackled by providing a translation for the procedures into logics
as well, or by giving up the enterprise and
handle the procedural aspects in another way (like Prolog does).
2. It is not clear how one could distinguish between member and own slots (in KRL, there was no such
distinction and therefore no such problem).
Own slots in class frames cannot be dealt with in Hayes’ proposal. Nado and Fikes propose to introduce
class constants, but this complicates the
logic, and moreover they do not describe the model-theory of their system.
3. (?) Both systems go against intuition that knowledge is organised around prototypes. Both Hayes and
Nado and Fikes propose solutions, but these
face difficulties. The proposal of Hayes, to represent references to the state of the knowledge base
explicitly, meets three problems: (1) the intuitive
meaning of KB entails ‘P’ is not clear, (2) we need a self-explaining system as language and (3) the
system can give wrong solutions when two
default rules are present. Nado and Fikes translate to default logic, and this solution therefore depends
on the quality of default logic, which is not clear
as well.
6.2 The perils of inheritance
Brachman rises a number of questions concerning inheritance:
1. What is not being inherited when a class frame or instance frame does not inherit a slot of a superclass?
Do you cancel the slot itself, or do you
simply cancel the value? If one wants to cancel the slot, problems arise, because then you do not distinguish
anymore with frames that did not
posses the slot before.
2. Lack of definitional capability: Clyde could be a three-legged elephant with only two legs. why would
one define things anyway? Following Putnam,
most natural kinds cannot be defined. But: (1) in mathematics, definitions exist and are important,
(2) even in real life, definitions can be useful or
necessary.
Not all frame-based KR languages use only default inheritance. UNITS, for example, has four types of
inheritance roles which can be attached to
slots:
1. S-role: necessary for all subclasses
2. R-role: restrictions on values of subclasses
3. U-role: not inherited
4. O-role: optional role (M-role: inheritance)
This allows for the definition of new composite concepts.
6.3 Expressive limitations
·
Disjunctions between values of different slots cannot be expressed in frame-based languages.
·
existential knowledge not in the scope of a universal quantifier is a problem (there is a murderer in
the detective story, but you do not know who it
is. In frame-based systems, an instance frame for the murderer would be distinct from all suspects’
frames). UNITS has a solution to this problem:
indefinite nodes, with an anchor slot (indicating which instance it corresponds to) and a co-reference
slot (which can be filled with other indefinite
nodes, to express that they are equal).
7. Object-oriented programming
(not in course)
8. Conclusion
Chapter 7: Mixed Representation Formalisms
1. Introduction
It seems attractive to combine the advantages of different KR languages and to avoid their disadvantages.
So-called mixed representation languages
attempt to do so. They have certain advantages of their own:
·
The system may be computationally more efficient
·
The system may offer naturalness of expression
·
The system may extend the expressive power of any other KR language
·
(maninly KRYPTON) The system may enable you to represent different types of knowledge in different languages
(epistemological clearness).
There are various hybrid systems. An example is Cake (mixture of programs and predicate calculus). Others
are KRYPTON, NIKL and KL-TWO (all
derived from KL-ONE), and ART, KnowledgeCraft and KEE (the so-called AI Tool Kits).
2. AI Tool Kits
These commercial systems offer mostly an OO-programming language and a rule-based representation language,
and mostly a logic-based
language. These are more or less integrated.
Most AI Tool Kits are Lisp implementations. Therefore, the user can change most aspects (not so in KRYPTON).
These systems have the following
advantages:
1. They give a lot of freedom and power
2. They provide a good development environment
This also causes problems: Much freedom causes large differences in programmin style, which makes maintenance
a problem.
3. KRYPTON
KRYPTON grew out of KL-one and has two main assumptions:
·
a functional view of knowledge
·
the distinction between assertional and definitional information
The functional view of knowledge representation stems from computer science (see Synthesis LISP course,
about abstract data structures). For
knowledge bases, abstract operations are TELL and ASK. TELL adds knowledge to the knowledge base, ASK
can return three possible answers:
yes, no or unknown (the situation is more complicated when “wh” questions are asked). This functional
view goes against the spirit of some object-centered representation languages by making implementation
accessible to user [PD: I thought this style was developed for avoiding to do so].
The second assumption states that one needs to distinguish between assertional and definitional information.
KRYPTON has two separate
components: ABox (Assertional) and TBox (Terminological). Like in KL-one, concepts and roles are formed
by combining or restricting other concepts
or roles. Special primitives, like ConGeneric, can be used for defining new primitives.
The ABox uses a language comparable to FOPC, for dealing with incomplete knowledge. But here, the TBox
specifies what primitive expressions
exist. Thus, one can distinguish between assertions and definitions.
The model-theoretic semantics of KRYPTON are defined by Brachman, Gilbert and Levesque. The TBox’ semantics
are defined as follows.
·
Primitive concepts: set of individuals
·
Roles: two-place predicates
·
Defined concepts: in terms of primitive concepts
For the ABox, it is relatively straightforward to define semantics: this rests on FOPC.
The theorem prover cannot be a standard one. There are two approaches to handle this:
·
Translation of definitions in the TBox into ABox. But there are two reasons to leave this one: the distinction
between definitions and assertions
would vanish, and the translation will slow down the theorem prover.
·
Develop a theorem prover (Stickel). The actual theorem prover is based upon the connection graph proof
procedure of Kowalski. Normal resolution
is extended in two ways: (1) extension of the unification algorithm and (2) conditional resolution (resolution
with “a rest”, a piece of information to prove
yet)
KRYPTON also embodies other useful techniques, such as inference during building.
A problem is that KRYPTON cannot deal with default reasoning. Only the language in the ABox could be
extended to do so, but this would cause
changes in the inference mechanism.
Chapter 8: Parallel Distributed Processing
(not in course)
Chapter 9: Conclusion
1. Introduction
In this chapter, we discuss some open problems and research issues in KR.
2. Analogical representation
Most representations in this book are “Fregean”, as opposed to analogical representations (Sloman).
Many psychologists have argued that human KR
is analogical: Kosslyn, Johnson-Laird, Gelernter, Funt.
Analogical representations have various advantages:
·
Problem solving is easier (example: route finding on a map)
·
It makes certain features self-evident (example: closeness of two blocks)
·
One can use commonsense knowledge without formalizing it.
But there are problems as well:
·
They are often not general
·
They are to a large extent conventional
·
They need difficult an complicated support machinery
3. Single or multiple KR languages
One may regret the abundance of KR languages, but Sloman argues that this abundance is useful. However,
there are difficulties when using multiple
languages: for example, when to use which language and at the same time avoid ad hoc programming? KRYPTON
tries to do this by distinguishing
between definitional and assertional languages, but the language is not without its difficulties. Another
way is Chandrasekaran’s work, which
decomposes complex tasks into generic tasks. Wielinga and Breuker present more or less the samen with
their KADS methodology.
An example is Clancey’s heuristic classification. He observes that experts go through three different
stages:
·
abstraction from the data
·
heuristically matching to a class of solutions
·
Refinement of the solution abstraction into a proper solution
This can be viewed as generic tasks.
A second problem is that it would be necessary to reimplement the knowledge each time it will be used
in another way. There are two ways around
here:
·
Deny that this is the case; simply chance the control knowledge base (wich results nowadays in an inefficient
search)
·
Store the knowledge in an application-independent format. But this approach meets three problems: (1)
it does not explain how intelligent beings
develop new representations, (2) it still requires an answer on the question which representation to
use when and (3) the design of such an
application-independent language is very difficult (all languages up to now have expressive limitations).
4. Connectionism
Connectionism goes against the knowledge representation hypothesis.
Both connectionists and KRH’s disagree with behaviourists. But with each other, they disagree on two
points:
1. The existence of a knowledge base
2. The difference between representation and processing
There are three ways one could view a PDP model:
·
An implementation for a traditional symbol processing theory
·
Alternatives to traditional theories
·
Proposal of a number of primitive operations (most interesting following HR)
Even if one of both theories proves wrong as a description of human intelligence, it can remain interesting
from a technological point of view.
5. Conclusion
Despite the success of AI in certain areas, humans are still the most advanced intelligent systems that
we know of, and a better understanding of
human intelligence and KR will provide important insights that can be used to build artificial intelligent
systems.
[PD: Or: there remains work to do for cognitive scientists]
Tip: the left pane works like windows explorer! Please send your comments to Peter Doomen. This document was updated 27/05/00.
|