Psychology Approach to Theistic Science
An architecture is proposed in which connectionist links and pattern-directed rules are combined in a unified framework, involving the combination of distinct networks in layers. Piaget's developmental psychology is used to suggest specific semantic contents for the individual layers.
The primary difference between connectionist and symbolic processing, I argue (with Fodor & Pylyshyn [1988]), is not the existence of symbols, but rather the existence of rules with general pattern-based applicability. The reason for this is that connectionist systems can readily model the presence of symbols by, for example, having links from all the occurrences of each symbol to one central 'symbol table' node. (This is in fact the way that symbols are implemented in list-processing systems). Thus symbolic structures, semantic nets for example, can be generally modelled as kinds of connectionist systems.
Semantic nets (see Quillian [1968/1980], Schank [1973,1975], Simmons [1973] and Findler [1979]) are an attempt to represent the meanings of sentences by means of a formal network structure. Various forms of these structures are possible (for recent systemisations see Brachman [1979] or Sowa [1984]), but typically objects are represented by nodes in a network, their relations by arcs between the nodes, and inferences involving them by production rules which manipulate the network. The nodes and arcs are usually labelled with English words we can recognise. Strictly, however, this should not be necessary as the semantic net is an attempt to represent the whole meaning of a sentence or concept by its position in an extensive network, and by operations on that network. That is, attempts are made to define meanings entirely in terms of relations and operations in a formal network. Once this is done, the original semantic net will have become a connectionist system.
In this transformation of semantic nets into connectionist networks, the place of the rules remains problematic. In the original semantic net, it is often thought that rules can be included on an equal footing with relations, because production rules can be explicitly included as kinds of nodes or substructures in the network (as in Anderson [1983]). Specific interpretive mechanisms might still be needed, however, to apply the 'rule' components to the 'data' components of the network. Alternatively, the rules could be converted into certain links in the connectionist network, for example along the lines of Rumelhart & McClelland [1986a] or Hinton & Lang [1985]. The 'application' of a 'rule' for a given pattern in the network then amounts to the activation of a 'rule node' by the activation of a suitable pattern of nodes linked explicitly to it. I now argue, however, that this is in general an inadequate account of rules, and that an extended connectionist architecture is necessary to take them properly into account. As a specific but simple example, consider the recognition of a first four-leaf clover. No such object will ever have been seen before, yet the idea-node for 'four' will clearly have to be activated for a collection of objects that may never have occurred together in the system. It would appear that the node for 'four' has to be activated by a rule or a function that counts whatever nodes are activated, and not just by those nodes to which it has been explicitly linked in the past. A similar conclusion is reached from linguistic considerations in Lachter & Bever [1988].
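The four-leaf clover argument can be illustrated with a minimal sketch. It is not a model from the literature cited here: the node names, the stored patterns and both function names are hypothetical, chosen only to contrast activation by historical links with activation by a counting rule.

```python
# Hypothetical illustration; all names and data are invented for this sketch.

# Historical links: the 'four' node has been wired only to patterns met before.
linked_patterns = {
    frozenset({"wheel_a", "wheel_b", "wheel_c", "wheel_d"}),
    frozenset({"leg_a", "leg_b", "leg_c", "leg_d"}),
}

def four_by_links(active_nodes):
    """Activate 'four' only if this exact pattern was explicitly linked in the past."""
    return frozenset(active_nodes) in linked_patterns

def four_by_rule(active_nodes):
    """Activate 'four' by counting whatever nodes happen to be active."""
    return len(set(active_nodes)) == 4

# A first four-leaf clover: a collection of nodes never active together before.
clover = ["leaf_a", "leaf_b", "leaf_c", "leaf_d"]
print(four_by_links(clover), four_by_rule(clover))
```

The link-based test fails on the clover because no stored pattern matches it, whereas the counting rule succeeds without any prior exposure. This is the generality of application that the text attributes to rules.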
It appears that the generality of application of rules is something that cannot be adequately captured in a purely connectionist network, although a weak imitation of it may be simulated in small specific problem domains. The ability to recognise new instances of a specific concept means that the concept-node cannot be activated by purely historical links, and means that some essential kind of rule-based operation will have to be built into the system. Once this is done, I argue that both the symbol and the processing features of traditional symbolic systems can be incorporated within connectionist networks.
The architecture proposed to handle both connectionist links and pattern-directed rules involves layers of distinct networks, so that the relations within a layer are given explicitly by the links of the graph, whereas the relations between layers have a functional or rule-based interpretation. Specific types of semantic contents will be proposed for the distinct layers, guided to some extent by Piagetian theories of development. Further evidence for the specific contents will then be presented from logical, psychological and computational considerations. It should be pointed out, however, that the general layered architecture is compatible with a range of hypotheses concerning the specific contents of the various layers. Furthermore, because of the wide scope of the hypothesis, this paper should be regarded not as a minimal inductive generalisation of the evidence to hand, but as an exercise in theoretical psychology.
Note that the 'layers' being proposed here are distinct networks, and should not be confused with what we might call the different 'levels of explanation' for any given network. The layers here are like the distinct floors of a house. A house has different kinds of connections within floors and between floors. It is also possible to define the multiple 'levels' at which semantic networks may be considered (see e.g. Brachman [1979] and Brachman et al. [1985]). Brachman insists, for example, that the implementational, logical, knowledge (epistemological), conceptual and linguistic levels should not be conflated. These levels, however, are not the 'layers' of the present paper, as his levels are merely different ways of looking at the operations of any given network. They are like the parallel levels of hardware, logic gates, functional units, machine instructions, software etc., in the operation of a digital computer.
The general idea of multiple layers has been around for some time: Greenwald [1988] surveys the support for what he calls 'levels of representation' in many different kinds of psychological theories (his 'levels' are here 'layers'). He also formulates his own system of layers: this will be examined in section 3.2 below. Our proposed 'layering hypothesis' has a broad scope, as it attempts to bring together and formalise disparate developments in cognitive psychology, developmental psychology, and in the computational modelling of knowledge. Rather than produce new experimental results bearing on the layering hypothesis, we have chosen to collect existing evidence from these different fields, and bring under one roof structures invented for a variety of purposes. Neither of us has produced a computational model embodying the layering hypothesis in a programmed form. This is because, I will argue, with the help of the hypothesis it is possible to pinpoint the most difficult areas in computational modelling and in AI as being precisely the succinct embodiment of realistic relations between the layers.
Section 2 outlines the details of the layering hypothesis, and discusses specific proposals for the contents of the different layers, and how the relations between layers might be set up. Section 3 compares the layering hypothesis with other kinds of theories attempting to cover the same material. The initial PDP work in the area of connectionist systems (Rumelhart et al. [1986]), for example, treats all processes as the activity of one-layer 'neural networks'. Although the PDP work explores 'multi-level' networks as the generalisation of the original 'one-level' perceptrons, the networks are all within one layer in the sense of the present paper. This is because they have only fixed connections, and no functional or algorithmic operations. We must therefore consider to what extent these one-layer neural net models efficiently explain an adequate range of cognitive phenomena.
A number of authors (e.g. Minsky [1975], Boden [1978]) have pointed out the extensive similarities between the enterprises of Piaget and of cognitive modelling, and suggest that closer cooperation should prove fruitful. Production systems have been applied to the analysis of specific Piagetian tasks by Baylor & Gascon [1974] and Young [1976], but only Minsky [1975], to my knowledge, has touched on the overall phenomenon of stages of development which Piaget has described.
The psychology of development is relevant both to cognitive psychology and to the goals of artificial modelling of intelligence (AI) because an understanding of the order in which cognitive structures develop is an important guideline to simulating them realistically. Piaget [1926, 1962] has described several stages of cognitive development, and has characterised them by the performance (or otherwise) of a variety of simple tasks. In section 4.1, therefore, these tasks are summarised, and it is suggested how they lead to and interrelate with the proposed layering scheme.
From the point of view of layering, section 4.2 will then examine the ways that AI workers have approached some of the problems in the computational representation of knowledge. It will be seen that, in the light of the deficiencies of the earlier formal schemes which assumed all the structures existed at 'one level', the more successful approaches have tended to build in the possibility of multilayer structures.
The task of the present paper is not to propose a detailed formalism, but to propose and examine the architecture of multiple layers. We take, therefore, individual layers to be network structures, the nodes of which represent discrete mental sensations or concepts. The contents or meanings of these sensations or concepts are thus defined by the relations to the other nodes, and thus recursively to the whole structure. Various psychological processes are then assumed to correspond to operations on the network. The recall in memory of associated meanings and associated episodes from the past, for example, can correspond to travelling around the network. A psychological model of attention could be based on giving each node an 'activation level', to be passed on to those other nodes with which it is connected, as in the 'spreading activation' models of Collins et al. [1975], or the more recent PDP models. A model of short-term memory could be based (see e.g. Cunningham [1972]) on regulating the total level of activation so that typically at most 7 +/- 2 nodes are active (Miller [1956]).
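As a rough sketch of how an 'activation level' and a short-term memory limit might be combined within one layer, consider the following. The decay factor, the equal split among neighbours, and the use of a simple top-7 cut-off in place of regulating total activation are all assumptions made for illustration; they are not taken from Collins et al. [1975] or Cunningham [1972].

```python
from collections import defaultdict

def spread(graph, activation, decay=0.5):
    """One step: each node keeps its level and passes a decayed share,
    split equally, to the nodes it is connected to."""
    new = defaultdict(float, activation)
    for node, level in activation.items():
        neighbours = graph.get(node, [])
        for nb in neighbours:
            new[nb] += decay * level / len(neighbours)
    return dict(new)

def limit_active(activation, capacity=7):
    """Keep only the `capacity` most active nodes: a crude stand-in for
    regulating the total level of activation (assumed mechanism)."""
    top = sorted(activation.items(), key=lambda kv: kv[1], reverse=True)
    return dict(top[:capacity])

# Hypothetical layer: one concept with ten associated nodes.
graph = {"dog": [f"assoc_{i}" for i in range(10)]}
state = limit_active(spread(graph, {"dog": 1.0}))
print(sorted(state))
```

Although ten associates receive activation, only seven nodes in all remain active after the limit is applied, which corresponds to the psychological items held in short-term memory.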
| Net Level | Network Relations of |
|---|---|
| 5 | meta-theories, paradigms |
| 4 | plans, models, formalisms |
| 3 | classes, series, numbers |
| 2 | events, single relations, sentences |
| 1 | objects |
| 0 | images, motor movements |
Coming now to the specific stages of the layering hypothesis, nodes in the first layer (called layer 0) are taken to represent components of sensations: images, sounds, tastes, textures, etc., and their relations in the perceptual fields as well as in their temporal orders. Nodes in the next layer '1' then represent the concepts of material objects and their relations in space. The crux of the layering hypothesis is that the connections within the layers 0 and 1 are qualitatively different from the functional mechanisms needed to relate the two layers together. Support for this claim will be given from logical considerations (below), from the psychology of development (section 4.1), and from AI work (section 4.2).
Because a given object can appear under very many sensory appearances, due to the possible operations of the continuous rotation and translation groups, not to mention ranges of lighting and occlusions, the relations between the concept of an object and its sensory appearances are quite different in character from the connections that exist between objects, or between appearances. It seems implausible that even a young child has a separate set of network connections for every rotation and translation, although such schemes have been proposed (see Hinton et al. [1985] and section 3.1 below). This implausibility suggests that the mechanism relating the two layers must be distinct from the explicit arc mechanism that relates nodes within a given layer. Section 2.3 will discuss how these new mechanisms between layers might operate.
The layer '2' is assumed to represent the structure of events, episodes, single relations, and ideas of simple causality. The meanings represented in this layer are especially those of simple subject - verb - object sentences, using names which have been attached to the layer 1 concepts of objects. These names and other words must also be related to their pronunciations, i.e. to sounds, which are layer 0 structures. This means that linguistic features are attached to all the three layers described so far, and that the correspondences between layers can, at least in part, be considered a linguistic phenomenon. Indeed, much of the initial impetus for the development of the theory of semantic nets came from Fillmore's [1968] work on case grammars.
The layer 2 contains not only the logic for simple action sentences, but also the logic supporting ideas of time and causality. It might contain, for example, the logic of temporal successions (see e.g. Allen et al. [1985]), and the logic of 'naive causality' (see e.g. de Kleer et al. [1984]).
The next layer '3' is used to describe various abstractions from what is concretely observable, such as classes, series, multiple relations, and numbers, along with their mathematical relationships. The ability to move freely backwards and forwards around the network of these concepts means that operations can be compared, reversed, and transitively compounded. Again it should be clear that the relation between the abstractions at this level and the structures at previous levels is not one of static connections by arcs in a network, but has a more dynamic or computational basis. It is implausible that there are connections between, say, the number 'four' at this level and all the sets of four objects, four events, etc. Rather, there must be general processes whereby, whenever there are four things present and counting becomes required or relevant, the concept of 'four' is activated. From the logical viewpoint, a concept on the next layer must be defined intensionally (by some functional criterion), rather than extensionally (by giving the set of all configurations on the present layer which satisfy that criterion).
The layer '4' describes and relates, as entities, whole sequential structures of possible operations. It deals with plans, scripts, programs (on computers and elsewhere), and in general, sets of possibilities that are not directly related to what is actually present.
The final layer '5' is postulated to deal with formal structures as 'objects' in their own right. This allows interpretations, world-views, and general paradigms (see Kuhn [1970]) to be considered explicitly. It now becomes possible not only to use formal systems flexibly, but also to think about theories, to construct them, and to realise that formal theories are to some extent independent of the paradigms used to interpret them. From the logical point of view, it appears that the novelty in this stage is the construction of meta-theories. This enables intelligent control over the way the formal theories of the previous stage are formulated, used, and evaluated.
There are two positions one can take concerning the rules needed for the manipulation of a given layer. They can be regarded as structures either in the same layer as their 'data', or in a different layer. If they are in the same layer, then computationally they are easier to model, but explicit interpretative mechanisms must be introduced to enable the 'rule structures' to be applied to their data. It is difficult to see how to do this if each layer is regarded as an unlabelled connectionist network. A stronger hypothesis would be to regard these rules as special cases of the inter-layer rules, as will be discussed in section 3.5.
Between layers, modelling the connective mechanisms is not so easy. However, we can be guided initially by work in linguistics, because linguistic features are attached to nodes in the first three layers (0, 1 and 2), and hence we might look for parallels between the mechanisms which connect the conceptual layers and those which connect the linguistic layers. The relevant linguistic features are the sentence meanings of layer 2, the names attached to the layer 1 concepts of objects, and then the layer 0 structures of the pronunciations, sounds and written forms.
Chomsky [1965] has proposed that 'transformational grammars' are necessary to relate the sentence meanings at layer 2 with sets of words at layer 1, and that similar 'phonological grammars' appear to be necessary (Chomsky & Halle [1968]) to relate words to their pronunciations, the layer 0 'images'. Transformational grammars are sets of transformational rules that relate sequences of words (at layer 1) with a 'deep structure' at layer 2, and as Chomsky points out, these rules can have considerable structural complexity. The complexity arises from the need to map portions of semantic networks onto linear sentences, so that the semantic content of arbitrary network relations in the speaker may be verbally encoded in a kind of 'linearised' form for communication to the listener.
Some of the apparent properties of these 'generalised transformational grammars' are listed below:
In the case of the relations between an object and its visual appearances under arbitrary translations and rotations, connectionist schemes have been proposed (Hinton & Lang [1985]), but these are schemes which require a number of inter-layer connections which rises as the product of the number of nodes on layer 0 (the number of image 'pixels') with the number of possible transformations. That is because there has to be an explicit arc joining two nodes for every possible logical connection which may be required. The number of connections is in fact so large that Hinton and Lang [1985] do not even include them explicitly in their simulations, but write procedures which connect just those arcs which are relevant to given input configurations. This use of procedures or rules is more in line with the layering hypothesis.
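To give a feel for the scale involved, the following sketch works through the product with assumed figures (a 100 x 100 image, every whole-pixel translation, and rotations in 10-degree steps). The figures and function names are hypothetical and are not taken from Hinton & Lang [1985]. The second function shows, for translations only, how a procedure can compute a correspondence on demand instead of storing an arc for it.

```python
def explicit_arcs(n_pixels, n_transformations):
    """Inter-layer arcs needed if every (pixel, transformation) pair
    must have its own stored connection."""
    return n_pixels * n_transformations

# Assumed figures, for illustration only.
pixels = 100 * 100                    # layer 0 nodes
transformations = (100 * 100) * 36    # translations x rotations
print(explicit_arcs(pixels, transformations))

def translate(pixel, shift, width=100, height=100):
    """Procedural alternative: compute where a pixel lands under a
    translation (with wrap-around) when it is needed, storing no arcs."""
    (x, y), (dx, dy) = pixel, shift
    return ((x + dx) % width, (y + dy) % height)
```

Under these assumptions the explicit scheme needs 3,600,000,000 arcs, while the procedure needs none, at the cost of a computation each time a correspondence is required.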
There may of course be some explicit arc connections between separate layers, to record facts such as the particular instances of more general concepts. These facts are like the associations of classical psychology, and Lachter & Bever [1988] point out that these must be distinguished from the rules needed to model linguistic competence. For example, such associative facts that particular images arise from a given object, or that particular objects were involved in a given event, may be represented by specific arc connections between those images and the given object, and so on. But this does not imply that the object is specified by those connections. We must draw a distinction between relations of instantiation, as recorded for example in episodic memory, and relations of specification, which define the meaning of a higher level concept sufficiently definitely that new instances can be recognised.
Table 2: Greenwald's five levels of human mental representation (Greenwald [1988]).
Greenwald [1988] also postulates a scheme of distinct levels or layers of cognitive representation. His scheme (see Table 2) is similar to that of the present paper in the following respects: (a) the approximate number and identity of the levels, (b) the critique that much existing representation theory deals with within-level relations, rather than between-level relations, (c) the relation to Piaget's theory (see section 4.1 below), (d) the assumption of bidirectional inter-level relations, and (e) the distinction between levels of analysis and levels (i.e. my layers) of representation.
However, there is one noticeable difference between the layered structure proposed here and that of Greenwald. He focuses on a 5-level scheme of features, objects, categories, propositions, and schemata. His scheme differs in that it places categories at a lower level than propositions and case structures, the reverse of the order I propose. This detail is clearly an empirical question independent of the question of the existence of layered architectures. The difference may, however, be more apparent than real. This is because I do have single relations and categories in layer 2 (alongside events). It may be that they form a 'sublayer' that is 'below' the layer of events and case structures as such, but it is clear that there must be another layer (my layer 3) in which there are generalised relations between events and relations, in order to represent numbers, correspondences, and the reversibility of operations. It is this last layer that does not appear to be clearly present in Greenwald's scheme.
Anderson tries to assimilate this distinction to a somewhat similar but not identical distinction, namely bottom-up versus top-down processing, as it is called in computational work. (Bottom-up processing starts with the data and tries to work up to the high level. Top-down processing tries to fit high-level structures to the data.) Lindsay and Norman [1977], in their introductory psychology text, use a related distinction between data-driven and conceptually driven processing. The distinction is quite clear in language parsers, where control can be in response to incoming words or to knowledge of a grammar. However, the distinction is also found in many models of perceptual processing, although more recently the trend is to combine the two processes so that processing occurs in response to goals and data jointly.
The original hypothesis of two modes of processing may be explained as the result of bottom-up versus top-down strategies, but it is perhaps better explained as the distinction between intra-layer and inter-layer operations in a multi-layered architecture. For the operations within one layer are serial, require conscious control, and are limited in capacity to what can be held in short-term memory. The transformations between layers, on the other hand, appear to be automatically invoked with some kind of parallel pattern matching triggered directly by the appearance of the appropriate patterns. The second process does not require conscious control in its detailed operation, and is used at a "preattentive" stage to form the perceptions which do reach conscious attention.
If this interpretation is correct, it is then possible for top-down and bottom-up processing (and their combinations) to occur for both the operations within layers, and operations between layers. The parallel construction of perceptions, for example, could be 'data-driven' by the appearance of sensory stimuli, or it could be 'conceptually-driven' such as in the production of images in dreams and hallucinations. Similarly, the serial operations of conscious planning or problem solving (of GPS-like problems) could be 'data-driven' by the succession of partial solutions, or they could be 'conceptually-driven' by explicit goals or strategies concerning what constitutes an appropriate solution.
Manipulation of symbols and structural combinations may be concerned with image processing (layer 0), with object relations (layer 1), with simulating sequences of events (layer 2), with classification or number problems (layer 3), or with modelling formal systems (layer 4), but they have very rarely succeeded in relating these different semantic layers in a realistic fashion.
It is still possible to generate lower layers moderately realistically, for example to generate images of objects performing various actions, but the recognition problem, of activating a higher layer by the 'correspondence rules', is more difficult. The recognition problem only becomes manageable in environments where the syntax has been systematically specified beforehand, for example in optical character recognition, or in the 'block worlds' of various AI projects.
A cursory inspection of the kinds of items in human focal attention shows that there are significant differences between operations within one layer, and operations which relate two layers. We perceive considerable quantities of detail during the successive operations within one layer, and it is possible for us to 'observe' the processes in action, and to guide them by means of explicit rules. A large amount of AI work has involved taking these observed rules and modelling their effects on symbol structures in computers. However, in contrast to the detail we see in the solution of the combinatorial problems that can be posed within only one layer, the transformations between layers seem to occur almost 'invisibly' for us. This lack of detail is deceptive, as the transformations embody a great deal of tacit knowledge (Polanyi [1958], Berry [1987]), no less a quantity, it is claimed, than the explicit knowledge visible in transformations within one layer. This has the consequence that the AI problems of perception and pattern recognition are difficult for us to program, in contrast to solving combinatorial problems: we cannot see the details of our own inter-layer transformations, and so cannot write down simple rules to describe them.
We would then regard rules as simply two-node fragments of the next-higher layer, in the following way. Call these two nodes in layer n+1 p and q, say, with the rule being p -> q. After the initial p node is activated by an appropriate pattern in layer n, it feeds its activation to q, which then generates a second pattern of activity in the lower layer n. In this way, arbitrarily complex production rules p -> q for a layer n can be generated by suitable combinations of the inter-layer transformations described previously. There is now no need to distinguish 'rules' from 'data' in each layer, and no need for any specific interpretive mechanisms beyond what is already necessary to link the layers together.
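The construction of a rule p -> q from two inter-layer transformations can be sketched as follows. This is only an illustration under stated assumptions: a layer-n pattern is represented as a dictionary of object positions, `recognise_p` and `generate_q` are hypothetical stand-ins for the upward and downward transformations, and passing the current pattern to `generate_q` is a convenience of the sketch.

```python
def make_rule(recognise_p, generate_q):
    """Build a layer-n production rule from two inter-layer transformations.

    recognise_p: upward transformation, true when the layer-n pattern
                 activates node p in layer n+1.
    generate_q:  downward transformation, giving the layer-n pattern
                 produced when node q in layer n+1 is active.
    """
    def apply(pattern):
        if recognise_p(pattern):        # layer n activates p
            return generate_q(pattern)  # p feeds q, which rewrites layer n
        return pattern                  # p not activated: nothing changes
    return apply

# Hypothetical example: p = 'a is at x', q = 'a is at y'.
move_a = make_rule(
    recognise_p=lambda pat: pat.get("a") == "x",
    generate_q=lambda pat: {**pat, "a": "y"},
)
print(move_a({"a": "x", "b": "z"}))
```

No separate interpreter for 'rules' appears in the sketch. The only machinery is the pair of transformations that would already be needed to link layer n with layer n+1.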
If this 'strong layering hypothesis' were correct, there would be interesting implications concerning the status of heuristic rules that are currently used in many branches of cognitive modelling. The implication is that there are no specific heuristic rules. There are associations in episodic memory, which sometimes guide searches heuristically, and there are inter-layer transformation rules. There is a certain phenomenological plausibility to this hypothesis. If we are asked to manipulate images (layer 0) in a certain way, for example, we tend to think of them as objects (layer 1). After those objects have been manipulated, the images are regenerated from the new imagined appearance of the objects. There may still of course be associations between images, but these are based on historical connections, and do not have the generality of application that is characteristic of rules. Similarly, if we are asked to manipulate objects (layer 1), it is plausible that we do so by means of imagined actions (layer 2) performed by those objects (e.g. "object a wants to move from x to y"), rather than by abstract move predicates such as move(a,x,y). Whether these implications are in fact true must be the subject of empirical investigation.
According to this stronger hypothesis, there would be an almost complete reversal of the roles of rule-based and associative processing. For now the network associations within a given layer need not be 'massively parallel', but would be regulated so that they operate in a memory-limited fashion requiring conscious control, with a parallelism of at most 2 to 10 processes. Furthermore, the rule-governed processes, which now operate between layers, would not act according to conscious serial control, but in an automatic parallel manner as in Anderson [1983]. The principal justification for this counter-intuitive reversal is that, for reasons given above, the connections between layers must have the generality of rule-based transformational grammars, and that once this is granted, there need be only simple and limited associations within each layer. In contrast to the PDP networks proposed for visual and linguistic processing, there can now be a much closer correlation between the active nodes in a layer of the network, and the psychological items in short-term memory.
Table 3: Relation between Network Layers and Piagetian Stages
During each stage, at the approximate ages shown, the child is learning to relate the concepts listed in the second column. That is, (s)he is constructing relations in a network at the given level.

| Net Level | Network Relations of | Developed in Piaget/Gowan Stage | During ages |
|---|---|---|---|
| 5 | meta-theories, paradigms | creative | 17- |
| 4 | plans, models, formalisms | formal | 12-16 |
| 3 | classes, series, numbers | operational | 7-11 |
| 2 | events, single relations, sentences | preoperational (preconceptual & intuitive) | 2-6 |
| 1 | objects | sensorimotor | 0-1 |
| 0 | images, motor movements | (initial) | -0 |
Piaget's observations of conservation vs. non-conservation can be explained on the layering hypothesis, if the child is beginning to have concepts of relations such as 'more' or 'less' in the layer 3, but not the general coordination of these concepts among themselves within that layer. This means that there appear to be procedures for recognising and naming the given concepts as nodes in a nascent layer 3, but these nodes are like isolated islands, and not as connected among themselves as they will be in the next 'operational stage'.
These abilities seem to start with the ability to see relations of relations, even if only of objects actually or recently present. The name 'operational stage' does not mean that this is the first stage for recognising operations and relations, for these are recognised as single events in the preconceptual stage. Rather, this is the age at which there are explicit relations (e.g. of 1:1 correspondences & numbers) between operations and relations themselves.
For the representation of more general kinds of knowledge in semantic networks, there have been a number of formalisms which use distinct layers:
Sowa surmises that 'sensory icons' (layer 0 entities) are classified by an 'associative comparator' (searching long-term memory) into 'percepts', which are then assembled into 'working models' (perhaps better called 'perceptual graphs', after Morton et al. [1987]). However, no guidance is given as to how such processes as translational invariance, varying illumination and partial blocking are to be incorporated into the associative comparator. That is, although there is an extensive library of operations within a given layer, they do not enable the production or recognition of networks in adjacent layers.
This generalised method amounts to using layer-2 information about events and movements to constrain the search for layer-1 concepts of objects which are to be detected from layer-0 images. The success of this method reinforces the hypothesis that successive inter-layer transformation mechanisms operate simultaneously in a cooperative manner.
It is notable, however, that even for his restricted problems, Anderson has found it useful for psychological realism to postulate that the pattern-matching mechanisms for many different production rules all operate in parallel. In fact, he has the rate of the pattern-matching processes proportional to an 'activation level' for the different rules. Although it is time-consuming to simulate these parallel matching mechanisms on a serial computer, such fast mechanisms appear to be necessary. According to the layering hypothesis, such parallel transformation mechanisms are operative between each pair of adjacent layers.