

A Cognitive Architecture That Solves A Problem Stated by Minsky


Andrés Pomi and Eduardo Mizraji

Abstract: We named "Minsky's problem" the challenge of building up a cognitive architecture able to perform a good diagnosis based on multiple criteria that arrive one by one as successive clues. This is a remarkable human information-processing capability, and a desired ability for an artificial expert system. We present here a general cognitive design that solves Minsky's problem, and a neural network implementation of it that uses distributed associative memories. The type of architecture we present is based on the interaction between an attribute-object associator (AOA) and an intersection filter (IF) of successive evoked objects, with the intermediation of a working (short-term) memory.

Index Terms: Distributed associative memories, human information processing, multiple criteria decision making, neural systems.

I. INTRODUCTION

The human mind is capable of identifying an object after the reception of successive partial cues. As an example, if somebody says "a living being, that moves with its four legs, with a tail," we may justifiably think of a tiger, an elephant, or a horse. Here, the information is clearly not enough to obtain a unique diagnosis. Now, if he informs us that the being is "friendly with persons, capable of living inside a home, highly loyal," we suspect he is describing a dog (or perhaps a cat). Finally, if he adds that this being is capable of barking, we are sure about the dog. This ability to arrive at a diagnosis after successive partial cues is an important feature to implement in artificial cognitive systems. At the same time, it is obviously a fundamental physiological ability of real brains. The importance of this problem has been clearly identified by Marvin Minsky. In The Society of Mind he wrote: "How do we recognize our own ideas? At first, that must seem a strange question. But consider two different situations. In the first case, I hold up an apple and ask, 'What is this?' (...) such a sight could lead to activating polynemes for words like apple or fruit. In the second case, there is no apple on the scene, and I ask instead, 'What do we call those round, red, thin-peeled fruits?' Yet this time, too, you end up with an apple thought. Isn't it remarkable that one can recognize a thing merely from hearing words?" [1, Sect. 19.9].

The importance of this human cognitive performance received further emphasis in the novel The Turing Option, which Minsky wrote with Harry Harrison (in this book, the ability to retrieve "an apple" from a serial presentation of its attributes marks an important advance in the reconstruction of the injured brain of the young and brilliant scientist Brian Delaney) [2]. Let us now define "Minsky's problem" as the problem of building up a modular cognitive architecture that is potentially able to arrive at a good diagnosis starting from a sequence of partial cues. The adverb "potentially" reflects the fact that an appropriate device can fail in the diagnosis because of misleading clues or insufficient information in its databases. Up to the present time, the real biological architectures that perform this function remain largely unknown. However, we would like to emphasize that recent findings coming from empirical and computational research support the emerging view of neural cortical processing as being performed by modular structures [3]. The objective of this note is to show a class of devices capable of solving Minsky's problem. We begin with a description of a modular architecture that is potentially able to solve the stated problem. Then, we show how each one of these blocks can be materialized using (artificial or biological) matrix associative memories.

II. CONNECTIONIST DESIGN FOR CLUE-GUIDED DIAGNOSIS

A. General Cognitive Architecture

The block diagram presented in Fig. 1 shows a modular design able to perform a progressive diagnosis process that is driven by the arrival of clues. It behaves as a dynamical search in an object space that narrows step by step until a single decision is reached. Time is treated as a discrete variable. The model comprises three classes of functional blocks: 1) an attribute-object associator (AOA); 2) a kind of working memory (WM); and 3) an intersection filter (IF). Its cognitive potential results from particular memories connected with an adequate topology and the correct feedbacks. As usually happens with cognitive models, it is underdetermined by the task; many other configurations are potential solutions. The particular solution we implement here, with a loop in which a WM influences the information processing, adopts the basic strategy of several dynamical models of linguistic performance [4], [5]. In what follows, we present a general blueprint of the model.

Manuscript received February 13, 2000; revised November 19, 2000 and May 31, 2001. This work was supported in part by PEDECIBA (Uruguay). This paper was recommended by Associate Editor L. O. Hall. The authors are with the Biophysics Department, Universidad de la República, Montevideo, Uruguay (e-mail: pomi@fcien.edu.uy; mizraj@fcien.edu.uy). Publisher Item Identifier S 1083-4419(01)08542-9.



Fig. 1. Modular architecture for Minsky's problem: attribute-object associator (AOA); intersection filter (IF); working memory (WM). The IF yields the intersection between the objects associated with the attributes presented to the memory AOA and the objects coming from the WM (these are the objects remaining from the last step of the intersection process).

To illustrate its functioning with a concrete example, consider the data given in Table I in the form of a property list [6]. Each row is a very schematic description of a fruit (its "character"), based on the presence or absence of some of the seven given attributes. In the frame of our example, an attribute enters a modular memory that associates attributes with objects. Consequently, our first module is an AOA. The attribute "round fruit" can display the association with orange, melon, apple, watermelon, grape, and plum. This set of associated outputs defines the "potential objects" recalled after the presentation of the attribute. These objects are sent to a second module, an IF, a memory capable of extracting the intersection between its two entries. Besides the output of the AOA, the IF receives in parallel the contents of a third module, a WM, that only stores the output displayed by the IF in the previous time unit. The IF works as follows: at time unit t it receives in parallel the potential-objects(t) displayed by the AOA and the objects(t-1) stored in the WM at time t-1. The IF displays a new intersection, the objects(t), and sends this output to the WM. There, this new pattern displaces the previously stored objects(t-1) (see Fig. 1).

Imagine that the previously mentioned round fruits are stored in the WM and now we introduce the attribute "thin-peeled fruit." The potential objects are grape, apple, and plum. Hence, the IF excludes orange, melon, and watermelon, and the objects become grape, apple, and plum. Finally, if we introduce the attribute "fist-sized fruit," the associated output is orange and apple. Then, the IF retains apple, and here the system arrives at a unique diagnosis. Note that in this kind of problem the value of the information provided by the input depends on the nature of the database and on the history of the particular diagnosis process. In the trivial case in which the AOA memory is empty, any input has a null value. The role of the history can be illustrated using the previous example: the information "fist-sized fruit" solves the problem because orange had been previously discarded.
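To make the loop of Fig. 1 concrete at the symbolic level, here is a minimal Python sketch of the clue-by-clue narrowing just described; the attribute-object dictionary is a small hypothetical fragment standing in for Table I, and a plain set intersection plays the role of the IF.

```python
# Set-level sketch of the AOA / IF / WM loop (illustrative only; the table
# below is a small hypothetical stand-in for Table I).
ATTRIBUTE_TO_OBJECTS = {                      # AOA: attribute -> potential objects
    "round":       {"orange", "melon", "apple", "watermelon", "grape", "plum"},
    "thin-peeled": {"grape", "apple", "plum"},
    "fist-sized":  {"orange", "apple"},
}

def diagnose(clues):
    wm = None                                  # WM is empty before the first clue
    for clue in clues:
        potential = ATTRIBUTE_TO_OBJECTS.get(clue, set())   # AOA recall
        wm = potential if wm is None else wm & potential    # IF: intersect with WM
    return wm

print(diagnose(["round", "thin-peeled", "fist-sized"]))      # {'apple'}
```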

B. Neural Network Implementation

Neural network models are a broad and interdisciplinary field. In this field, the search for a link between neurobiology and cognition, on the one hand, and the race for increasing the computational power of engineering devices, on the other, meet. An illustration of this large diversity of neural computational models can be found in The Handbook of Brain Theory and Neural Networks [7]. As occurs with the phenomenological cognitive solution, its neural network implementation is also underdetermined; among other options, powerful computational abilities can be reached using some of the variants of ART models [8] or bidirectional associative memories [9]. In what follows, we discuss the implementation of the described solution to Minsky's problem using variations of distributed associative memories [10], [11]. Some of the characteristics of these pattern associators make them attractive from a neuromimetic point of view. In this class of associative memory models, the cognitively relevant functional units are the patterns of activity of large neural assemblies (not the individual neurons). These assemblies are naturally represented by vectors that carry information via a particular distributed population code. This view has recently received strong support from evidence coming from neurofunctional imaging techniques [12]-[14] and the new electrophysiological methods of simultaneous neural ensemble recording [15]. Memories are installed in matrices whose coefficients represent synaptic weights. These matrix memories store many pairs of vectorial input-output (I/O) associations. The information represented by the pairs of associated vectors is scattered over the matrix coefficients. At the same time, these coefficients superimpose data coming from different pairs of associated vectors. Hence, in these matrix memories the stored information is distributed, superimposed, and content-addressable [11], [16]. In the biological realm, the dimensions of the vectors are related to the size of the neural modules that support the memories. If the dimensionality is large enough, the matrix neural models also conserve the robustness found in neurobiological systems [11], [17].

Using the theory of matrix associative memories, Kohonen designed a device with the remarkable ability of acting as a novelty filter [11], [17]. This filter extracts the novelty from input data, and was built up using the projections of the entries onto the orthogonal complement of the linear subspace spanned by the set of vectors stored (auto-associated) in the memory. These matrix memories can be adapted to become context-sensitive via a tensor preprocessing. Context-dependent associative memories are memories with two entries: in the presence of a key input vector, the association retrieved by the memory depends upon a second input vector acting as its context. These memories can be viewed as classical matrix memories that associate the output vectors with the Kronecker product of their two entries, the key input and the context [16], [18]. They are particularly apt to carry out the symbolic calculations [19], [20] characteristic of artificial intelligence (AI) systems, bridging the gap between symbolic AI and neural networks (see [21] for a comment). We would like to mention, in passing, that this fact helps us to approach, in a modern neural-network framework, the McCulloch-Pitts program (the possession of "a tool for rigorous symbolic treatment of known nets" and "an easy method of constructing hypothetical nets of required properties" [22]). In addition, the matrix memories are capable of dealing with fuzzy data (represented by weighted sums of the basic vectors), due to the interpolation potentialities of matrix operators [16], [19].
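As a small illustration of such context-dependent memories, the sketch below (Python/NumPy, with randomly generated orthonormal vectors that are purely illustrative and not taken from the paper) stores two associations sharing the same key and shows that the retrieved output depends on the context entering the Kronecker product.

```python
import numpy as np

# Context-dependent associative memory: outputs are associated with the
# Kronecker product of a key and a context. All vectors are illustrative.
rng = np.random.default_rng(0)

def orthonormal_columns(n, k):
    q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return [q[:, i] for i in range(k)]

key, = orthonormal_columns(4, 1)
ctx1, ctx2 = orthonormal_columns(4, 2)
out1, out2 = orthonormal_columns(4, 2)

# M stores two associations: (key, ctx1) -> out1 and (key, ctx2) -> out2
M = np.outer(out1, np.kron(key, ctx1)) + np.outer(out2, np.kron(key, ctx2))

print(np.allclose(M @ np.kron(key, ctx1), out1))   # True: same key, context 1
print(np.allclose(M @ np.kron(key, ctx2), out2))   # True: same key, context 2
```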


TABLE I FRUITS AND THEIR ATTRIBUTES

The AOA module can be fulfilled by a classical correlation memory, a matrix that in the simplest case can be represented as follows [11]:

$$\mathrm{AOA} = \sum_{i}\sum_{j=1}^{Q(i)} o_{ij}\, a_i^{\mathsf T}. \qquad (1)$$

The attributes presented to this memory map on n-dimensional vectors $a_i$, and each one of them can be associated with a set of different objects represented by m-dimensional vectors $o_{ij}$. The subindex $j$ runs over a set of integers, $j = 1, \ldots, Q(i)$, where $Q(i)$ is the number of objects that share the attribute $a_i$. For instance, if the attribute "red" (let it be $a_r$) is associated with objects like apple (let it be $o_{r1}$) and watermelon (let it be $o_{r2}$), then the subindexes are $j = 1$ and $j = 2$. An input vector $a_k$ is processed by the memory as follows:

$$\mathrm{AOA}\, a_k = \sum_{i}\sum_{j=1}^{Q(i)} o_{ij}\, \langle a_i, a_k \rangle \qquad (2)$$

($\langle \cdot,\cdot \rangle$ is the scalar product). Let us assume that the code is selected in such a way that different attributes map on orthonormal vectors $a_i$. Then, if the input $a_k$ is stored in the memory, it produces the following output:

$$\mathrm{AOA}\, a_k = \sum_{j=1}^{Q(k)} o_{kj}. \qquad (3)$$

We are going to assume that the object vectors $o_{ij}$ are also orthonormal. Under this condition, the IFs can be constructed using context-dependent associative memories with the following structure [18]:

$$H = \sum_{j} o_{j}\, (o_{j} \otimes o_{j})^{\mathsf T} \qquad (4)$$

(the sum running over all the stored object vectors, here written with a single index). In these memories, the input is the Kronecker product (see, for example, [23], [24]) of two object vectors. The memory $H$ is an auto-associative memory with the vectors $o_j$ acting also as their own contexts. This memory is capable of computing in parallel two simultaneous inputs, and it is easy to see that if in one branch you enter $\sum_{i \in A} o_i$ and in the other you inject $\sum_{j \in B} o_j$ (all these vectors being stored in memory $H$), then the output is their intersection, $\sum_{k \in A \cap B} o_k$. We would like to mention that whereas the IF, besides having two entries, responds with the information stored in the memory, Kohonen's novelty filter is passed through only by the portion of the stimulus unknown to the memory, the stored information producing no response [11], [17]. As we communicated elsewhere [16], the frame of context-dependent associative memories allows the construction of context-sensitive novelty filters that resemble certain properties of attentional phenomena.

A fully performant IF module can be built up by adding to the previously described memory $H$ another auto-associative memory $N$ with the structure

$$N = \sum_{j} o_{j}\, (o_{j} \otimes w)^{\mathsf T}. \qquad (5)$$

This memory performs the operation $N(o_i \otimes w) = o_i$, where the vector $w$ is a normalized column vector containing the same component in all its positions. This vector defines a kind of "white" context that allows vector $o_i$ to be transported through the filter defined by memory $N$. The operation of the module IF is performed by a matrix memory that is the sum of matrix $H$ plus matrix $N$:

$$\mathrm{IF} = H + N. \qquad (6)$$

In the configuration displayed in Fig. 1, the memory IF receives at each time step two inputs, one coming from memory AOA and the other coming from the WM. In our model, the WM is a critical module. We are going to assume that this module is not a matrix associative memory, but a device that stores only one vector at a time and is capable of producing two classes of signals: a) in the absence of any input, the spontaneous activity of the WM produces a white vector $w$ that is introduced into the memory filter IF; and b) in the presence of an input coming from the IF (usually a sum of many vectors), the WM stores it, displacing the existing information. In the next time step, the WM reinjects this stored vector into memory IF. It is natural to imagine that a real WM is open to the influence of inputs coming from many different neural networks; the filter IF of our heuristic model is only one of these networks.
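A minimal NumPy sketch of (1)-(4) follows; the codes are standard basis vectors and the `assoc` table is an assumed fragment of Table I, so it is an illustration rather than the paper's computation. The full IF of (5) and (6), with its white-context pathway, is exercised in the sketch accompanying the numerical example below.

```python
import numpy as np

# Sketch of (1)-(4): an AOA correlation memory plus the intersection memory H.
# Codes are standard basis vectors here, purely for illustration.
I8, I16 = np.eye(8), np.eye(16)
fruits = ["orange", "melon", "apple", "watermelon", "grape", "plum", "banana"]
obj = {f: I8[i] for i, f in enumerate(fruits)}          # 8-dim object codes
attr = {"round": I16[0], "thin-peeled": I16[1], "fist-sized": I16[2]}

# (1): AOA = sum_ij o_ij a_i^T, built from an assumed fragment of Table I.
assoc = {"round": ["orange", "melon", "apple", "watermelon", "grape", "plum"],
         "thin-peeled": ["grape", "apple", "plum"],
         "fist-sized": ["orange", "apple"]}
AOA = sum(np.outer(obj[o], attr[a]) for a, objs in assoc.items() for o in objs)

# (3): recalling with a stored attribute returns the sum of its objects.
recall = AOA @ attr["thin-peeled"]                      # grape + apple + plum

# (4): H = sum_j o_j (o_j x o_j)^T computes the intersection of two sums.
H = sum(np.outer(o, np.kron(o, o)) for o in obj.values())
wm = obj["orange"] + obj["apple"]                       # suppose WM holds {orange, apple}
print(np.allclose(H @ np.kron(recall, wm), obj["apple"]))   # True: the intersection
```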


TABLE II COMPONENTS OF THE WALSH COLUMN VECTORS ASSIGNED TO OBJECTS AND ATTRIBUTES

TABLE III AOA MEMORY MATRIX

Erratum: a last row, identical to the second, is missing from this table.

To sum up, the operation of this modular neural network occurs as follows.

1) An attribute $a_k$ enters the matrix memory AOA at time t and generates the outputs $\sum_j o_{kj}$.
2) These outputs enter the IF. If the WM is empty, the IF receives as its second entry a white vector $w$. In this case, the $N$ component of the IF makes its first output be the same initial set $\sum_j o_{kj}$. This output is sent to the WM and stored.
3) At time t+1, a second attribute, coded by vector $a_l$, produces the outputs $\sum_j o_{lj}$, the vectorial representation of the associated objects.
4) In this step, the IF generates the intersection between $\sum_j o_{kj}$ and $\sum_j o_{lj}$, and this new set is stored in the WM.
5) A new attribute enters at time t+2 into the associator AOA, and at the end a new intersection set appears as output, and so on.

Note that this whole process converges to a diagnosis only if the appropriate databases exist inside the memory matrix AOA and the IF. Thus, databases that are almost empty prevent almost all useful diagnoses, because the required information does not exist. However, note that an excess of information (with respect to the nature of the inputs) can also obstruct diagnosis. In our fruit example, the knowledge of other exotic, red, round, and fist-sized fruits would prevent the easy identification of the common apple, and additional attributes would be needed.

C. An Example

Let us show a numerical example. The objects (fruits) described in Table I are mapped on 8-dimensional orthogonal column vectors and the attributes map on 16-dimensional orthogonal column vectors. We selected these orthogonal vectors from the set of Walsh functions [25] (see Table II). In what follows (for typographical simplicity), we will represent these column vectors as rows containing their elements. We instructed the matrix memory AOA according to (1) with the associations established in Table I, using a normalized version of the corresponding Walsh vectors.
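The following sketch (Python/NumPy) runs the whole architecture end to end on the three clues of this example: it generates Walsh-type codes with a Hadamard construction, instructs AOA, H, N, and IF as in (1)-(6) from a fragment of Table I, and lets the WM-IF loop narrow the candidate set. The assignment of codes to fruits and attributes is arbitrary here, so the individual vector components differ from those of Table II, but the recalled object sets narrow in the same way as in the step-by-step run shown next.

```python
import numpy as np

# Sketch of the complete architecture on the fruit example, using Walsh-type
# (Hadamard) codes. The assignment of codes to fruits/attributes is arbitrary
# and does not reproduce Table II; normalization details are simplified.
def hadamard(n):                      # Sylvester construction, n a power of 2
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

H8, H16 = hadamard(8) / np.sqrt(8), hadamard(16) / 4.0
fruits = ["orange", "melon", "apple", "watermelon", "grape", "plum", "banana"]
obj = {f: H8[i + 1] for i, f in enumerate(fruits)}   # skip the constant row,
w = H8[0]                                            # which serves as the white vector
attr = {a: H16[i + 1] for i, a in enumerate(["round", "thin-peeled", "medium-sized"])}

assoc = {"round": ["orange", "melon", "apple", "watermelon", "grape", "plum"],
         "thin-peeled": ["grape", "apple", "plum"],
         "medium-sized": ["apple", "banana", "orange"]}
AOA = sum(np.outer(obj[o], attr[a]) for a, objs in assoc.items() for o in objs)
Hm = sum(np.outer(o, np.kron(o, o)) for o in obj.values())     # (4)
Nm = sum(np.outer(o, np.kron(o, w)) for o in obj.values())     # (5)
IF = Hm + Nm                                                   # (6)

wm = w                                                # WM starts with the white vector
for clue in ["round", "thin-peeled", "medium-sized"]:
    recall = AOA @ attr[clue]                         # AOA output
    out = IF @ np.kron(recall, wm / np.linalg.norm(wm))
    wm = out                                          # IF output replaces WM contents
    members = [f for f in fruits if abs(np.dot(obj[f], out)) > 0.1]
    print(clue, "->", members)
# round -> ['orange', 'melon', 'apple', 'watermelon', 'grape', 'plum']
# thin-peeled -> ['apple', 'grape', 'plum']
# medium-sized -> ['apple']
```

In this sketch the white vector is orthogonal to every object code, so the N pathway contributes only before the first clue, and the later steps reduce to the exact intersection computed by H.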


This procedure generates an 8 × 16 matrix; Table III shows the aspect of this distributed memory. The IF module was instructed by adding the self-contextual auto-associative memory H [(4)] of the 8-dimensional Walsh column vectors corresponding to the seven fruits of our example to the matrix N built according to (5). The matrices H, N, and IF have dimension 8 × 64. The vectors acting as inputs (but not the outputs) were normalized for our computation. The transpose of this memory matrix IF (64 × 8) is shown in Table IV.

During the computation of this system, the arriving clues and the contents of the WM are normalized. In the following, the outputs of the AOA memory have been denormalized (multiplying them by 8) in order to facilitate the interpretation of the outputs using Table II. Notice that in the following example, the arrival of the attributes defines the time steps. For each step we describe the vector codes (only the magnitudes of the components are shown) and their interpretation.

Step 1.
First clue: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 (attribute: round fruit).
AOA output: 6 2 0 0 0 0 2 2 (associated objects: apple, melon, grape, plum, watermelon, orange).
WM contents: 1 1 1 1 1 1 1 1 (initializing vector).
IF output: 6 2 0 0 0 0 2 2 (apple, melon, grape, plum, watermelon, orange). [Note that IF output = AOA output ∩ WM contents.]

Step 2.
Second clue: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 (attribute: thin-peeled).
AOA output: 3 1 1 1 1 3 1 1 (associated objects: apple, grape, plum).
WM contents: 6 2 0 0 0 0 2 2 (IF output from step 1: apple, melon, grape, plum, watermelon, orange).
IF output: 3 1 1 1 1 3 1 1 (intersection: apple, grape, plum).

Step 3.
Third clue: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 (attribute: medium-sized).
AOA output: 3 1 1 1 1 1 3 1 (associated objects: apple, banana, orange).
WM contents: 3 1 1 1 1 3 1 1 (IF output from step 2: apple, grape, plum).
IF output: 1 1 1 1 1 1 1 1 (intersection: apple).

TABLE IV IF MATRIX TRANSPOSE

In this case, the device has been able to arrive at a unique diagnosis in three steps.

The interpolation capacities of the matrix memories involved in this architecture allow the system (in some cases) to produce useful diagnoses even when it is confronted with situations not included in the databases. As a simple example, imagine that our previous device is confronted with a tropical fruit not present in the memories. This new fruit has the following properties: round, medium-sized, and with a peel not as thick as that of the banana or the orange, say 0.3 "thin" + 0.7 "thick" (where the attributes "thin" and "thick" have been previously normalized). The final output produced by the system is a fruit 0.3 "apple" + 0.7 "orange".
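A toy version of this interpolation, under assumed codes and with only the thin-peel and thick-peel associations represented (not the full Table I), can be written as follows: a clue that is a weighted mixture of two stored attribute codes retrieves the corresponding weighted mixture of objects.

```python
import numpy as np

# Interpolation in a correlation memory: a mixed clue yields a mixed recall.
# Codes and the thin->apple / thick->orange associations are illustrative.
I16, I8 = np.eye(16), np.eye(8)
thin, thick = I16[0], I16[1]                 # attribute codes
apple, orange = I8[0], I8[1]                 # object codes

AOA = np.outer(apple, thin) + np.outer(orange, thick)

mixed_clue = 0.3 * thin + 0.7 * thick        # a peel neither fully thin nor thick
print(AOA @ mixed_clue)                      # = 0.3*apple + 0.7*orange
```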


III. COMMENTS AND EXTENSIONS

As a further refinement of this network of modules, we can assume that the AOA memories display the following structure:

$$\mathrm{AOA} = \sum_{i}\sum_{j=1}^{Q(i)} \lambda_{ij}\, o_{ij}\, a_i^{\mathsf T}. \qquad (7)$$

In these memories, the coefficients $\lambda_{ij}$ measure the degree of familiarity with the particular association $a_i \rightarrow o_{ij}$. In this way, the historical experience of a memory determines the relative importance of the associations. In these memories the diagnosis can be decided using nonlinear filters that only retain output vectors with weights above a certain threshold (e.g., the holographic memories [26]). In the presence of some amount of nonlinear processing, a drastic reduction of dimensionality becomes possible by using, instead of orthonormal vectors, random unitary Gaussian vectors (a pseudo-orthonormal basis under certain threshold conditions) [27, Ch. 7, p. 239].

Matrix associative memories like the AOA or the IF imply the existence of learning processes. Some kind of Hebbian learning can be invoked in order to produce the installation of the memories (in particular, Widrow-Hoff or similar gradient-descent algorithms). Note that a sustained input on memory IF can produce at the beginning the associative term $o_{k}(o_{k} \otimes w)^{\mathsf T}$ (because the WM provides the white context $w$) and later can construct the term $o_{k}(o_{k} \otimes o_{k})^{\mathsf T}$ (because now the WM gives as context the vector $o_k$ itself). However, the learning processes capable of leading to these associative memories are beyond the scope of the present communication.

We have presented a heuristic solution to an important cognitive problem: the progressive refinement of a searching process while new data are arriving. Notice that this ability to narrow an initial set of possible decisions obeying multiple constraints is one of the bases of medical diagnosis. There are several combinations of modular topologies and feedback structures able to give a performant capability in focusing a diagnosis. We have shown a kind of device whose principal elements are memory banks and a kind of intersection filter of successive retrievals. We remark that the implementation of short-term memories, which allows the coincidence of asynchronously elicited neural activities, appears to be a master key in the material resolution of these types of nonequilibrium information processes.

REFERENCES
[1] M. Minsky, The Society of Mind. New York: Simon and Schuster, 1988.
[2] H. Harrison and M. Minsky, The Turing Option. New York: Questar-Warner, 1992.
[3] H. Eichenbaum, "Thinking about brain cell assemblies," Science, vol. 261, pp. 993-994, 1993.
[4] J. L. Elman, "Language as a dynamical system," in Mind as Motion: Explorations in the Dynamics of Cognition, R. F. Port and T. van Gelder, Eds. Cambridge, MA: MIT Press, 1995, pp. 195-225.
[5] M. Spitzer, The Mind Within the Net. Cambridge, MA: MIT Press, 1999.
[6] M. Minsky, "Steps toward artificial intelligence," Proc. IRE, vol. 49, pp. 8-30, 1961.
[7] M. A. Arbib, Ed., The Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press, 1995.
[8] G. A. Carpenter and S. Grossberg, "Adaptive resonance theory (ART)," in The Handbook of Brain Theory and Neural Networks, M. A. Arbib, Ed. Cambridge, MA: MIT Press, 1995, pp. 79-82.
[9] B. Kosko, Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence. Englewood Cliffs, NJ: Prentice-Hall, 1992.
[10] T. Kohonen, "Correlation matrix memories," IEEE Trans. Comput., vol. C-21, pp. 353-359, 1972.
[11] T. Kohonen, Associative Memory: A System-Theoretical Approach. New York: Springer-Verlag, 1977.
[12] A. Martin et al., "Neural correlates of category-specific knowledge," Nature, vol. 379, pp. 649-652, 1996.
[13] M. I. Posner and M. E. Raichle, "The neuroimaging of human brain function," in Proc. Nat. Acad. Sci. USA, vol. 95, 1998, pp. 763-764.
[14] M. E. Raichle, "Visualizing the mind," Sci. Amer., vol. 270, pp. 58-64, 1994.
[15] M. A. L. Nicolelis et al., "Hebb's dream: The resurgence of cell assemblies," Neuron, vol. 19, pp. 219-221, 1997.
[16] E. Mizraji et al., "Multiplicative contexts in associative memories," BioSyst., vol. 32, pp. 145-161, 1994.
[17] T. Kohonen et al., "Storage and processing of information in distributed associative memory systems," in Parallel Models of Associative Memory, G. E. Hinton and J. A. Anderson, Eds. Hillsdale, NJ: L. Erlbaum, 1989, pp. 129-167.
[18] A. Pomi and E. Mizraji, "Memories in context," BioSyst., vol. 50, pp. 173-188, 1999.
[19] E. Mizraji, "Vector logics: The matrix-vector representation of logical calculus," Fuzzy Sets Syst., vol. 50, pp. 179-185, 1992.
[20] E. Mizraji and J. Lin, "A dynamical approach to logical decisions," Complexity, vol. 2, pp. 56-63, 1997.
[21] J. A. Barnden, "Artificial intelligence and neural networks," in The Handbook of Brain Theory and Neural Networks, M. A. Arbib, Ed. Cambridge, MA: MIT Press, 1995, pp. 98-102.
[22] W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity," Bull. Math. Biophysics, vol. 5, pp. 115-133, 1943.
[23] R. Bellman, Introduction to Matrix Analysis. New York: McGraw-Hill, 1960.
[24] A. Graham, Kronecker Products and Matrix Calculus with Applications. Chichester, U.K.: Ellis Horwood, 1981.
[25] S. Barnett, Matrices. Oxford, U.K.: Clarendon, 1990.
[26] Y.-H. Pao, Adaptive Pattern Recognition and Neural Networks. Reading, MA: Addison-Wesley, 1989.
[27] T. Kohonen, Self-Organizing Maps. New York: Springer-Verlag, 1997.

Andrés Pomi received the M.D. degree and the M.S. degree in biophysics from the Universidad de la República, Montevideo, Uruguay, in 1991 and 1995, respectively. He is an Assistant Professor with the Biophysics Department, Universidad de la República. His current research interests are in the field of cognitive neuroscience, with the approach of distributed associative memory models.

Eduardo Mizraji received the M.D. degree from the Universidad de la República, Montevideo, Uruguay, and the DEA degree in applied mathematics from the University of Paris V, France, in 1977. He is a Professor of Biophysics and Head of the Department of Cell and Molecular Biology, Universidad de la República. His current research interests include information processing in extended neural systems and the neural processing of symbolic operations.
