
Micro-net - The Parallel Path Artificial Neuron

A dissertation presented in fulfilment of the requirements for the award of Doctor of Philosophy, September 2006

A. G. W. Murray (2006) All rights reserved

By Andrew Gerard William Murray
Faculty of Information and Communications Technology
Swinburne University of Technology

Acknowledgements
Foremost, I'd like to thank those who have acted, formally or informally, as my supervisors. In chronological order (and possibly in the amount of damage sustained), I'd like to express my appreciation to Prof. Tim Hendtlass, who as my coordinating supervisor has provided rigorous academic challenge, expert guidance, insightful management and heaps of financial support. Thank you to Dr. Kevin Bluff for filling the void until my then brand spanking new secondary supervisor was ready. Thanks heaps to Dr. John Podlena for the innovative perspective on the art of object oriented coding, financial support through imaginative grant writing and the many simulated deaths I suffered at his hands playing in one of his MUDs.

Clinton Woodward, a colleague, a collaborator, a genuine contributor and my friend. Thank you for hearing me out, and for discussing and realizing some of the countless possibilities this project has presented. In reality, without your considerable input H.U.N.T.E. would never have seen a bug-free light of day.

Thank you also to Clinton's wife, Zanyta, for continually letting me back into her home, for allowing us to occupy her dining room on any given Sunday at the drop of a hat and for baking the best muffins. We really were working. X-Box was all just a part of our cunning experimental strategy.

My other collaborator who contributed to the completion of this thesis was Dr. Anthony Bartel, sometimes the only voice of reason in a cacophony of white noise concerning issues of progress, content and the foibles of institutions. Thank you also for the sanctuary of the nucleonics laboratory and, like Clinton, for listening to my obtuse ideas.

I would like to thank Prof. Peter Cadusch for his interest and invaluable insight into the pure mathematical intricacies of universal approximation (for this slightly smaller universe) and his suggestions for generalising the formulation of micro-net topology.

Thank you to Dr. Joe Chiorchari for finding me teaching hours when I first began my teaching assistantship and the continued supplementing of my training in the biophysics lab. Thank you to Mr. Bill Clune, Mr. Manny Kourondourous, Mrs. Mary Roberts, Mrs. Sarah Porter and (especially) Mrs. Jodie Hopkins, the lab managers who fixed, fabricated or found all the bits to make the laboratory practicals I was involved with work. Thank you also to Mr. Chris Anthony who always tried hard under sometimes very difficult circumstances.


Much appreciation is extended to Dr. Rob Bucchan, who showed me the teaching assistantship ropes, and to Mr. Simon Dankert, Mr. Chris Wright, Dr. Mardi Sait, Dr. Charles Cranfield, Dr. Melis Senova, Dr. Recep Ullusoy and Mr. Bob Mair, my fellow teaching assistants, for providing countless hours of entertainment.

Also providing stuff, and therefore deserving of my gratitude, are both Mr. Cameron Young and Mr. Steve Burrows, who always found me everything from stationery to laptops and projectors. Also worthy of special mention is Mr. Darren Spenser. Thank you for all the hardware and network support, and for allowing me multiple login privileges to trial H.U.N.T.E. (even if deep down you were probably thinking that this was all going to end in tears).

To Neil Cole, the first and best manager I ever worked for: I'd like to say thank you for the concept (and implementation thereof) of the four pot lunch, for talking parts per million not a sack of spuds, for Fuzzy Wuzzy the Worm, for the notion that research is a worthwhile pursuit and for the inspiration to continue that pursuit under trying circumstances.

The last thing to thank Neil for is employing Ralph Nischwitz, chemist, master of the 5-wood and fine practitioner and purveyor of home brew, who was and remains the provider of many welcome distractions when, really, focus would have been more prudent, and who then asked, is it (the thesis) finished yet? Special thanks to Sharon (Mrs. Nischwitz) Cleaves for infinite patience and for putting up with all the golf shenanigans.

Thank you to Dr. William H. Leadston, the clinicians you mentored and your office staff (Heather and Melita) for all the patience, generosity and effort you put into dealing with my sweet disposition.

Finally, to my family and close friends who made up the shortfall in terms of every other possible resource that this self indulgence could consume, especially my mother Josephine (Castleton) Murray and my sister Sandra: there are not enough words to express the debt of gratitude I owe.

All's well that ends!

Abstract
A feed forward architecture is suggested that increases the complexity of conventional neural network components through the implementation of a more complex scheme of interconnection. This is done with a view to increasing the range of application of the feed forward paradigm.

The uniqueness of this new network design is illustrated by developing an extended taxonomy of accepted published constructs specific and similar to the higher order, product kernel approximations achievable using parallel paths. Network topologies from this taxonomy are then compared to each other and the architectures containing parallel paths. In attempting this comparison, the context of the term network topology is reconsidered.

The outputs of channels in these parallel paths are the products of a conventional connection, as observed facilitating interconnection between two layers in a multilayered perceptron, and the output of a network processing unit, a control element, that can assume the identity of a number of pre-existing processing paradigms.

The inherent property of universal approximation is tested by existence proof and the method found to be inconclusive. In so doing, an argument is suggested to indicate that the parametric nature of the functions, as determined by conditions upon initialization, may only lead to conditional approximations. The property of universal approximation is neither confirmed nor denied. Universal approximation cannot be conclusively determined by the application of the Stone-Weierstrass Theorem, as adopted from real analysis.

This novel implementation requires modifications to component concepts and the training algorithm. The inspiration for these modifications is related back to previously published work that also provides the basis of proof of concept.

Having achieved proof of concept, the appropriateness of considering network topology without assessing the impact of the method of training on that topology is considered and discussed in some detail.

Results of limited testing are discussed with an emphasis on visualising component contributions to the global network output.


Foreword
Partly because of its origins, and partly because of the diverse fields from which it derives its contributions, the field of artificial neural computation itself rarely yields precise and concise definitions. These factors are also in no small way responsible for the lack of formalism that precludes this field from being called a discipline.

Given this imprecision of definitions, this candidate proposes that a feed forward, artificial neural network approximates functions. Analytic functions can be derived empirically through the study and complete characterisation of real world, physical systems or hypothetical, theoretical systems. Here the operative terms are complete characterisation. There are two aspects of feed forward artificial neural systems that render them more expedient and thereby possibly more attractive for the empirical derivation of analytic functions.

A feed forward, artificial neural network will approximate a function without complete evaluation and characterisation of the system of interest. This is possible when there is little information in the way of training data, when the input vector has missing dimensions and in the extreme case where no genuine, real world relationship exists between the input vector and the output vector.

The inclusion of the randomly initialised weight vector attenuates the input signals to allow training to proceed when smooth, continuous transfer functions are employed, and provides a supplemental stochastic, parametric substrate in the absence of known empirical information.

These aspects combine to facilitate a single strategy. The feed forward artificial neural network is a means of mapping an input vector to an output vector by expanding solution space to create an almost infinite number of possible mappings, with the only constraint being the characteristics of the transfer functions chosen. The precision and accuracy of the mapping to the analytic function is purely arbitrary.
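For concreteness, a mapping of this kind can be sketched, assuming a single hidden layer with weight matrices W1 and W2, bias vectors b1 and b2, and a smooth transfer function sigma (the symbols are illustrative only and are not the notation adopted in the body of this thesis):

\[ \mathbf{y} = \mathbf{W}_{2}\,\sigma\!\left(\mathbf{W}_{1}\mathbf{x} + \mathbf{b}_{1}\right) + \mathbf{b}_{2}. \]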

As suggested in the literature, a feed forward, artificial neural network arrives at a function approximation by performing a regression on a global error estimate when backward error propagation is used to train the weights. In that sense the conventional feed forward, artificial neural network yields a statistical model.
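A minimal sketch of that statistical view (again with illustrative symbols: t_pk and o_pk are the target and network outputs for pattern p and output k, eta is a learning rate and w_ij any weight) is a global error estimate of the sum-of-squared-errors form, minimised by gradient descent:

\[ E = \frac{1}{2}\sum_{p}\sum_{k}\left(t_{pk} - o_{pk}\right)^{2}, \qquad \Delta w_{ij} = -\eta\,\frac{\partial E}{\partial w_{ij}}. \]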

With this framework and a conventional set of feed forward, artificial neural network components the candidate has constructed a new network architecture. This is essentially a proof of concept thesis and as such it adopts a structure that takes the accepted and conventional ideas as outlined in the survey of the literature and distils these into a single proposition. This single proposition is then expanded upon with respect to implementation and an attempt is then made to link the new implementation back to the field.

With respect to the survey of the literature there are two trends that are expanded upon here. Firstly, published works that describe new topologies assume two forms: either a deliberate and specific topology is provided and aspects of uniqueness are expounded, or a generalisation of a topology is presented with collective properties of similarity explained. Secondly, works are completed emphasising one or two broad aspects from a set of four topical explorations. These explorations include divulging new topologies, creating or modifying existing training paradigms, proving theoretically the performance of new topologies by robust mathematics and demonstrating empirical proof of performance through cursory analytical methods.

This thesis appears, at first sight, to be well over the allowed limit of 100,000 words. In fact, the total word count for the ten chapters that comprise the body of work in this thesis is approximately 85,727 (as counted with the word count function provided in MS Word 2000). A breakdown is tabulated below.

Chapter              1      2      3      4      5      6      7      8      9     10    Total
Number of words   2569  22237   4655  15033  12722   9558  10917   3716    689   3631    85727
Number of pages      7     66     17     44     49     31     38     14      3     10      279

The close correlation between the number of pages and the number of words reflects the accuracy of the word count. This is shown by plotting both the number of pages and the number of words as a function of chapter number, as in the figure below.
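As an illustration only, a minimal sketch (assuming Python with numpy and matplotlib, which were not the tools used for the original figure) that reproduces the chapter statistics plot below from the tabulated counts:

# Minimal sketch: plot log(words) and log(pages) per chapter from the table above.
import numpy as np
import matplotlib.pyplot as plt

chapters = np.arange(1, 11)
words = np.array([2569, 22237, 4655, 15033, 12722, 9558, 10917, 3716, 689, 3631])
pages = np.array([7, 66, 17, 44, 49, 31, 38, 14, 3, 10])

plt.plot(chapters, np.log10(words), marker='o', label='log(words)')
plt.plot(chapters, np.log10(pages), marker='s', label='log(pages)')
plt.xlabel('chapter number')
plt.ylabel('log(stats)')
plt.title('chapter statistics')
plt.legend()
plt.show()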

Aside from the peripheral pages and the body of chapters, the bulk of the contribution to thesis size comes from the inclusion of an extensive set of appendices. Because of the recursive mathematics required to derive the delta rule, a tabular summary of the delta rules derived is provided to improve readability. The derivations themselves are significant and cannot be excluded.


[Figure: chapter statistics, with log(words) and log(pages) plotted against chapter number.]

All 2D-data was examined using the Kriging algorithm to generate a surface topography that is employed in this thesis as a control against which the output surface from the networks is compared. Proprietary software was used to yield the Kriging surface, so the statistics generated to do this are included as a separate appendix.
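For the reader unfamiliar with the technique, the following is a minimal, self-contained sketch of ordinary Kriging (assuming Python with numpy and a simple linear semivariogram; it does not reflect the model or settings of the proprietary package used to generate the control surfaces in this thesis):

# Minimal ordinary Kriging sketch with an assumed linear semivariogram gamma(h) = slope * h.
# Interpolates scattered 2D samples (x, y, z) onto a regular grid; illustration only.
import numpy as np

def ordinary_kriging(x, y, z, grid_x, grid_y, slope=1.0):
    pts = np.column_stack([x, y])
    z = np.asarray(z, dtype=float)
    n = len(z)
    # Semivariogram matrix for all pairs of sample points, augmented with the
    # Lagrange multiplier row/column that enforces the unbiasedness constraint.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = slope * d
    A[n, n] = 0.0
    surface = np.zeros((len(grid_y), len(grid_x)))
    for i, gy in enumerate(grid_y):
        for j, gx in enumerate(grid_x):
            d0 = np.linalg.norm(pts - np.array([gx, gy]), axis=1)
            b = np.append(slope * d0, 1.0)
            w = np.linalg.solve(A, b)          # Kriging weights (plus multiplier).
            surface[i, j] = np.dot(w[:n], z)   # Weighted sum of sample values.
    return surface

# Example usage on a small set of synthetic 2D samples.
rng = np.random.default_rng(0)
xs, ys = rng.uniform(0, 1, 20), rng.uniform(0, 1, 20)
zs = np.sin(3 * xs) + np.cos(3 * ys)
grid = np.linspace(0, 1, 25)
control_surface = ordinary_kriging(xs, ys, zs, grid, grid)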

The Generic Singular Platform (GSP), The Higher-order Unit Network Testing Environment (H.U.N.T.E.) and scripting engine software required both an introduction and description. These have been provided in a separate appendix.

As suggested earlier in this Foreword, there are four topics of interest. Unconventionally, this thesis includes passages concerned with each of these topics. Chapter 3 describes a new topology. Chapter 5 modifies the existing backward error propagation paradigm. Part of chapter 6 is given to the issue of universal approximation, to attempt a theoretical proof of performance by robust mathematics. Empirical proof of performance through software simulation is completed in chapter 7 and chapter 8. It should be recognised by the reader that, in an attempt to rationalise the size of this thesis, tracts of research, although completed, have been regrettably omitted. These include in-depth evaluations of published mathematical analyses of network function. Comparisons were made of the published methodologies and attempts at formalism. The weight vector, its size and preferred weight states were studied. Better proofs of the approximating capacities of feed forward constructs were summarized, as was the validity of the application of conventional statistics to network performance.

Chapter 2 and chapter 4 are concerned with reviewing the field. Because of the age and multidisciplinary nature of the field, any survey of the literature, if it is to provide a sound foundation for critical discussion, will be relatively large. This is true of both the chronology and the number of citations referenced, and both are certainly true in the case of this thesis. A histogram is provided below that relates the number of citations as a function of their year of publication. From the perspective of this thesis the review of the field begins with McCulloch and Pitts' seminal work from 1943 and ends with articles from the year this work was completed, 2006.

[Figure: citations as a function of year, a histogram of the number of publications cited against their year of publication.]

The trend, as seen in the histogram above, shows a skewed distribution beginning in 1943, with sparse representation observed over the next three decades. These are the works of McCulloch and Pitts, Hebb, McCulloch alone, Rosenblatt, Widrow and Specht. Although Minsky and Papert belong here, it is their second edition from the early eighties that is cited in this thesis.

An almost Gaussian distribution is observed between 1985 and 2006. The centroid of this distribution occurs around 1997. This distribution provides the reader with a graphical indication of the temporal validity of information in the citations sampled.

Connectionism as a field, as is evident from the survey of the literature, has always been controversial, but there are elements among its practitioners that suggest it is chaotic, worried, in bad shape and misused [Roy00, SVS00, Thor94b and vdSH98]. These claims in themselves are nothing new. McCulloch and Pitts were criticised for failing to realize all of Boole's logical operators. The criticisms of their detractors were tempered by the contemporary knowledge of the biological state and the lack of understanding of the enormity of the task they were attempting. Later, in the sixties, the field was famously attacked by Minsky and Papert, and went somewhat into recession. The advent of the digital computer as the hardware substrate of choice sparked a resurgence that branched into the field of artificial neural computation.

Unfortunately, the attacks on the field today arise from the legacy of an unrealistic anthropomorphism that has stifled a more formal description and utilisation of massively parametric systems.

Conventionally, it is considered sufficient to establish a topology and a means of adapting that topology, randomly initialise the system, provide training data and iteratively adapt the initial state to achieve some predefined accepted state. This simple statement has raised a number of questions that this candidate is unable to answer.

With software simulation being the conduit of choice where topological performance is concerned, what software language should be used to establish the desired topology? What functionality is available both in the software and in the hardware that the software is running on? This includes the identity and quality of the random number generator used to initialise the system. How are network parameters arrived at? This candidate only knows of empirical or rule of thumb decision structures used for this. How should the data be subdivided to train and test the system without introducing user-defined bias? Is it appropriate to apply parametric statistics to the results obtained from stochastic processes? Can the Central Limit Theorem (CLT) sufficiently address the outcome of a randomly initialised system? Should a non-parametric statistical regimen be (more correctly) applied?

In the light of these and other no less important questions that this candidate's research fails to find answered in the publications produced by the field, the candidate asserts that there exists no proper methodology for comparing the success or otherwise of topologies against each other. This is also true of comparing variants of a single topology. What is left then is only convention. With that realisation, this thesis does not adopt a conventional approach to convergence testing.

Chapter 6 describes a number of hypotheses concerning the performance of the topology introduced in earlier chapters in the context of data sets specifically chosen for inherent characteristics. Chapter 7 and chapter 8 provide empirical insight into the accuracy of the hypotheses postulated in chapter 6. Rather than discuss convergence statistics the candidate suggests that it is of more use to examine the system while it is still dynamic and so performance statistics are sacrificed for an examination of the system while it is still being trained.

Finally, this thesis is clearly a large document. This is the case because there is an amount of redundancy contained herein. The redundancy is retained by design, as it is quite possible to relate some of the concepts by written language, by diagrams and schematics or by mathematics, and often a combination of these was used to relate and reinforce the ideas presented. Nonetheless, this candidate is proud of the outcome.


Contents

1 INTRODUCTION
1.1 PREAMBLE
1.2 SURVEY OF THE LITERATURE
1.3 THE FEED FORWARD PARADIGM AND CHANNEL CONNECTIVITY
1.4 TRAINING
1.5 CHANNEL ARCHITECTURE
1.6 CONVERGENCE
1.7 HYPOTHETICAL PROPERTIES OF CHANNELS
1.8 THE MICRO-NET AND SYNTHETIC DATA
1.9 THE MICRO-NET AND REAL DATA
1.10 CONCLUSIONS, IMPLICATIONS AND EXTENSIONS

2 SURVEY OF THE LITERATURE
2.1 PREAMBLE
2.2 MONOLITHIC NEURAL PROCESSING MODELS
2.2.1 MCCULLOCH PITTS NEURONS
2.2.2 WIDROW AND ADALINE
2.2.3 WIDROW AND MADALINE
2.2.4 ADALINE AND TWO DIMENSIONAL FEATURE SPACE
2.2.5 INTERNAL REPRESENTATION AND MCCULLOCH NEURONS
2.2.6 HIGHER ORDER NEURAL PROCESSING MODELS
2.2.7 SIGMA-PI ARTIFICIAL NEURAL NETWORKS
2.2.8 PI-SIGMA ARTIFICIAL NEURAL NETWORKS
2.2.9 TREE STRUCTURED ARTIFICIAL NEURAL NETWORKS
2.2.9.1 Cotter's Tree Network
2.2.9.2 Heinz's Tree Network
2.2.9.3 Friedman's TreeNet
2.3 POLYMEROUS NEURAL PROCESSING MODELS
2.3.1 MODULAR ARTIFICIAL NEURAL NETWORKS
2.3.2 RIDGE POLYNOMIAL NETWORKS
2.3.3 THE CHARACTERIZATION AND LEARNING MODULE
2.3.4 MODULARITY IN SPITE OF MUHLENBEIN
2.3.5 THE COOPERATIVE MODULAR NEURAL NETWORK CLASSIFIER
2.3.6 MOTIVATIONS FOR MODULARITY
2.3.6.1 Biological motivation
2.3.6.2 Psychological motivation
2.3.6.3 Computational motivation
2.3.6.4 Decision making motivation
2.3.6.5 Embedded hardware motivation
2.4 CONNECTIVITY SUMMARY
2.4.1 MCCULLOCH DEFINES THE SCOPE OF NEURAL COMPUTATION
2.4.2 DISCRIMINATOR AND LIMITATION ARE SYNONYMS
2.4.3 GENERATING NON-LINEAR DISCRIMINATORS
2.4.4 GENERATING MULTIPLE DISCRIMINATORS IN MONOLITHS
2.4.5 TREES
2.4.6 MODULARITY AS AN ALTERNATIVE TO THE MONOLITH
2.4.7 MOTIVE AND INTENT
2.5 FEED FORWARD PARADIGMS AND COMPUTATIONALLY DRIVEN MODIFICATIONS
2.5.1 TOPOLOGY
2.5.2 NETWORK STRUCTURE
2.5.2.1 Input considerations and structure
2.5.2.2 Network weights and structure
2.5.2.3 Inter-neural mathematical operators and structure
2.5.2.4 Intra-neural mathematical operators and structure
2.5.2.5 Transfer functions and structure
2.5.3 TRAINING REGIMEN
2.5.3.1 Backpropagation
2.5.3.2 Backpropagation: the Parallel Distributed Processing form
2.5.3.2 Gradient descent
2.5.3.3 The delta rule for semi-linear activation functions in MLP networks
2.5.4 LEARNING EFFICIENCY
2.5.5 GENERALISATION
2.6 SUMMARY OF COMPUTATIONAL CONSIDERATIONS
2.6.1 TOPOLOGY CAN BE ABSTRACTED
2.6.2 NETWORK STRUCTURE HAS MANY CONTRIBUTORS
2.6.3 TRAINING UNDER SUPERVISION
2.6.4 BACKPROPAGATION CAN BE OPTIMIZED AND ACCELERATED
2.6.5 GENERALISATION AS A NEURAL NETWORK CONCEPT
2.7 CHAPTER SUMMARY

3 CHANNEL CONNECTIVITY
3.1 PREAMBLE
3.2 THE POLYNOMIAL DISCRIMINANT
3.3 THE DISCRIMINANT OF THE HIGHER-ORDER PROCESSING UNIT
3.4 COROLLARY
3.5 THE PARALLEL PATH ARCHITECTURE
3.5.1 LATERAL STRATIFICATION VERSUS LONGITUDINAL SEGREGATION
3.5.2 THE MICRO-NET CHANNEL
3.5.2.1 The Output of the Control Neuron
3.5.2.2 The Output of the Network Output Neuron
3.5.2.3 Choice of Transfer Function and the Order of Input Representation
3.5.3 THE MICRO-NET PATH
3.5.3.1 The Control Neuron Output when a Path Contains Multiple Channels
3.5.3.2 The Output of a Micro-net when a Path Contains Multiple Channels
3.5.4 THE MULTIPLE PATHS IN A MICRO-NET
3.5.5 THE FUNCTIONALITY OF THE CONTROL ELEMENT
3.5.5.1 Unsuitable Feed Forward Paradigms for Control Element Generation
3.5.5.2 MLP Architecture for Control Element Generation
3.5.6 THE FUNCTIONALITY OF THE MICRO-NET
3.6 SUMMARY

4 THE MICRO-NET AND OTHER PARADIGMS
4.1 PREAMBLE
4.2 SIMILARITY AND EQUIVALENCE BASED ON TOPOLOGY
4.3 MA AND ORPONEN SUGGEST A NEURAL NETWORK TAXONOMY
4.4 EXTENDING AN ESTABLISHED COMPUTATIONAL TAXONOMY
4.5 DEVIATIONS FROM THE ESTABLISHED COMPUTATIONAL TAXONOMY
4.6 COMPUTATIONAL TAXONOMIC INFLUENCES ON UNIQUENESS
4.7 MICRO-NET UNIQUENESS
4.7.1 CANDIDATE NETWORKS SIMILAR TO THE MICRO-NET
4.7.2 ELIMINATION OF CANDIDATES ON COMPUTATIONAL TAXONOMY
4.7.3 OTHER NEURAL NETWORK MODELS NOT PERCEPTRON BASED
4.7.3.1 Winner-Take-All Strategies are not Micro-net Compatible
4.7.4 FIRST-ORDER MODULAR PROBABILISTIC NEURAL NETWORK MODELS
4.7.4.1 On Combining Multiple Probabilistic Classifiers
4.7.5 HIGHER-ORDER MODULAR PROBABILISTIC NEURAL NETWORK MODELS
4.7.5.1 The Polynomial Discriminant Method
4.7.5.2 Products of Experts
4.7.6 MICRO-NETS AND PROBABILISTIC NEURAL COMPUTATION
4.7.7 DETERMINISTIC FIRST-ORDER FEED FORWARD MONOLITHS
4.7.7.1 Adaptive Spline Neural Networks
4.7.8 THE MICRO-NET AND FIRST-ORDER FEED FORWARD MONOLITHS
4.7.9 DETERMINISTIC HIGHER-ORDER FEED FORWARD MONOLITHS
4.7.9.1 Specht and PADALINE
4.7.9.2 Adaptive Polynomial Neural Networks
4.7.9.3 Polynomial Higher-Order Neural Networks
4.7.9.4 Trigonometric Higher-Order Neural Networks
4.7.9.5 Neuron Adaptive Higher-Order Neural Networks
4.7.10 THE MICRO-NET AND DETERMINISTIC HIGHER-ORDER FEED FORWARD MONOLITHS
4.7.11 THE MICRO-NET AND FIRST-ORDER FEED FORWARD MODULAR APPROACHES
4.7.11.1 Localized Threshold Decomposition
4.7.11.2 Parallel and Modular Multi-Sieving
4.7.11.3 Successive Linearization
4.7.11.4 Modularity Through Diverse Neural Network Composites
4.7.11.5 Piecewise Linear Neural Networks
4.7.11.6 Function Approximation by Integral Representation
4.7.11.7 Piecewise Linear Sigmoidal Neural Networks
4.7.12 THE MICRO-NET AND HIGHER-ORDER FEED FORWARD MODULAR APPROACHES
4.7.12.1 The Ridge Polynomial Extension
4.7.12.2 The Functional Link Network
4.7.12.3 Non-linear Vector Space Connectionist Network
4.7.12.4 The Synaptic Modulation Artificial Neural Network
4.7.12.5 Adaptive Feedback Linearization
4.7.12.6 Tensor Product Artificial Neural Networks
4.7.12.7 Neuron-Adaptive Higher Order Neural Network Groups
4.7.13 THE MICRO-NET AND SINGLE PERCEPTRON PARADIGMS
4.8 MICRO-NET AND SIMILAR TOPOLOGIES
4.8.1 MCCULLOCH'S SOLUTIONS TO XOR AND ITS COMPLEMENT
4.8.2 RAUF, AHMAD AND SUCCESSIVE LINEARIZATION
4.8.3 TENSOR PRODUCT NEURAL NETWORKS
4.9 SUMMARIZING SIMILAR TOPOLOGIES
4.10 MODIFYING THE FUNDAMENTALS OF BACKWARD ERROR PROPAGATION
4.10.1 BACKPROPAGATION AND THE INTRACTABLE MICRO-NET
4.10.2 DEALING WITH A NON-EXISTENT CONDUIT FOR ERROR PROPAGATION
4.10.2.1 Component Block Backward Error Propagation
4.10.2.2 On Component Blocks and a Punctuated Error Conduit
4.11 CHAPTER SUMMARY

5 CHANNELS AND THE GENERALIZED DELTA RULE
5.1 PREAMBLE
5.2 THE GENERALISED DELTA RULE AND LINEAR SUMMING CONTROL UNITS
5.3 CHOICE OF ERROR ESTIMATE
5.4 BACKWARD ERROR PROPAGATION IN THE MICRO-NET CONTEXT
5.5 THE GENERALISED DELTA RULE AND LINEAR SUMMING UNITS IN A SINGLE CHANNEL
5.5.1 THE GENERALISED DELTA RULE AND LINEAR SUMMING CONTROL UNITS
5.5.2 THE GENERALISED DELTA RULE AND THE BIAS WEIGHT OF THE LSCU
5.5.3 THE GENERALISED DELTA RULE AND CHANNEL WEIGHTS
5.5.4 THE GENERALISED DELTA RULE AND THE BIAS WEIGHT OF THE OUTPUT LSU
5.6 CHANNEL CONNECTIVITY AND NON-LINEAR SUMMING UNITS
5.6.1 THE GENERALISED DELTA RULE AND NON-LINEAR SUMMING UNITS
5.6.2 UPDATING THE BIAS WEIGHT OF THE CONTROL NEURON
5.6.3 UPDATING THE CHANNEL WEIGHT
5.6.4 UPDATING THE OUTPUT NODE BIAS WEIGHT
5.7 ERROR PROPAGATION IN CHANNELS WITH A SQUASHED CONVENTIONAL CONNECTION
5.7.1 UPDATING A CONTROL NEURON WEIGHT IN A CHANNEL: SQUASHED CASE
5.7.2 UPDATING THE BIAS WEIGHT OF THE CONTROL NEURON: SQUASHED CASE
5.7.3 UPDATING A CONTROL NEURON WEIGHT IN A CHANNEL: SQUASHED CASE
5.7.4 UPDATING THE BIAS WEIGHT OF THE OUTPUT NODE: SQUASHED CASE
5.8 COMPONENT BLOCK BACKWARD ERROR PROPAGATION IS RECURSIVE
5.9 COMPONENT BLOCK BACKWARD ERROR PROPAGATION IS IGNORANT OF CREDIT ASSIGNMENT
5.10 THE OUTPUT BIAS WEIGHT AS AN ERROR SINK
5.11 CHAPTER SUMMARY

6 HYPOTHETICAL PROPERTIES OF CHANNELS
6.1 PREAMBLE
6.2 UNIVERSAL APPROXIMATION BY EXISTENCE PROOFS
6.2.1 UNIVERSAL APPROXIMATION AND MICRO-NET OUTPUT CLASSES
6.2.2 MICRO-NET AND UNIVERSAL APPROXIMATION BY CONSTRUCTIVE PROOFS
6.2.3 UNIVERSAL APPROXIMATION BY THE STONE-WEIERSTRASS THEOREM
6.2.4 MICRO-NET OUTPUT AND THE STONE-WEIERSTRASS EXISTENCE PROOF
6.2.4.1 Computation of the Identity Function
6.2.4.2 Separability
6.2.4.3 Algebraic Closure
6.2.5 UNIVERSAL APPROXIMATION BY INHERITANCE
6.2.6 ON THE APPROPRIATENESS OF UNIVERSAL APPROXIMATION
6.3 LIMITATIONS OF COMPONENT BLOCK BACKWARD ERROR PROPAGATION
6.4 ISSUES ASSOCIATED WITH PERFORMANCE TESTING
6.4.1 THE INPUT VECTOR AND LEVELS OF RESOLUTION
6.4.1.1 Input Vectors of a Single Dimension
6.4.1.2 Input Vectors of Two Dimensions
6.4.1.3 High Dimensional Input Vectors
6.4.2 ON THE SIGNIFICANCE OF COLLECTING PERFORMANCE STATISTICS
6.4.3 MICRO-NET IMPLEMENTATION VERSUS MICRO-NET IMPLEMENTATION
6.4.4 PRESENTATION OF TRAINING EXEMPLARS
6.4.5 KRIGING AS A STATISTICAL STANDARD REPRESENTATION OF FEATURE SPACE
6.4.6 HYPOTHESES DRAWN FROM THE SYNTHETIC DATA USED
6.4.6.1 Modeling Linearity with a Non-linear System
6.4.6.2 Modified Philpot's Four Shapes
6.4.6.3 Eight Spirals
6.4.6.4 McNames' Ramp and Hill
6.4.7 HYPOTHESES DRAWN FROM THE REAL DATA USED
6.4.7.1 Nuclear Binding Energy and the Periodic Table
6.4.7.2 The Relationship Between Intelligence Quotient and Brain Size
6.4.7.3 Percentage Body Fat as a Function of Fourteen Physiological Indicators
6.5 CHAPTER SUMMARY

7 THE MICRO-NET AND 2D-INPUT VECTORS
7.1 PREAMBLE
7.2 LINEARITY AND THE MICRO-NET NONLINEAR DISCRIMINATOR
7.2.1 NONLINEAR APPROXIMATION OF THE FUNCTION OF A STRAIGHT LINE
7.2.1.1 G.S.P. and Functions of a Single Variable
7.2.1.2 Paths Each of Two Channels
7.2.1.2.1 The MLP Instance
7.2.1.2.2 Visual Inspection of MLP Components
7.2.1.2.3 MLP Approximation of the Function of a Straight Line
7.2.1.2.4 The Standard Micro-net Implementation
7.2.1.2.5 The Micro-net Approximation of the Function of a Straight Line
7.2.1.2.6 A Micro-net Implementation with a Squashed Conventional Connection
7.2.1.2.7 Squashed Conventional Connections and the Function of a Straight Line
7.2.1.3 The Micro-net and Functions of a Single Variable
7.3 CLASSIFICATION OF PHILPOT'S FOUR SHAPES
7.3.1 FOUR SHAPES WITH SQUASHED CHANNEL CONVENTIONAL CONNECTIONS
7.3.1.1 Micro-net Component Output Functions
7.3.1.2 Global Output Functions with Squashed Conventional Connections
7.3.1.3 Global Output Functions with Conventional Connections
7.3.1.4 Output Functions and the Modified Four Shapes
7.4 FUNCTION APPROXIMATION AND THE EIGHT SPIRALS
7.4.1 EIGHT SPIRALS WITH SQUASHED CHANNEL CONVENTIONAL CONNECTIONS
7.4.1.1 Component Output Functions
7.5 CHAPTER SUMMARY

8 TESTING WITH HIGHER ORDER INPUT VECTORS
8.1 PREAMBLE
8.2 MODELLING THE PERIODIC DISTRIBUTION OF ISOTOPES
8.2.1 CHANNEL RESOURCES AND NETWORK TOPOLOGY
8.3 MCNAMES RAMP AND HILL FEATURE SPACE
8.3.1 RAMP AND HILL AND THE H.U.N.T.E. SCRIPTING ENGINE
8.3.1.1 Squashed Versus Conventional Connections and Micro-net Topology
8.3.1.2 The Cumulative Error Field
8.3.1.3 The Cumulative Error Field of a Micro-net with Squashed Connections
8.3.1.4 The Cumulative Error Field of a Micro-net with Conventional Connections
8.4 BRAIN IQ AS A FUNCTION OF BRAIN DIMENSIONS
8.4.1 MICRO-NET TOPOLOGY
8.4.1.1 Biological Variation and the Micro-net
8.5 BODY FAT AS A FUNCTION OF BODY DIMENSIONS
8.5.1 MICRO-NET TOPOLOGY
8.5.1.1 The Micro-net and a Fourteen Dimensional Input Vector
8.6 CHAPTER SUMMARY

9 CONCLUSIONS
9.1 PREAMBLE
9.2 NETWORK TOPOLOGY
9.3 NETWORK TRAINING
9.4 NETWORK TESTING
9.5 CHAPTER SUMMARY

10 POSSIBLE EXTENSIONS OF THE MICRO-NET
10.1 PREAMBLE
10.2 NETWORK TOPOLOGY
10.2.1 CHANNEL RESOURCES AND NETWORK TOPOLOGY
10.2.1.1 Channels with Multiple Control Mechanisms
10.2.1.2 Control Mechanism Resources
10.2.2 PATH RESOURCES AND NETWORK TOPOLOGY
10.2.3 MICRO-NET LATTICES AND NETWORK TOPOLOGY
10.2.4 MICRO-NET INTERCONNECTION WITHIN NETWORK LATTICES
10.2.5 MICRO-NET INTERCONNECTION BETWEEN NETWORK LATTICES
10.3 ANALYSIS OF NETWORK FUNCTION
10.3.1 ANALYSIS OF FUNCTION CLASSES
10.3.2 UNIVERSAL APPROXIMATION
10.3.3 THE VAPNIK-CHERVONENKIS DIMENSION
10.3.4 VISUALIZATION AS AN INTERROGATOR OF MICRO-NET FUNCTION
10.4 PARAMETRIC SENSITIVITY
10.4.1 RANDOM INITIALIZATION
10.4.2 LEARNING RATE (STEP SIZE)
10.4.3 TRANSFER FUNCTIONS
10.5 MICRO-NET TRAINING
10.6 MICRO-NET APPLICATIONS
10.7 HARDWARE IMPLEMENTATIONS
10.8 CHAPTER SUMMARY


BIBLIOGRAPHY
APPENDICES

